google-research-datasets / MAVE

The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attribute extraction study.

Multiple values for one attribute in one paragraph #4

Open ShengleiH opened 2 years ago

ShengleiH commented 2 years ago

Hi, I found that this dataset contains multiple values for one attribute within a single paragraph. But in your paper, the model only "seeks the best answer span in the product context". Can this model extract multiple spans in the product context for one attribute?

data with multiple spans

{
  "id": "8198319301",
  "category": "Coats & Jackets",
  "paragraphs": [
    {
      "text": "HTOOHTOOH Women's Plus-size Casual Turn Down Collar Mid Length Jean Jacket",
      "source": "title"
    },
    ...
  ],
  "attributes": [
    {
      "key": "Style",
      "evidences": [
        {
          "value": "Casual",
          "pid": 0,
          "begin": 28,
          "end": 34
        },
        {
          "value": "Jean Jacket",
          "pid": 0,
          "begin": 63,
          "end": 74
        }
      ]
    }
  ]
}
{
  "id": "B00002N7X0",
  "category": "Aprons",
  "paragraphs": [
    {
      "text": "McGuire Nicholas C9 4 Pocket Utility Bib Apron in Natural Cotton",
      "source": "title"
    },
    {
      "text": "Constructed of heavy duty but lightweight cotton and ideal for a variety of jobs. The large waist pockets help to store tools or brushes. Reinforced at stress points for added durability. 2 large waist pockets 1 medium bib pocket 1 small bib pocket Extra reinforcement at stress points Canvas loop neck & waist tie Cotton canvas.",
      "source": "description"
    },
    ...
  ],
  "attributes": [
    {
      "key": "Style",
      "evidences": [
        {
          "value": "Bib",
          "pid": 0,
          "begin": 37,
          "end": 40
        },
        {
          "value": "bib",
          "pid": 1,
          "begin": 219,
          "end": 222
        },
        {
          "value": "bib",
          "pid": 1,
          "begin": 238,
          "end": 241
        },
        {
          "value": "neck",
          "pid": 1,
          "begin": 298,
          "end": 302
        },
        ...
      ]
    }
  ]
}
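
For reference, here is a minimal sketch of how the evidences above can be read back out of the text. It assumes `pid` is the index into `paragraphs` and that `begin`/`end` are half-open character offsets into that paragraph's `text`, which is consistent with the samples shown. The helper name `extract_spans` is hypothetical, and the record keeps only the title paragraph (the elided paragraphs are left out).

```python
# First sample record, reduced to the title paragraph that the evidences point to.
record = {
    "id": "8198319301",
    "category": "Coats & Jackets",
    "paragraphs": [
        {
            "text": "HTOOHTOOH Women's Plus-size Casual Turn Down Collar Mid Length Jean Jacket",
            "source": "title",
        },
    ],
    "attributes": [
        {
            "key": "Style",
            "evidences": [
                {"value": "Casual", "pid": 0, "begin": 28, "end": 34},
                {"value": "Jean Jacket", "pid": 0, "begin": 63, "end": 74},
            ],
        }
    ],
}


def extract_spans(record):
    """Return (attribute key, annotated value, text sliced by offsets) per evidence.

    Assumption: `pid` indexes `paragraphs`, and [begin, end) is a character range.
    """
    spans = []
    for attribute in record["attributes"]:
        for evidence in attribute["evidences"]:
            text = record["paragraphs"][evidence["pid"]]["text"]
            sliced = text[evidence["begin"]:evidence["end"]]
            spans.append((attribute["key"], evidence["value"], sliced))
    return spans


for key, value, sliced in extract_spans(record):
    print(key, repr(value), repr(sliced))
```

Both evidences for `Style` come from the same paragraph (`pid` 0), which is the multi-span case this question is about.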
liyang2019 commented 2 years ago

Yes, the model can extract multiple spans in the product context for one attribute. Apologies for the imprecise wording of that sentence. As Fig. 4 shows, there can be multiple spans for one attribute.

The spans are attribute values. Sometimes different spans are synonyms of each other; sometimes they have different meanings. In the latter case, we regard the attribute as a multi-valued attribute. For example, a product could have multiple values ('red', 'black', ...) for the 'color' attribute.
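
As a rough illustration of that distinction, the sketch below collects the distinct values annotated for one attribute. This is not the MAVE code: the helper names `distinct_values` and `is_multi_valued` are hypothetical, and case-folding is only an assumed normalization that merges repeated mentions such as "Bib" and "bib". It cannot tell that two differently spelled spans are synonyms. The input reuses only the four evidences listed for the Aprons example in this thread; the elided ones are omitted.

```python
# "Style" attribute from the Aprons example, limited to the evidences shown in the thread.
style_attribute = {
    "key": "Style",
    "evidences": [
        {"value": "Bib", "pid": 0, "begin": 37, "end": 40},
        {"value": "bib", "pid": 1, "begin": 219, "end": 222},
        {"value": "bib", "pid": 1, "begin": 238, "end": 241},
        {"value": "neck", "pid": 1, "begin": 298, "end": 302},
    ],
}


def distinct_values(attribute):
    """Return case-folded values in first-seen order, without duplicates.

    Assumption: case-insensitive equality is enough to merge repeated mentions.
    True synonyms with different spellings are NOT merged by this helper.
    """
    seen = []
    for evidence in attribute["evidences"]:
        normalized = evidence["value"].casefold()
        if normalized not in seen:
            seen.append(normalized)
    return seen


def is_multi_valued(attribute):
    """Treat an attribute as multi-valued if more than one distinct value remains."""
    return len(distinct_values(attribute)) > 1


print(distinct_values(style_attribute))   # ['bib', 'neck']
print(is_multi_valued(style_attribute))   # True
```

Here four spans collapse to two distinct values, so under this simple rule the `Style` attribute of that product would count as multi-valued.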