Closed by guotong1988 4 years ago
Hey @guotong1988, you'll first want to gather enough data for the types of entities (fruit, vegetable, etc.) that you care about. You can use an off-the-shelf set of embeddings (e.g. GloVe) as input features. These are common tokens, and because pretrained embeddings are learned from large, generic datasets, the embeddings for entities in the same class should already be clustered together.
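Here is a minimal sketch of that idea, under some loud assumptions: the 3-d vectors in `TOY_EMBEDDINGS` are made-up stand-ins for real pretrained embeddings such as GloVe, and mean pooling plus a nearest-centroid classifier is just one simple choice, not the only way to do this. All function names are hypothetical.

```python
from math import sqrt

# Made-up 3-d vectors standing in for pretrained embeddings (e.g. GloVe).
# Real vectors would be loaded from a file and have 50-300 dimensions.
TOY_EMBEDDINGS = {
    "apple":  [0.90, 0.10, 0.00],
    "banana": [0.80, 0.20, 0.10],
    "pear":   [0.85, 0.15, 0.05],
    "carrot": [0.10, 0.90, 0.00],
    "onion":  [0.20, 0.80, 0.10],
    "potato": [0.15, 0.85, 0.05],
}


def embed_keyword_set(keywords):
    """Mean-pool the vectors of the known keywords; keyword order is irrelevant."""
    vectors = [TOY_EMBEDDINGS[k] for k in keywords if k in TOY_EMBEDDINGS]
    if not vectors:
        raise ValueError("none of the keywords has an embedding")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def fit_centroids(labeled_sets):
    """Average the pooled vectors of each class into one centroid per label."""
    grouped = {}
    for keywords, label in labeled_sets:
        grouped.setdefault(label, []).append(embed_keyword_set(keywords))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(keywords, centroids):
    """Return the label whose centroid is most similar to the pooled query."""
    query = embed_keyword_set(keywords)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


train = [
    (["apple", "banana"], "fruit"),
    (["carrot", "onion"], "vegetable"),
]
centroids = fit_centroids(train)
print(predict(["pear", "apple"], centroids))
print(predict(["potato"], centroids))
```

Because the keywords are pooled rather than read in sequence, the input really is treated as a set. With enough data you could swap the centroid step for any standard classifier trained on the pooled vectors.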
In the second example, where you have labels like "Chinese fruit", you'll want to treat this as a multilabel classification problem (e.g. the output is [0, 1, 1, 0] instead of a single class such as [0, 1, 0, 0]). Alternatively, you can just add more classes like "fruit" and "Chinese fruit", but your model is likely to start confusing them because the classes overlap heavily. You can also train two separate models, one to predict "fruit" and one to predict "Chinese" from the set of keywords, but this assumes every input has both kinds of label.
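To make the multilabel output concrete, here is a small sketch. The label list and the logits are hypothetical; in practice the logits would come from whatever model you train, with one independent sigmoid per label instead of a softmax over classes.

```python
from math import exp

# Hypothetical label set; the order fixes the positions in the output vector.
LABELS = ["vegetable", "fruit", "chinese", "tropical"]


def multi_hot(tags):
    """Encode a set of tags as a 0/1 vector with one slot per label."""
    unknown = set(tags) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if label in tags else 0 for label in LABELS]


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def decode(logits, threshold=0.5):
    """Score each label independently and keep every label above the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


print(multi_hot({"fruit", "chinese"}))   # [0, 1, 1, 0]
print(multi_hot({"fruit"}))              # [0, 1, 0, 0]
print(decode([-2.0, 3.1, 0.4, -1.5]))    # ['fruit', 'chinese']
```

The key design point is that the labels do not compete with each other, so "fruit" and "chinese" can both be on for the same keyword set, and an input with only one of them is still representable.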
Hope that helps.
There is a similar task called text classification.
But I want to find a kind of model whose input is a set of keywords, where the keyword set does not come from a sentence.
For example:
Another example:
Thank you.