I need a model for text matching in IR (question answering). I only have short sentences and relatively little data (about 20,000 examples).
So far I have tried a deep CNN siamese model, but it overfits because of the small amount of data.
I think the structure of the data matters a lot for the choice of model.
My data is structured into intents and entities: the same intent (same sentence) appears with several hundred different entities. So it is important that the model learns to recognize the entities (!) and the context in which each entity appears (the intent).
Different sentences from the SAME intent group with the same entity need to match.
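To make that matching rule concrete, here is a minimal sketch of how labeled pairs for a siamese model could be built from such data. It assumes each example is stored as `(text, intent, entity)`; the `Example` type, the `build_pairs` function, its `negatives_per_positive` parameter and the sample sentences are all hypothetical. Positives are pairs that share both intent and entity. As an illustrative design choice (not something stated above), negatives are taken from the same intent with a different entity whenever possible, since those are the pairs that force the model to look at the entity rather than only at the sentence pattern.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Example(NamedTuple):
    """One labeled sentence (hypothetical schema: text, intent, entity)."""
    text: str
    intent: str
    entity: str


def build_pairs(examples: list[Example], negatives_per_positive: int = 1,
                seed: int = 0) -> list[tuple[str, str, int]]:
    """Return (text_a, text_b, label) pairs; label 1 = same intent AND same entity."""
    rng = random.Random(seed)
    groups: dict[tuple[str, str], list[Example]] = defaultdict(list)
    by_intent: dict[str, list[Example]] = defaultdict(list)
    for ex in examples:
        groups[(ex.intent, ex.entity)].append(ex)
        by_intent[ex.intent].append(ex)

    pairs = []
    for (intent, entity), members in groups.items():
        # Hard negatives: same intent, different entity (assumed to be most informative).
        pool = [ex for ex in by_intent[intent] if ex.entity != entity]
        if not pool:
            # Fallback: any example from another (intent, entity) group.
            pool = [ex for ex in examples if (ex.intent, ex.entity) != (intent, entity)]
        # Every pair of different sentences inside a group is a positive match.
        for a, b in combinations(members, 2):
            pairs.append((a.text, b.text, 1))
            for _ in range(negatives_per_positive):
                if pool:
                    pairs.append((a.text, rng.choice(pool).text, 0))
    return pairs


# Hypothetical toy data: two intents, two entities.
examples = [
    Example("how do I reset my router", "reset_device", "router"),
    Example("router reset instructions", "reset_device", "router"),
    Example("how do I reset my modem", "reset_device", "modem"),
    Example("buy a new router", "purchase", "router"),
]
pairs = build_pairs(examples)
for pair in pairs:
    print(pair)
```

Sampling negatives per positive, instead of labeling all pairs, keeps the pair count manageable: labeling every pair of 20,000 examples would give roughly 200 million pairs, almost all of them non-matches.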
I think this is more specific than the Quora duplicate questions case.
Any tips and tricks? :-)