facebookresearch / StarSpace

Learning embeddings for classification, retrieval and ranking.
MIT License
3.94k stars, 531 forks

Training on texts with different lengths #292

Open Magdiel3 opened 4 years ago

Magdiel3 commented 4 years ago

How should I handle variation in text length (the number of words on each line of the training file)? Is it okay to train with these differences as they are, or should I normalize the text lengths first?

I am working on matching words to the text that best fits them (e.g. relating the word "electronics" to texts that mention or are about that topic). I'm training with trainMode 0, using the text as the input and the name of the text source as the label. Text lengths range from 1 to 700 words (median of 74 words, standard deviation of 96 words).
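For reference, the setup described above can be sketched roughly as follows. This is only an illustration and not taken from my actual pipeline: the helper names `to_starspace_line` and `length_stats` and the sample data are hypothetical, and it assumes the label prefix is `__label__` (which I understand to be the StarSpace default). It shows how each training line is built and how the length statistics are computed; it does not apply any length normalization.

```python
import statistics


def to_starspace_line(text, source_name, label_prefix="__label__"):
    """Build one training line: the words of the text followed by one label.

    Assumption: labels are marked with the `__label__` prefix and must not
    contain whitespace, so spaces in the source name become underscores.
    """
    words = text.split()
    label = label_prefix + source_name.replace(" ", "_")
    return " ".join(words + [label])


def length_stats(texts):
    """Summarize word counts per text (min, max, median, population std)."""
    lengths = [len(t.split()) for t in texts]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "median": statistics.median(lengths),
        "std": statistics.pstdev(lengths),
    }


# Hypothetical sample data: (text, name of the text source).
examples = [
    ("cheap phones and laptops for sale", "electronics store"),
    ("fresh bread", "bakery"),
]

lines = [to_starspace_line(text, source) for text, source in examples]
stats = length_stats([text for text, _ in examples])

for line in lines:
    print(line)
print(stats)
```

Each element of `lines` would be written as one line of the training file, so the number of words per line varies exactly as the source texts do.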