How should I handle variation in text length (the number of words on each line of the training file)? Is it okay to train with these differences as they are, or should I normalize the text lengths first?
I am working on matching words to the text that best fits them (e.g. relating the word "electronics" to texts that mention or are about that topic). I'm training with trainMode 0, using the text as the data and the name of the text source as the label. The length of each text ranges from 1 to 700 words (median of 74 words, standard deviation of 96 words).
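
For reference, length statistics like the ones above can be computed with something like the following. This is only a minimal sketch: it assumes one example per line, whitespace-separated words, and labels marked with a `__label__` prefix (an assumption about the file format, adjust as needed). The helper names `word_counts` and `summarize` are illustrative, and the sample lines are made up.

```python
import statistics

LABEL_PREFIX = "__label__"  # assumed label marker; change to match your file


def word_counts(lines):
    """Return the number of non-label tokens on each non-empty line."""
    counts = []
    for line in lines:
        tokens = [t for t in line.split() if not t.startswith(LABEL_PREFIX)]
        if tokens:
            counts.append(len(tokens))
    return counts


def summarize(counts):
    """Basic length statistics for a list of per-line word counts."""
    return {
        "min": min(counts),
        "max": max(counts),
        "median": statistics.median(counts),
        "std": statistics.pstdev(counts),  # population standard deviation
    }


# Made-up sample lines standing in for the real training file.
sample = [
    "cheap phone chargers and cables __label__electronics",
    "laptop __label__electronics",
    "fresh bread baked every morning in town __label__bakery",
]
stats = summarize(word_counts(sample))
print(stats)
```

For a real file, `word_counts(open(path, encoding="utf-8"))` would give the per-line counts, where `path` is wherever the training file lives.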