Closed amritbhanu closed 8 years ago
This code will do the job.
re.sub(r'\s+', ' ', "abc   xyz    lmn") + "\n"
abc xyz lmn
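For anyone applying this to the whole file, here is a minimal sketch that runs the same substitution line by line. The function names and the output path are hypothetical, not taken from emailParserX.py. It also strips leading and trailing spaces, which the one-liner above does not do.

```python
import re

def collapse_whitespace(line):
    """Replace each run of whitespace with one space and trim both ends."""
    return re.sub(r"\s+", " ", line).strip()

def normalize_file(src_path, dst_path):
    """Write a whitespace-normalized copy of src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # \s+ also swallows the trailing newline, so add it back.
            dst.write(collapse_whitespace(line) + "\n")
```

Calling normalize_file("dataset.txt", "dataset_clean.txt") would leave the original file untouched and write the cleaned copy next to it (the second file name is just a placeholder).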
Code updated. Use emailParserX.py to get dataset.txt and dataConversionF.py to convert to word vectors.
The 'L' suffix on numbers in wordVectors.txt marks the long int data type; those values are produced by the scikit-learn tokenizer.
@imaginationsuper Can you please remove the extra whitespace from the dataset.txt file (keep just one space between tokens), so that it is easier for our code to extract?