epfml / sent2vec

General purpose unsupervised sentence representations

What format should each sentence have? #24

Closed: bayou3 closed this issue 6 years ago

bayou3 commented 6 years ago

Hello, I am confused about what format each sentence should have. For example, take the sentence "It is important for machine learning." After tokenizing and lowercasing, it becomes the list [ it, is, important, for, machine learning ]. I then want to write this result to a file, one sentence per line. Should the line be "it, is, important, for, machine learning"? In other words, should the words be separated by commas?

mpagli commented 6 years ago

The tokens should be separated by spaces. If you want to form phrases such as 'machine learning', you can join the words with a character such as '_': it is important for machine_learning
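
For illustration, here is a minimal Python sketch of that formatting, assuming the sentences are already tokenized into lists as in the question. The helper names `format_sentence` and `write_corpus` are hypothetical and are not part of sent2vec; the sketch only produces the text file, one sentence per line.

```python
def format_sentence(tokens, joiner="_"):
    """Return one line of text: lowercased tokens separated by single spaces.

    A multi-word token such as "machine learning" is joined with `joiner`
    so that it stays a single token after splitting on spaces.
    """
    return " ".join(
        joiner.join(tok.lower().split())  # "machine learning" -> "machine_learning"
        for tok in tokens
        if tok.strip()                    # skip empty or whitespace-only tokens
    )


def write_corpus(sentences, path):
    """Write one formatted sentence per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for tokens in sentences:
            f.write(format_sentence(tokens) + "\n")


tokens = ["it", "is", "important", "for", "machine learning"]
print(format_sentence(tokens))  # it is important for machine_learning
```

Calling `write_corpus([tokens], "sentences.txt")` would write that same line to a file; the file name is only an example.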

bayou3 commented 6 years ago

Thanks, that really helps.