PrincetonML / SIF_mini_demo

Minimal example of sentence embedding using the Smooth Inverse Frequency (SIF) weighting scheme
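The SIF scheme can be summarized in two steps. First, each sentence vector is a weighted average of its word vectors, where word w gets weight a / (a + p(w)), p(w) is the word's estimated unigram probability and a is a small smoothing constant. Second, the projection onto the first singular vector of the sentence-embedding matrix (the "common component") is removed. The NumPy sketch below illustrates only those two steps. It is not the code from SIF_mini_demo: the function name `sif_embeddings`, the dict-based inputs and the default a = 1e-3 are assumptions made for illustration.

```python
import numpy as np

def sif_embeddings(sentences, word_vectors, word_freq, a=1e-3):
    """Hypothetical SIF sketch (not the SIF_mini_demo implementation).

    sentences:    list of token lists, already preprocessed
    word_vectors: dict token -> 1-D numpy array (all the same dimension)
    word_freq:    dict token -> estimated unigram probability p(w)
    a:            smoothing parameter (assumed default)
    """
    dim = len(next(iter(word_vectors.values())))
    emb = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        known = [t for t in tokens if t in word_vectors]
        if not known:
            continue  # sentences with no known words keep an all-zero row
        # Step 1: weight each word by a / (a + p(w)) and average
        weights = np.array([a / (a + word_freq.get(t, 0.0)) for t in known])
        vecs = np.array([word_vectors[t] for t in known])
        emb[i] = weights @ vecs / len(known)
    # Step 2: remove the projection onto the first right singular vector
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    u = vt[0]
    return emb - np.outer(emb @ u, u)
```

Frequent words such as "the" get a weight close to zero, while rare words get a weight close to 1, which is why the scheme needs no stop-word list.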

MIT License

35 stars, 11 forks

data preprocessing #2

Open: hanhanzhai opened this issue 6 years ago

hanhanzhai commented 6 years ago

I'm wondering what kind of data preprocessing I need to apply to sentences before computing SIF embeddings. For example:

1) Do I need to remove punctuation? The sentences in the example don't contain any.
2) Should I tokenize negations?
3) What other preprocessing needs to be done?

Thanks a lot!
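To make the options in the question concrete, here is a hypothetical helper that lowercases a sentence, optionally splits negation clitics in the Penn Treebank style ("don't" becomes "do n't") and optionally strips punctuation from token edges. It only illustrates the choices being asked about; the name `preprocess` and both flags are invented for this sketch, and nothing here is confirmed as the preprocessing SIF_mini_demo expects.

```python
import re
import string

def preprocess(sentence, strip_punct=True, split_negations=True):
    """Hypothetical preprocessing sketch; not part of SIF_mini_demo."""
    text = sentence.lower()
    if split_negations:
        # Penn-Treebank-style clitic split: "don't" -> "do n't"
        text = re.sub(r"(\w)n't\b", r"\1 n't", text)
    tokens = text.split()
    if strip_punct:
        # Trim punctuation at token edges ("n't" keeps its inner apostrophe)
        # and drop tokens that were punctuation only
        tokens = [t.strip(string.punctuation) for t in tokens]
        tokens = [t for t in tokens if t]
    return tokens
```

For example, `preprocess("I don't like it, really!")` yields `["i", "do", "n't", "like", "it", "really"]`. Whichever choices are made, the same preprocessing should be applied to the text used to estimate the word frequencies, so that tokens match the vocabulary of the word vectors.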