BigCokee / G14_Encyclopedia

UoE/DS4D/G14_Encyclopedia

Fineness of data processing #10

Open dingdinger1008 opened 2 years ago

dingdinger1008 commented 2 years ago

The initial text clean-up sample shows that some meaningless content remains even after punctuation and numbers are removed in the clean-up step. Does it need further cleaning?

BigCokee commented 2 years ago

Yes, further cleaning is needed. I removed every character that is not an English letter to improve the accuracy of the text analysis. This step leaves runs of consecutive spaces, and each run should be replaced with a single space. Stop words should also be deleted and the remaining words restored to their base forms (lemmatization), so that the subsequent NLP analysis is accurate.
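
A minimal sketch of these steps, assuming Python and only the standard `re` module. The function name `clean_text`, the short `STOP_WORDS` set, and the `LEMMAS` lookup table are hypothetical placeholders for illustration, not the project's actual code. A real pipeline would more likely take a full stop-word list and a proper lemmatizer from an NLP library such as NLTK or spaCy.

```python
import re

# Illustrative stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "to", "is", "are", "was", "were"}

# Hypothetical lemma lookup standing in for a real lemmatizer.
LEMMAS = {"horses": "horse", "bred": "breed", "countries": "country"}


def clean_text(text):
    """Return a list of cleaned, lower-cased, lemmatized tokens."""
    # 1. Replace every character that is not an English letter with a space.
    letters_only = re.sub(r"[^A-Za-z]", " ", text)
    # 2. Collapse the resulting runs of spaces into a single space.
    single_spaced = re.sub(r" +", " ", letters_only).strip()
    # 3. Lower-case, split into tokens, and drop stop words.
    tokens = [t for t in single_spaced.lower().split() if t not in STOP_WORDS]
    # 4. Map each token to its base form where the lookup knows one.
    return [LEMMAS.get(t, t) for t in tokens]


if __name__ == "__main__":
    print(clean_text("The horses, 1771: were bred in  many countries!"))
```

Doing the space collapse as its own step keeps the sketch close to the order described above. Lower-casing before the stop-word check is an added assumption, so that "The" and "the" are treated alike.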