DataKind-BLR / PrathamBooks-Sprint-2018

Code and documentation for the collaboration with PrathamBooks during Sprint 2018
MIT License

Build a pipeline to process the text data (stories) #26

Closed: arnabbiswas1 closed 4 years ago

arnabbiswas1 commented 6 years ago

Build a pipeline that will preprocess the text data.

arnabbiswas1 commented 6 years ago

@TheDataAreClean I have created a pre-processing script (a notebook) in Python: https://github.com/DataKind-BLR/PrathamBooks-Sprint-2018/pull/30. The data is available in the file stories_pre_processed_content_english.csv. If you are okay with it, we can close this issue. However, if your code is ready, please submit a PR. There is no harm in having both R and Python versions of the same code (in fact, that will be helpful in case someone wants to play with the pre-processing step).
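
For anyone who wants to experiment with the pre-processing step without opening the notebook, here is a rough sketch of what a text pre-processing pipeline for stories could look like. It is not the code from the PR: the function names `preprocess_story` and `preprocess_stories` and the specific cleaning steps (unescaping HTML entities, stripping tags, lowercasing, removing punctuation, collapsing whitespace) are assumptions chosen for illustration, and the notebook may do something different.

```python
import html
import re
import string

# Hypothetical cleaning rules; the notebook in the PR may use different ones.
_TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like an HTML tag
_WS_RE = re.compile(r"\s+")        # runs of whitespace
# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess_story(text: str) -> str:
    """Return a cleaned, lowercased version of a single story's text."""
    text = html.unescape(text)           # "&amp;" -> "&"
    text = _TAG_RE.sub(" ", text)        # drop HTML tags
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)  # punctuation -> spaces
    return _WS_RE.sub(" ", text).strip() # collapse whitespace


def preprocess_stories(stories):
    """Apply preprocess_story to every story in an iterable of strings."""
    return [preprocess_story(story) for story in stories]


if __name__ == "__main__":
    sample = ["<p>The Cat  sat, &amp; purred!</p>"]
    print(preprocess_stories(sample))
```

Keeping each step as a single line inside one function makes it easy to reorder or drop steps while experimenting, and the same sequence should be straightforward to mirror in R.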