DataKind-BLR / PrathamBooks-Sprint-2018

Code and documentation for the collaboration with PrathamBooks during Sprint 2018
MIT License

For a story, merge content from multiple pages #2

Closed: arnabbiswas1 closed this issue 6 years ago

arnabbiswas1 commented 6 years ago

In stories_pages.csv, each story consists of multiple pages (there are multiple rows of page_content for one story_id). The content of a story that is spread across multiple pages needs to be merged under one story_id, so that each row (instance) in the final CSV file represents the full content of one story.
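
A minimal sketch of the merge described above, not the script that was eventually committed. It assumes the rows already appear in page order within each story and that page_content has already been cleaned of HTML (see the dependency on #1). The function name `merge_story_pages` and the sample data are hypothetical; only the column names story_id and page_content come from this issue.

```python
import csv
import io


def merge_story_pages(rows):
    """Join the page_content of all rows that share a story_id.

    Assumes rows arrive in page order; pages are joined with a single space.
    """
    merged = {}  # story_id -> list of page texts, in first-seen order
    for row in rows:
        text = (row.get("page_content") or "").strip()
        pages = merged.setdefault(row["story_id"], [])
        if text:  # skip empty pages
            pages.append(text)
    return [
        {"story_id": story_id, "page_content": " ".join(pages)}
        for story_id, pages in merged.items()
    ]


# Tiny made-up sample standing in for stories_pages.csv.
sample = io.StringIO(
    "story_id,page_content\n"
    "1,Once upon a time\n"
    "1,there was a cat.\n"
    "2,The end.\n"
)
merged_rows = merge_story_pages(csv.DictReader(sample))
for row in merged_rows:
    print(row)
```

For the real file, the same function could be fed `csv.DictReader(open("stories_pages.csv", newline="", encoding="utf-8"))` and the result written back with `csv.DictWriter`. If the data has an explicit page-position column, the rows should be sorted by it first rather than relying on file order.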

This issue depends on the following issue regarding HTML cleanup:

https://github.com/DataKind-BLR/PrathamBooks-Sprint-2018/issues/1

The end result should be:

  1. A modified stories_pages.csv (do NOT commit data to GitHub; you may list the data file name in .gitignore so that it does not get committed)
  2. The script used to generate the data (MUST be committed to GitHub)

arnabbiswas1 commented 6 years ago

@siddjain24 Since you co-own #1, please check this one as well.

arnabbiswas1 commented 6 years ago

I think I have the script ready for this. Will submit a PR soon.

arnabbiswas1 commented 6 years ago

This was addressed as part of #1 and #10. Hence closing.