DataKind-BLR / PrathamBooks-Sprint-2018

Code and documentation for the collaboration with PrathamBooks during Sprint 2018.
MIT License

Story Content Data Correction #10

Closed: githubssn closed this issue 5 years ago

githubssn commented 6 years ago
  1. The CSV file has a couple of stray newlines that split page content across rows; these need to be corrected.
  2. There are only 2 records of stories created by kids, while the StoryWeaver portal has more stories created by kids.
  3. Get data on image text
  4. The story category information should ideally be included in the merged file, as it will be useful for tagging. Also, in the raw data file, the same story ID has multiple categories. Could you please check this?
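For items 1 and 4, a minimal sketch using only Python's standard library, assuming the raw export quotes page text that contains line breaks. A real CSV parser such as `csv.DictReader` keeps a quoted multi-line field as a single value, so the content is only "split" when a tool reads the file line by line. The column names `story_id`, `page`, `content` and `category`, the helper names, and the `|` separator for multiple categories are all hypothetical, not taken from the project's notebooks.

```python
import csv
import io

# Hypothetical sample: page 1 of story 1 has an embedded newline inside a quoted field,
# and story 1 appears with two different categories.
SAMPLE = (
    'story_id,page,content,category\r\n'
    '1,1,"Once upon\na  time",Animals\r\n'
    '1,2,"The end",Friendship\r\n'
    '1,3,"More",Animals\r\n'
    '2,1,"Hello",\r\n'
)


def clean_page_content(text):
    """Collapse every run of whitespace (including embedded newlines) into one space."""
    return " ".join(text.split())


def read_story_rows(csv_text):
    """Parse CSV text with the csv module, which keeps quoted multi-line fields intact."""
    rows = list(csv.DictReader(io.StringIO(csv_text, newline="")))
    for row in rows:
        # "content" is an assumed column name for the page text.
        row["content"] = clean_page_content(row["content"])
    return rows


def merge_categories(rows, id_field="story_id", category_field="category"):
    """Map each story ID to its distinct, non-empty categories joined by '|'."""
    categories = {}
    for row in rows:
        # setdefault ensures stories with no category still appear (with "").
        cats = categories.setdefault(row[id_field], [])
        cat = row[category_field].strip()
        if cat and cat not in cats:
            cats.append(cat)
    return {story_id: "|".join(cats) for story_id, cats in categories.items()}


if __name__ == "__main__":
    rows = read_story_rows(SAMPLE)
    print(rows[0]["content"])      # Once upon a time
    print(merge_categories(rows))  # {'1': 'Animals|Friendship', '2': ''}
```

If the original export did not quote multi-line fields, no parser can recover the row boundaries reliably, and the fix would have to happen where the CSV is generated.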
rshenoy21 commented 6 years ago

@arnabbiswas1, on the StoryWeaver portal, when you select a specific story, you can see the tags it is associated with displayed at the bottom. I guess not all stories have tags, as Purvi mentioned earlier, but wherever they exist, it would be good to include them in the story_content.csv file so we can analyze them as well. I did not see this as a column in Sneha's notebook.

Also, if you need more hands on this issue, I can help. This file will be an input for the visualization task I have taken up. Thanks.
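Because not every story has tags, one way to add them to story_content.csv is a left join: every story row is kept, and untagged stories get an empty `tags` value instead of being dropped. A minimal sketch, assuming the tags are available as separate rows with hypothetical `story_id` and `tag` columns; how StoryWeaver actually exposes tags is not stated in this thread.

```python
def add_tags_column(story_rows, tag_rows, id_field="story_id", tag_field="tag"):
    """Left-join tags onto story rows; stories without tags get an empty 'tags' value."""
    tags_by_story = {}
    for row in tag_rows:
        tag = row[tag_field].strip()
        tags = tags_by_story.setdefault(row[id_field], [])
        # Skip blanks and duplicates while preserving first-seen order.
        if tag and tag not in tags:
            tags.append(tag)

    merged = []
    for row in story_rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row["tags"] = "|".join(tags_by_story.get(row[id_field], []))
        merged.append(new_row)
    return merged


# Usage with hypothetical data: story "2" has no tags but is still kept.
stories = [{"story_id": "1", "title": "A"}, {"story_id": "2", "title": "B"}]
tags = [
    {"story_id": "1", "tag": "animals"},
    {"story_id": "1", "tag": "animals"},
    {"story_id": "1", "tag": "forest"},
]
merged = add_tags_column(stories, tags)
```

An inner join would silently drop every untagged story, which would skew any visualization built on the file.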

arnabbiswas1 commented 6 years ago

Hi Rajesh, could you please review the following PR, which addresses this issue as well: https://github.com/DataKind-BLR/PrathamBooks-Sprint-2018/pull/13

For tags, please feel free to explore. Happy to chat further on Slack.

arnabbiswas1 commented 5 years ago

The change has been merged via PR https://github.com/DataKind-BLR/PrathamBooks-Sprint-2018/pull/13. Hence, closing.