DataKind-BLR / PrathamBooks-Sprint-2018

Code and documentation for the collaboration with PrathamBooks during Sprint' 2018
MIT License

Tags Seeding/Validation Approach #24

Open umeshprasadk opened 6 years ago

umeshprasadk commented 6 years ago

Hi, this is an idea that occurred to me while reading the stories (based on anecdotes). While reading, I found that the Marathi translation of a story has more tags, many of them very good, while the Hindi translation has only 3 tags.

So, as a first step: for stories with multiple translations, merge the (English) tags from all translations. These will form the candidate set of tags.

Next, we can take the page-wise content and use WordNet or ConceptNet to compute a confidence score for each candidate tag against each sentence or page. The top-scoring tags can be suggested to readers for validation while they are reading these stories on the website. Readers can validate a tag with a simple up/down vote, and those votes give a further validation signal.
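The merge step could look roughly like the sketch below. It is only an illustration: the function name `seed_candidate_tags`, the input shape (a dict from translation language to its list of English tags), and the sample tags are all assumptions, not taken from the actual StoryWeaver data.

```python
from collections import defaultdict


def seed_candidate_tags(tags_by_translation):
    """Union the English tags attached to every translation of one story.

    Returns a dict mapping each normalized tag to the set of translations
    that carried it. Hypothetical helper, not part of any existing codebase.
    """
    sources = defaultdict(set)
    for language, tags in tags_by_translation.items():
        for tag in tags:
            # Normalize so "Animals" and "animals " count as the same tag.
            normalized = tag.strip().lower()
            if normalized:
                sources[normalized].add(language)
    return dict(sources)


# Made-up example: a richly tagged Marathi translation and a sparse Hindi one.
story_tags = {
    "Marathi": ["Animals", "friendship", "Forest ", "river"],
    "Hindi": ["animals", "Friendship", "family"],
}

candidates = seed_candidate_tags(story_tags)
for tag in sorted(candidates):
    print(tag, sorted(candidates[tag]))
```

Keeping track of which translations contributed each tag is optional, but it might be useful later, for example to trust a tag more when several translations agree on it.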

So the overall approach is:

  1. Seeding the tags from all translations
  2. Assigning scores:
     - (book, tag) ==> score
     - (book page, tag) ==> score
     - (illustration, tag) ==> score [in future]
  3. Gamifying the reader experience by presenting the most relevant tags to the reader.
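Steps 2 and 3 could be sketched as below. This is a minimal sketch under several assumptions: the similarity function is a plain exact-word match used as a stand-in (a WordNet or ConceptNet relatedness measure would be plugged in through the `similarity` parameter), the book score is simply the mean of the page scores, and all function names and the sample pages are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def exact_match_similarity(tag, word):
    """Stand-in similarity: 1.0 for identical words, else 0.0.

    Assumption: a WordNet/ConceptNet based measure would replace this.
    """
    return 1.0 if tag == word else 0.0


def score_tags(pages, candidate_tags, similarity=exact_match_similarity):
    """Return ((page number, tag) -> score, tag -> book score)."""
    page_scores = {}
    for page_number, text in enumerate(pages, start=1):
        words = tokenize(text)
        for tag in candidate_tags:
            # Page score = best similarity between the tag and any word on the page.
            page_scores[(page_number, tag)] = max(
                (similarity(tag, word) for word in words), default=0.0
            )
    book_scores = {}
    for tag in candidate_tags:
        total = sum(page_scores[(n, tag)] for n in range(1, len(pages) + 1))
        # Book score = mean page score (an arbitrary choice for this sketch).
        book_scores[tag] = total / len(pages) if pages else 0.0
    return page_scores, book_scores


def top_tags(book_scores, k=3):
    """Pick the k highest scoring tags to show to the reader."""
    ranked = sorted(book_scores, key=lambda tag: (-book_scores[tag], tag))
    return ranked[:k]


def record_vote(votes, tag, up):
    """Tally a reader's up/down vote as a net count per tag."""
    votes[tag] += 1 if up else -1


# Made-up pages and candidate tags for illustration only.
pages = [
    "The elephant walked to the river.",
    "The elephant and the monkey became friends.",
]
page_scores, book_scores = score_tags(pages, ["elephant", "river", "school"])
suggested = top_tags(book_scores, k=2)

votes = Counter()
record_vote(votes, "elephant", up=True)
record_vote(votes, "river", up=False)
print(suggested, dict(votes))
```

How the reader votes should feed back into the scores is left open here; the net count is only recorded, not combined with the confidence score.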

We can also assign emojis to tags (especially entities), along the lines of WhatsApp chats or FB posts. Emojis will also nicely capture the different emotions.