cofacts / rumors-api

GraphQL API server for clients like rumors-site and rumors-line-bot
https://api.cofacts.tw
MIT License

Flow tag import script #271

Closed (MrOrz closed 2 years ago)

MrOrz commented 2 years ago

Discussion: "若水 ground truth" (若水 is the annotation vendor referred to as Flow below) in https://g0v.hackmd.io/@mrorz/cofacts-meeting-notes/%2FlYTN-n1xQyqlHB9oeaPvuQ

This PR implements a migration script, importFlowAnnotation.js, that imports ground truth data from Flow annotators (https://github.com/cofacts/ground-truth/blob/main/20211204_14859.zip) into the database by:

  1. Mapping 0~16 (Flow's annotation) to the actual category IDs used in the database
  2. Creating (or updating) app user ID=category-reviewer and user ID=flow-annotator in the users index
    • genCategoryReview is also updated so that it looks up previous reviewer feedback using the ID in the users index
  3. Adding an article category (using app RUMORS_AI and app user ID "flow-annotator") if the article category does not exist yet
    • If a matching deleted article category exists, its status becomes NORMAL and its author switches to flow-annotator. This is the existing behavior of createArticleCategory.
  4. Adding positive feedback (using app RUMORS_AI and app user ID category-reviewer)
    • so that the article-category will be selected by script 2 if no further downvotes exist
This PR also does the following refactor:

Test run result on staging

Staging: 2021/12/05 snapshot

(...omitted...)
[14855] 2ccctakfijfk8 : + 0 categories & 1 feedbacks
[14856] 2c8p4fd57uz8u : + 0 categories & 1 feedbacks
[14857] 2ox47ruk2mjzi : + 0 categories & 1 feedbacks
[14858] 1o5rne6khke7g : + 0 categories & 1 feedbacks
[14859] 1sg4crr6ym2qp : + 1 categories & 1 feedbacks
Created article-categories: 2999
Created feedbacks: 15946
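To cross-check the summary totals against the per-article entries, the log can be tallied with a small helper like the one below. It is only a sketch: `tallyImportLog` and `LOG_LINE_PATTERN` are hypothetical names that are not part of the PR, and the pattern assumes every entry has exactly the shape shown in the output above, one entry per array element.

```javascript
// Hypothetical helper, not part of the PR.
// Matches entries such as "[14859] 1sg4crr6ym2qp : + 1 categories & 1 feedbacks".
const LOG_LINE_PATTERN =
  /^\[(\d+)\]\s+(\S+)\s+:\s+\+\s+(\d+) categories & (\d+) feedbacks$/;

function tallyImportLog(lines) {
  const totals = { articles: 0, categories: 0, feedbacks: 0 };
  for (const line of lines) {
    const match = LOG_LINE_PATTERN.exec(line.trim());
    if (!match) continue; // skip summary lines and anything unrecognized
    totals.articles += 1;
    totals.categories += Number(match[3]);
    totals.feedbacks += Number(match[4]);
  }
  return totals;
}
```

Fed the full, unabridged log, the `categories` and `feedbacks` sums should equal the "Created article-categories" and "Created feedbacks" figures if the script's counters are consistent.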

Result from 1 article with category inserted: [image]

coveralls commented 2 years ago

Coverage Status

Coverage increased (+0.5%) to 86.847% when pulling 9c2f86c655fe1cad04ef9a7537d95b777b8a0bd6 on flow-tag-importer into 3bfbac6a62c588e740e6aac52d23ff901f6e58e5 on master.