Closed slifty closed 4 years ago
Hey, first thing: could you rebase this onto master so it picks up the new database schema from #331? I'm having trouble running the scrapes otherwise.
…which is a bit of a reviewing 🤔 because ideally we wouldn't have to do a migration/rollback dance when changing branches.
Description
This PR adds logic to prevent certain types of duplicate claims from appearing in the CNN portion of the national newsletter.
Specifically:
Makes sure that our JOIN against known speakers does not accidentally create duplicate claims when the table contains duplicate known speakers.
Adds a clause so that, when a selected claim has duplicates, only the FIRST copy of the claim in the time window is included.
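For reviewers who want the intent without reading the SQL, here is a rough in-memory sketch of the two steps above. It is not the query from this PR: the `Claim` and `KnownSpeaker` shapes, the field names, and the choice to treat claims with identical text as duplicates are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real tables and columns are not shown in this PR.
interface Claim {
  id: number;
  speakerName: string;
  text: string;
  airedAt: number; // epoch milliseconds
}

interface KnownSpeaker {
  name: string;
}

// Step 1: reduce the speaker table to distinct names before matching, so a
// speaker listed twice cannot produce the same claim twice (the in-memory
// analogue of joining against a de-duplicated speaker set).
function claimsByKnownSpeakers(claims: Claim[], speakers: KnownSpeaker[]): Claim[] {
  const knownNames = new Set(speakers.map((s) => s.name));
  return claims.filter((c) => knownNames.has(c.speakerName));
}

// Step 2: among claims in [windowStart, windowEnd) that share the same text
// (assumed duplicate criterion), keep only the earliest copy.
function firstCopyOnly(claims: Claim[], windowStart: number, windowEnd: number): Claim[] {
  const earliest = new Map<string, Claim>();
  for (const claim of claims) {
    if (claim.airedAt < windowStart || claim.airedAt >= windowEnd) continue;
    const seen = earliest.get(claim.text);
    if (seen === undefined || claim.airedAt < seen.airedAt) {
      earliest.set(claim.text, claim);
    }
  }
  // Return in broadcast order so the output is stable.
  return Array.from(earliest.values()).sort((a, b) => a.airedAt - b.airedAt);
}
```

Calling `firstCopyOnly(claimsByKnownSpeakers(claims, speakers), start, end)` applies both steps in order. In SQL the same effect would typically come from joining against a distinct set of speakers and then filtering to the earliest row per duplicate group.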
The query for the newsletter is starting to get a bit ridiculous, and may need to be refactored in the near future.
Due Diligence Checklist
Steps to Test
yarn test
yarn newsletter:send-test --national
Deploy Notes
None
Related Issues
Related to #109 -- we might want to mark it as resolved, though this doesn't cover all possible types of duplicates.