Open Oliph opened 1 year ago
One thing we could do is add a field called `day_count` that records how many days after the claim we have already searched. Alternatively, we could change `search_twitter_key` to serve that purpose, and change the initial check (step 1 of the pipeline) to test whether the value of `search_twitter_key` is <= x days. Then we have a while loop that says: while `day_count <= x && date + day_count <= today`, perform a search on `date + day_count` and then increment `day_count` by 1.
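
A minimal sketch of that loop, to make the proposal concrete. It assumes each claim is a plain dict with a `date` field (a `datetime.date`) and an optional `day_count`; the function name `collect_pending_days` and the `search` callback are hypothetical stand-ins, not the pipeline's actual code.

```python
from datetime import date, timedelta


def collect_pending_days(record, today, x, search):
    """Search every day not yet covered, up to x days after the claim or today.

    `record` is assumed to be a dict with a `date` key and an optional
    `day_count` key; `search` is a hypothetical callback taking (record, day).
    """
    day_count = record.get("day_count", 0)
    while day_count <= x and record["date"] + timedelta(days=day_count) <= today:
        search(record, record["date"] + timedelta(days=day_count))
        day_count += 1
        # Store progress after each day so a rerun after a crash resumes here.
        record["day_count"] = day_count
    return record


if __name__ == "__main__":
    searched = []
    claim = {"date": date(2024, 1, 1)}
    collect_pending_days(claim, date(2024, 1, 3), 5, lambda r, d: searched.append(d))
    print(claim["day_count"], searched)
```

Because `day_count` is written back after every search, the loop can be rerun at any time and only picks up the days that are still missing, which is the same crash-recovery property the current check has.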
To download data from Twitter and MyNews, the pipeline works as follows:

1. Check that the record has no `search_twitter_key` (or `search_mynews_key`).
2. Check the `date` key and whether `date < today - days_after`.

In other words, because data has to be collected from x days before the fact-check date to x days after it, the pipeline has to wait long enough to be able to collect the x days after the fact check.

The advantage is that the pipeline can rerun after a crash without missing any days, because the two conditions (no search key present and `date + days_after < today`) can be rechecked at any time. The problem with this approach is that all the data is collected retrospectively. That is not problematic for Twitter, but it may create some issues with MyNews (see #47). Ideally, while retaining the advantage of the current method, the pipeline should be able to start collecting data as soon as a claim is recorded in the keywords collection (or maybe a day after) and continue until the `days_after` date is reached.
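
For reference, the current two-condition check can be sketched roughly as below. This is only an illustration under assumptions: records are treated as plain dicts, `date` is a `datetime.date`, and the helper name `ready_for_search` is hypothetical rather than taken from the pipeline.

```python
from datetime import date, timedelta


def ready_for_search(record, today, days_after, search_key="search_twitter_key"):
    """Return True when the record was never searched and its window has closed.

    Assumes `record` is a dict whose `date` value is a datetime.date.
    """
    if search_key in record:
        # Already searched for this source; nothing left to do.
        return False
    # Equivalent to date + days_after < today.
    return record["date"] < today - timedelta(days=days_after)
```

Both conditions depend only on the stored record and the current date, which is why the check can safely be repeated after a crash.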