deepset-ai / COVID-QA

API & Webapp to answer questions about COVID-19. Using NLP (Question Answering) and trusted data sources.
Apache License 2.0

CDC Pregnancy page no longer in QA format. #99

Open mfleming99 opened 4 years ago

mfleming99 commented 4 years ago

https://github.com/deepset-ai/COVID-QA/blob/master/datasources/scrapers/CDC_Pregnancy_scraper.py

The CDC changed this page from a QA style page to a factual page on 7 April 2020. This scraper no longer produces any data when run.

Timoeller commented 4 years ago

Hey @mfleming99, I worked on the general CDC scraper, which now has many more QA pairs, and moved CDC_Pregnancy_scraper.py to datasources/outdated in #101.

Would you be interested in updating the pregnancy scraper yourself so we can add this data to our backend?

mfleming99 commented 4 years ago

Hi @Timoeller, I would update the pregnancy scraper, but the CDC pregnancy page no longer has question-answer pairs to scrape. The page was converted into an informational page.

Timoeller commented 4 years ago

OK, understood. If you think it is valuable information, we should still add it to our service.

If you update the scraper, please also update the manual check for a "?" at the end of each question/statement in our META_scraper.py.
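The manual check referred to here amounts to verifying that each scraped question string ends with "?". Below is a minimal sketch of such a check, assuming the scraped questions are available as a plain list of strings. `find_non_questions` is a hypothetical helper for illustration, not the actual code in META_scraper.py, and the sample strings are made up.

```python
from typing import Iterable, List


def find_non_questions(questions: Iterable[str]) -> List[str]:
    """Return the entries that do not end with a question mark.

    Hypothetical helper: the real check in META_scraper.py may differ.
    Trailing whitespace is stripped before the last character is checked.
    """
    return [q for q in questions if not q.rstrip().endswith("?")]


if __name__ == "__main__":
    # Made-up sample strings: one question, one factual heading.
    scraped = [
        "What is COVID-19?",
        "Information for pregnant women",
    ]
    # Flags the heading, since it is a statement rather than a question.
    print(find_non_questions(scraped))
```

A scraper for a factual page like the new pregnancy page would produce statements rather than questions, so a check like this would flag every entry unless it is relaxed for that source.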