deepset-ai / COVID-QA

API & Webapp to answer questions about COVID-19. Using NLP (Question Answering) and trusted data sources.
Apache License 2.0
344 stars, 121 forks

Real-Time data scraping for countries #60

Open ivan-zidov opened 4 years ago

ivan-zidov commented 4 years ago

Hi,

I have been working on a chatbot for the Croatian language. Here is a little help with real-time scraping.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.worldometers.info/coronavirus/"
headers = {"Accept": "text/html"}
response = requests.get(url, headers=headers)

# Shows the HTTP status, e.g. <Response [200]>
print(response)

soup = BeautifulSoup(response.content, "lxml")

# Collect the cell text of every row in today's countries table.
table = soup.find(id="main_table_countries_today")
rows = [[td.text for td in tr.find_all("td")] for tr in table.find_all("tr")]

# Keep only rows that have exactly the nine expected columns.
elements = [row for row in rows if len(row) == 9]

wordmeters = pd.DataFrame(elements, columns=["Country,Other", "Total Cases", "New Cases", "Total Deaths", "New Deaths", "Total Recovered", "Active Cases", "Serious, Critical", "Tot Cases/1M pop"])
wordmeters
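The cells scraped this way are plain strings, so they would need converting before any numeric question can be answered. Here is a minimal sketch of that conversion, assuming the cells look like "1,234" or "+56" and that missing values are empty strings. The helper name `to_number` and the sample rows are hypothetical, not taken from the site or the repository.

```python
def to_number(cell):
    """Convert a scraped cell such as '1,234' or '+56' to a float.

    Returns None when the cell is empty or not numeric.
    The cell formats handled here are an assumption about the scraped table.
    """
    cleaned = str(cell).strip().replace(",", "").replace("+", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


# Hypothetical sample rows standing in for the scraped table:
# country, total cases, new cases.
sample_rows = [
    ["Croatia", "1,222", "+40"],
    ["Italy", "128,948", ""],
]

# Leave the country name as text and convert the remaining cells.
converted = [[row[0]] + [to_number(cell) for cell in row[1:]] for row in sample_rows]
```

With the DataFrame from the snippet above, the same helper could be applied per column, for example `wordmeters["Total Cases"].map(to_number)`.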

Timoeller commented 4 years ago

Sorry for the late reply. Integrating this data for questions like "How many cases are in X?" is actually on our roadmap, but it would require quite a lot of implementation work:

  1. We need to identify whether a question is asking for this type of structured information.
  2. We need to determine which quantity is asked for: new cases, total cases, total deaths, etc.
  3. Finally, we need to match the country names in your DataFrame against the country that was asked for.
  4. (Optional) Handle spelling mistakes in either the country name or the quantity asked for.
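As a rough illustration of the steps listed above, here is a minimal sketch that uses keyword matching for the quantity and the standard library's `difflib` for fuzzy country matching. It is not how the repository does it: the keyword table, the function names (`detect_metric`, `match_country`, `route_question`) and the 0.8 cutoff are all assumptions made for this example, and a real integration would more likely use a trained classifier.

```python
import difflib
import re

# Hypothetical keyword table mapping question phrases to column names of the
# scraped table. More specific phrases come first so they win over generic ones.
METRIC_KEYWORDS = [
    ("new cases", "New Cases"),
    ("new deaths", "New Deaths"),
    ("recovered", "Total Recovered"),
    ("active", "Active Cases"),
    ("deaths", "Total Deaths"),
    ("died", "Total Deaths"),
    ("cases", "Total Cases"),
    ("infected", "Total Cases"),
]


def detect_metric(question):
    """Steps 1 and 2: return the column asked about, or None if no keyword matches."""
    text = question.lower()
    for phrase, column in METRIC_KEYWORDS:
        if phrase in text:
            return column
    return None


def match_country(question, countries, cutoff=0.8):
    """Steps 3 and 4: find a country named in the question, tolerating small typos."""
    lookup = {name.lower(): name for name in countries}
    words = re.findall(r"[a-z]+", question.lower())
    # Try longer word groups first so multi-word names like "South Korea" can match.
    for size in (3, 2, 1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            close = difflib.get_close_matches(candidate, list(lookup), n=1, cutoff=cutoff)
            if close:
                return lookup[close[0]]
    return None


def route_question(question, countries):
    """Return (country, column) for a statistics question, else None."""
    metric = detect_metric(question)
    if metric is None:
        return None
    country = match_country(question, countries)
    if country is None:
        return None
    return country, metric


# Hypothetical country list; in practice it could come from the "Country,Other" column.
countries = ["Croatia", "Germany", "Italy", "South Korea"]
answer_key = route_question("How many new deaths in Germny?", countries)
```

A question that yields `None` would fall through to the regular question-answering pipeline, while a `(country, column)` pair could be used to look up the value in the DataFrame.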

It would help to be able to query an API with this info. Do you have any updates on your integration, or would you like to implement the proposed steps in this repository?