Closed raysalem closed 4 years ago
Thanks for the report, fixed!
And hey, this is a great excuse to learn JavaScript.
The data is still wrong for San Diego, and we might want something like this.
This is what the page https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html currently shows:
COVID-19 Case Summary | San Diego County Residents | Federal Quarantine | Non-San Diego County Residents | Total
---|---|---|---|---
Total Positives | 51 | 5 | 4 | 60
Age Groups | | | |
0-17 years | 0 | 0 | 0 | 0
18-64 years | 43 | 1 | 3 | 47
65+ years | 8 | 4 | 1 | 13
Age Unknown | 0 | 0 | 0 | 0
Gender | | | |
Female | 17 | 2 | 2 | 21
Male | 34 | 3 | 2 | 39
Unknown | 0 | 0 | 0 | 0
Hospitalized | 8 | 1 | 1 | 10
Deaths | 0 | 0 | 0 | 0
The scraper is currently reporting zeros because the page layout has changed. A Python solution is:

    import re

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    from IPython.display import HTML, display  # needed for display(HTML(...)) in a notebook

    URL = 'https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html'
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')

    # The case summary table sits inside this div
    table = soup.find("div", {"class": "table parbase section"})
    rows = table.find_all('tr')

    # rows[1] holds the column headers; collapse runs of whitespace
    header = [row.text for row in rows[1].find_all('td')]
    header = [re.sub('[ \t\n]+', ' ', h) for h in header]

    tbl = {}
    for row in rows[2:]:  # skip the title row and the header row
        data = [r.text for r in row.find_all('td')]
        if data[1] == '\xa0':  # section label rows (Age Groups, Gender) carry no numbers
            continue
        tbl[data[0]] = [int(d) for d in data[1:]]

    df = pd.DataFrame(tbl, index=header[1:])
    display(HTML(df.to_html()))

    # rows[0] is the title row; its last line carries the "Updated ..." date
    updateDateTime = rows[0].find('td').text.split('\n')[-1].replace("Updated", "")
    print("updateDateTime %s" % updateDateTime)
This generates:
| | Total Positives | 0-17 years | 18-64 years | 65+ years | Age Unknown | Female | Male | Unknown | Hospitalized | Deaths
---|---|---|---|---|---|---|---|---|---|---
San Diego County Residents | 51 | 0 | 43 | 8 | 0 | 17 | 34 | 0 | 8 | 0
Federal Quarantine | 5 | 0 | 1 | 4 | 0 | 2 | 3 | 0 | 1 | 0
Non-San Diego County Residents | 4 | 0 | 3 | 1 | 0 | 2 | 2 | 0 | 1 | 0
Total | 60 | 0 | 47 | 13 | 0 | 21 | 39 | 0 | 10 | 0
updateDateTime = March 17, 2020
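If the update date needs to be stored alongside the counts, it could be parsed into a proper date. This is only a sketch: `parse_update_date` is a hypothetical helper name, and it assumes the page keeps the "Month day, year" form shown above and that the process runs under an English locale (`%B` is locale-dependent).

```python
from datetime import datetime


def parse_update_date(text):
    """Parse a string such as ' March 17, 2020' into a date object.

    Assumes the 'Month day, year' form seen on the page; anything else
    raises ValueError, which is a useful signal that the page changed.
    """
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


print(parse_update_date(" March 17, 2020").isoformat())  # 2020-03-17
```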
Website data is below. Note that this is a matrix: the scraper needs to sum all three columns and, to be biased toward positives, also sum the presumptive cases -->
URL https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html
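The column summing described above might look roughly like this in Python. The numbers are copied from the table earlier in this issue; `sum_columns` and its `presumptive` argument are hypothetical names for illustration, since the page snapshot above has no presumptive row to read from.

```python
# Values copied from the case summary table above (March 17, 2020).
# Each list is ordered as
# [San Diego County Residents, Federal Quarantine, Non-San Diego County Residents].
case_summary = {
    "Total Positives": [51, 5, 4],
    "Hospitalized": [8, 1, 1],
    "Deaths": [0, 0, 0],
}


def sum_columns(rows, presumptive=None):
    """Sum the three source columns per metric.

    `presumptive` is an optional (hypothetical) mapping of metric name to a
    presumptive count that gets added on top of the confirmed sum.
    """
    presumptive = presumptive or {}
    return {
        metric: sum(values) + presumptive.get(metric, 0)
        for metric, values in rows.items()
    }


totals = sum_columns(case_summary)
print(totals)  # {'Total Positives': 60, 'Hospitalized': 10, 'Deaths': 0}
```

The sums match the page's own Total column (60, 10, 0), which is a handy sanity check for the scraper.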
Scraper code -->
I would fix this, but I don't know JavaScript.