raysalem commented 4 years ago

Website data is below. Note that this is a matrix: we need to sum all three columns and, to be biased towards positives, also include the presumptive positives -->

| | San Diego County | Federal Quarantine | Non-San Diego County Residents |
| --- | --- | --- | --- |
| Positive (confirmed cases) | 0 | 2 | 0 |
| Presumptive Positive | 8 | 1 | 0 |
| Pending Results | 38 | 6 | 4 |
| Negative | 99 | 11 | 8 |
| Total Tested | 145 | 20 | 12 |

URL https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html

**Scraper code** -->

{
    county: 'San Diego County',
    state: 'CA',
    country: 'USA',
    url: 'https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html',
    scraper: async function() {
      let $ = await fetch.page(this.url);

      let cases = parse.number($('td:contains("Positive (confirmed cases)")').next('td').text()) + parse.number($('td:contains("Presumptive Positive")').next('td').text());
      return {
        cases: cases,
        tested: parse.number($('td:contains("Total Tested")').next('td').text())
      };
    }
}

I would fix this, but I don't know JavaScript.
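As far as I can tell, the scraper undercounts because a jQuery-style `.next('td')` selects only the cell immediately after the label, which is the San Diego County column. Here is a minimal Python sketch of the intended arithmetic, using only the standard library. `RowCollector`, `row_total`, and `SAMPLE_HTML` are hypothetical names for illustration, and the sample markup is an assumption built from the numbers above, not the real page source.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def row_total(rows, label):
    """Sum every numeric cell in the row whose first cell equals `label`."""
    for cells in rows:
        if cells and cells[0].strip() == label:
            return sum(int(c) for c in cells[1:])
    raise KeyError(label)


# Assumed markup: one <tr> per row, label first, then the three columns.
SAMPLE_HTML = """
<table>
<tr><td>Positive (confirmed cases)</td><td>0</td><td>2</td><td>0</td></tr>
<tr><td>Presumptive Positive</td><td>8</td><td>1</td><td>0</td></tr>
<tr><td>Total Tested</td><td>145</td><td>20</td><td>12</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(SAMPLE_HTML)

# Confirmed plus presumptive, summed over all three columns.
cases = (row_total(collector.rows, "Positive (confirmed cases)")
         + row_total(collector.rows, "Presumptive Positive"))
tested = row_total(collector.rows, "Total Tested")
print(cases, tested)
```

With the numbers posted above this gives 11 cases and 177 tested, where reading only the first column would give 8 and 145.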

lazd commented 4 years ago

Thanks for the report, fixed!

And hey, this is a great excuse to learn JavaScript.

raysalem commented 4 years ago

The data is still wrong for San Diego, and we might want something like this.

This is what the page https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html now shows -->

Positive Cases in San Diego County Since February 14, 2020
Coronavirus Disease 2019 (COVID-19)
Updated March 17, 2020

| COVID-19 Case Summary | San Diego County Residents | Federal Quarantine | Non-San Diego County Residents | Total |
| --- | --- | --- | --- | --- |
| Total Positives | 51 | 5 | 4 | 60 |
| Age Groups | | | | |
| 0-17 years | 0 | 0 | 0 | 0 |
| 18-64 years | 43 | 1 | 3 | 47 |
| 65+ years | 8 | 4 | 1 | 13 |
| Age Unknown | 0 | 0 | 0 | 0 |
| Gender | | | | |
| Female | 17 | 2 | 2 | 21 |
| Male | 34 | 3 | 2 | 39 |
| Unknown | 0 | 0 | 0 | 0 |
| Hospitalized | 8 | 1 | 1 | 10 |
| Deaths | 0 | 0 | 0 | 0 |

Right now the scraper is reporting zeros because the page layout has changed. A Python solution is -->

    import re

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    from IPython.display import HTML, display  # needed for display(HTML(...)) in a notebook

    URL = 'https://www.sandiegocounty.gov/content/sdc/hhsa/programs/phs/community_epidemiology/dc/2019-nCoV/status.html'
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')

    table = soup.find("div", {"class": "table parbase section"})
    rows = table.find_all('tr')

    # handle header: collapse runs of whitespace inside each column name
    header = [cell.text for cell in rows[1].find_all('td')]
    header = [re.sub('[ \t\n]+', ' ', h) for h in header]

    tbl = {}
    for row in rows[2:]:  # skip the title row and the header row
        data = [r.text for r in row.find_all('td')]
        if data[1] == '\xa0':  # section rows such as "Age Groups" hold no numbers
            continue
        tbl[data[0]] = [int(d) for d in data[1:]]

    df = pd.DataFrame(tbl, index=header[1:])
    display(HTML(df.to_html()))

    # the last line of the title row carries the "Updated ..." date
    updateDateTime = rows[0].find('td').text.split('\n')[-1].replace("Updated", "")
    print("updateDateTime %s" % updateDateTime)

will generate this -->

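| | Total Positives | 0-17 years | 18-64 years | 65+ years | Age Unknown | Female | Male | Unknown | Hospitalized | Deaths |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| San Diego County Residents | 51 | 0 | 43 | 8 | 0 | 17 | 34 | 0 | 8 | 0 |
| Federal Quarantine | 5 | 0 | 1 | 4 | 0 | 2 | 3 | 0 | 1 | 0 |
| Non-San Diego County Residents | 4 | 0 | 3 | 1 | 0 | 2 | 2 | 0 | 1 | 0 |
| Total | 60 | 0 | 47 | 13 | 0 | 21 | 39 | 0 | 10 | 0 |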

updateDateTime = March 17, 2020
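For the scraper itself, only a few cells of this table matter. Here is a minimal sketch, assuming the rows have already been parsed into a dict shaped like `tbl` (row label mapped to four integers in the page's column order). `COLUMNS`, `summarize`, and the returned field names are hypothetical and would need to match whatever the scraper actually expects. The check that the three columns add up to the Total column is my own addition, meant to catch another layout change early.

```python
# Column order as it appears in the table quoted in this thread.
COLUMNS = [
    "San Diego County Residents",
    "Federal Quarantine",
    "Non-San Diego County Residents",
    "Total",
]

# Values copied from the March 17, 2020 table above.
tbl = {
    "Total Positives": [51, 5, 4, 60],
    "Hospitalized": [8, 1, 1, 10],
    "Deaths": [0, 0, 0, 0],
}


def summarize(tbl):
    """Read the Total column for each field, verifying it against the parts."""
    total_idx = COLUMNS.index("Total")
    wanted = [
        ("cases", "Total Positives"),
        ("hospitalized", "Hospitalized"),
        ("deaths", "Deaths"),
    ]
    result = {}
    for field, label in wanted:
        values = tbl[label]
        parts = values[:total_idx]
        if sum(parts) != values[total_idx]:
            raise ValueError("columns for %r do not add up to Total" % label)
        result[field] = values[total_idx]
    return result


summary = summarize(tbl)
print(summary)
```

If the page changes again and the columns stop adding up, this raises an error instead of silently reporting zeros.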

covidatlas / coronadatascraper

San diego data is wrong #17