covid19-dq-monitor / covid19-dq-dpc


Scraper for the Regione Lombardia Tableau Dashboard #2

Open opencovid-mr opened 3 years ago

opencovid-mr commented 3 years ago

The Regione Lombardia Tableau dashboard reports, among other things, Positive Cases, Discharged/Recovered, Currently Positive Cases, Deaths, and Hospitalized patients (ordinary wards and ICU):

https://public.tableau.com/profile/ariabi1179#!/vizhome/Dashboard_covid_produzione/DashboardCovid-19

Below are a few lines of Python (ref. Stack Overflow) as a starting point for setting up a scraper for the Regione Lombardia Tableau dashboard:

import requests
from bs4 import BeautifulSoup
import json
import re

url = "https://public.tableau.com/views/Dashboard_covid_produzione/DashboardCovid-19?%3Aembed=y&%3AshowVizHome=no&%3Adisplay_count=y&%3Adisplay_static_image=y&%3AbootstrapWhenNotified=true&%3Alanguage=it&:embed=y&:showVizHome=n&:apiID=host0"

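# Load the embedded view; the session configuration is stored as JSON
# in the <textarea id="tsConfigContainer"> element of the page.
r = requests.get(url)
r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(r.text, "html.parser")

tableauData = json.loads(soup.find("textarea", {"id": "tsConfigContainer"}).text)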

print(tableauData["vizql_root"])
print(tableauData["sessionid"])
print(tableauData["sheetId"])

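# Build the bootstrap URL from the session configuration and request the data.
dataUrl = f'https://public.tableau.com{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

r = requests.post(dataUrl, data={"sheet_id": tableauData["sheetId"]})

# The response holds two JSON objects, each preceded by "<digits>;".
# The raw string keeps "\d" from being treated as a string escape.
dataReg = re.search(r'\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))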

# Print the raw data columns of the first data segment.
print(data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"])
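
As a possible refinement of the parsing step above, here is a minimal sketch that decodes the `<digits>;{json}` chunks with the standard library's `json.JSONDecoder.raw_decode` instead of a greedy regular expression, and then groups the column values by type. The helper names `split_tableau_payload` and `columns_by_type` are hypothetical, the sample string is made up for illustration, and the `dataType` / `dataValues` keys are an assumption about the shape of each `dataColumns` entry that should be checked against a real response.

```python
import json
from collections import defaultdict


def split_tableau_payload(text):
    """Decode every '<digits>;<json>' chunk found in text, in order."""
    decoder = json.JSONDecoder()
    chunks = []
    pos = 0
    while True:
        sep = text.find(";", pos)
        if sep == -1:
            break
        # Assumption: everything before ';' is a numeric prefix.
        # Its value is not used; raw_decode finds the end of the JSON itself.
        if not text[pos:sep].strip().isdigit():
            raise ValueError(f"unexpected prefix at offset {pos}")
        # raw_decode returns the parsed object and the index just past it.
        obj, pos = decoder.raw_decode(text, sep + 1)
        chunks.append(obj)
    return chunks


def columns_by_type(data_columns):
    """Group values by 'dataType' (assumed keys: 'dataType', 'dataValues')."""
    grouped = defaultdict(list)
    for column in data_columns:
        grouped[column["dataType"]].extend(column["dataValues"])
    return dict(grouped)


if __name__ == "__main__":
    # Made-up payload that only mimics the two-chunk layout of the response.
    sample = '1;{"info": true}2;{"dataColumns": [{"dataType": "integer", "dataValues": [10, 20]}]}'
    info, data = split_tableau_payload(sample)
    print(columns_by_type(data["dataColumns"]))
```

In the script above, `split_tableau_payload(r.text)` would take the place of the `re.search` call and the two `json.loads` calls, and `columns_by_type` could be applied to the `dataColumns` list that the final `print` displays.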