bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License

Selection producing log errors of "tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)" #8

Closed rexdouglass closed 3 years ago

rexdouglass commented 3 years ago

New Jersey's Covid Dashboard is available here https://dashboards.doh.nj.gov/views/DailyConfirmedCaseSummary7_22_2020/PCRandAntigenPositives?%3AisGuestRedirectFromVizportal=y&%3Aembed=y

Its system is bizarre: you can get county-day counts, but only by clicking a date in the time series bar graph, which subsets the county breakdown to that day. So we click each day on the time series in the middle and read off the counts from the county breakdown on the left.

Using ws.select on the appropriate selectable with a valid value produces a series of logged errors

2021-03-31 20:54:15,322 - tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)
2021-03-31 20:54:15,322 - tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)
2021-03-31 20:54:15,322 - tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)
...

I believe the error is thrown at this location inside the try https://github.com/bertrandmartel/tableau-scraping/blob/b6ec67123089a467f6043477ec747adfeb72a565/tableauscraper/TableauWorksheet.py#L226
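
For context, the logged text has the shape of a Python `json.JSONDecodeError` message. A minimal sketch of how such a message can arise, assuming (this is not confirmed anywhere in the thread) that the response body begins with four CRLF pairs followed by something that is not JSON. The `body` string below is a made-up stand-in, not a captured server response:

```python
import json

# Hypothetical response body: four CRLF pairs, then non-JSON content.
# This is an illustrative stand-in, not what the NJ server actually returned.
body = "\r\n\r\n\r\n\r\n<html>not json</html>"

try:
    json.loads(body)
    message = None
except json.JSONDecodeError as exc:
    # The decoder skips leading whitespace, then fails at the first
    # character that cannot start a JSON value.
    message = str(exc)

print(message)
```

The position in the message points at the first non-whitespace character, so a message like this usually means the body was not JSON at all rather than being slightly malformed JSON.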

A reproducible example is here

from tableauscraper import TableauScraper as TS

url = "https://dashboards.doh.nj.gov/views/DailyConfirmedCaseSummary7_22_2020/PCRandAntigenPositives?%3AshowAppBanner=false&%3Adisplay_count=n&%3AshowVizHome=n&%3Aorigin=viz_share_link&%3AisGuestRedirectFromVizportal=y&%3Aembed=y"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

# This worksheet is unusual: select a date here, then read the county list.
ws = ts.getWorksheet("EPI CURVE")
selections = ws.getSelectableItems()
print(selections)

# The selectable at index 3 is taken here to hold the dates.
dates = selections[3]['values']
date = dates[0]
key = selections[3]['column']

# Logs "tableauScraper - ERROR - Expecting value: line 5 column 1 (char 8)"
wb = ws.select(key, date)

rexdouglass commented 3 years ago

Thank you for asking for clarification. As insane as it sounds, I believe I need to click the vertical time series bars. When you do that, it subsets the left-hand side to the values for each county up to that day. By going back and forth, you can get the cumulative counts for every county for the whole time series.

bertrandmartel commented 3 years ago

@rexdouglass I forgot to change the request format to multipart instead of form-urlencoded, as is already done in the filter and setParameter APIs. It seems some Tableau servers accept both form-urlencoded and multipart, but some accept only multipart, like this one.
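
To illustrate the difference between the two encodings mentioned above, here is a minimal standard-library sketch that encodes the same fields both ways. It is not the library's implementation: the field names `worksheet` and `selection`, their values, and the helper names `encode_form` and `encode_multipart` are all hypothetical, and the fields Tableau actually expects are not shown in this thread.

```python
import uuid
from urllib.parse import urlencode


def encode_form(fields):
    """Encode fields as application/x-www-form-urlencoded."""
    body = urlencode(fields).encode("utf-8")
    return "application/x-www-form-urlencoded", body


def encode_multipart(fields, boundary=None):
    """Encode plain text fields as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        # Each field is its own part, introduced by the boundary marker.
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    # A closing boundary ends the body.
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body


# Hypothetical field names and values, for illustration only.
fields = {"worksheet": "EPI CURVE", "selection": "[1]"}

form_type, form_body = encode_form(fields)
multi_type, multi_body = encode_multipart(fields, boundary="example-boundary")

print(form_type, form_body)
print(multi_type, multi_body)
```

The payload is the same in both cases; only the `Content-Type` header and the framing of the body differ, which is why a server that parses only one of the two formats can reject a request that another server accepts.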

Using v0.1.4, the following code lists all dates and gets the county worksheet for each one:

from tableauscraper import TableauScraper as TS

url = 'https://dashboards.doh.nj.gov/views/DailyConfirmedCaseSummary7_22_2020/PCRandAntigenPositives'
ts = TS()
ts.loads(url)

ws = ts.getWorksheet("EPI CURVE")

selects = ws.getSelectableItems()

dates = next(iter([
    t["values"]
    for t in selects
    if t["column"] == "ATTR(ILLNESS ONSET DATE)"
]))
print(dates)

for date in dates:
    print(date)
    wb = ws.select('ATTR(ILLNESS ONSET DATE)', date)
    print(wb.getWorksheet("BY COUNTY").data)

repl.it: https://replit.com/@bertrandmartel/TableauCovidNewJersey

Note that this gets both the orange part (confirmed cases) and the blue part (probable cases) of the date bar graph, with a distinct call for each of the two types (each date is duplicated but has a different index).

Also, there is both ATTR(ILLNESS ONSET DATE) and ILLNESS ONSET DATE; I'm not sure which one to pick.

rexdouglass commented 3 years ago

Confirmed. With v0.1.4 it selects and downloads as intended.

I noticed you shortened the URL to exclude all the flags that the dashboard tacks on. I also noticed that if I do not shorten it, ts.loads(url) now fails. I'm not sure when or where this behavior was introduced, and I suggest either a warning or an update to the documentation to help users understand what kinds of URLs are preferred. Requiring the shortened version is perfectly fine.

Thank you again for very generous help.
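
For anyone hitting the same URL problem, the shortening described above can be sketched with the standard library. The helper name `strip_view_flags` is hypothetical and not part of tableauscraper; it assumes that dropping the whole query string is enough, which matches the short URL used earlier in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_view_flags(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


long_url = (
    "https://dashboards.doh.nj.gov/views/DailyConfirmedCaseSummary7_22_2020/"
    "PCRandAntigenPositives?%3AshowAppBanner=false&%3Adisplay_count=n"
    "&%3AshowVizHome=n&%3Aorigin=viz_share_link"
    "&%3AisGuestRedirectFromVizportal=y&%3Aembed=y"
)

short_url = strip_view_flags(long_url)
print(short_url)
```

The result can then be passed to ts.loads in place of the share link copied from the browser.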