Closed jbadelson closed 3 years ago
@jbadelson Hello, the website seems to be only available in the US. I've managed to run it in repl.it (based in US) and extract all metadata for analysis
After investigating, the problem comes from the filter index, but my guess is that table.schema[].ordinal
field seems to indicate the start filter index:
"schema": [{
"caption": "Region",
"collation": {
"f": 0,
"l": 4294967295
},
"column_class": 1,
"dataType": "s",
"family_name": "Extract",
"fieldType": "N",
"hidden": false,
"name": ["sqlproxy.10q6ok10kzhj6r13vx7j91uvn4mv", "Region"],
"ordinal": 1, <==================================================
"role": "d"
}],
I didn't have any example showing that this field was important until now. The 3 following setFilter
usecases have all ordinal
set to 0
:
https://replit.com/@bertrandmartel/TableauCovidNewHampshire https://replit.com/@bertrandmartel/TableauCovidSouthCarolina https://replit.com/@bertrandmartel/TableauFilter
I will implement the parsing of the ordinal
field in schema
object
I think it's more related to #5 because it deals with how filterJson
parsing is implemented. Indexing issues are major challenges in this library since the indexing is sparsed in different places. For example, filter
and select
have completly different indexing mecanism:
select
:
data
json dataisAutoSelect
field set to true
dataDictionary
field and indexing info in the paneColumnsData
fieldfilter
:
info
json datafiltersJson
and which has a tuples
field mapping keys/values@jbadelson I've released v0.1.9 which fix the index issue. But there is still something amiss when working with this url. I don't know if it's related to the filters, but it seems it doesn't return data (KeyError: 'dataDictionary'
) if I perform more than X setFilter
in a row. For instance:
from tableauscraper import TableauScraper as TS
url = 'https://analytics.la.gov/t/LDH/views/covid19_hosp_vent_reg/Hosp_vent_c'
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
ws = ts.getWorksheet('Hospitalization and Ventilator Usage')
regions = next(iter([
t["values"]
for t in ws.getFilters()
if t["column"] == "Region"
]))
print(regions)
for region in regions:
print(region)
wb = ws.setFilter('Region', region)
ws = wb.getWorksheet('Hospitalization and Ventilator Usage')
print(ws.data)
It fails at 5 - Southwest
(or sometimes 8 - Monroe
) but if I restart the session (with a new TS object) I get the result for 5 - Southwest
so the filter query seems correct.
Something is wrong with the session and I think it's related to how the dataDictionary
is managed between calls (maybe tableau server specific). I still need to investigate but you can workaround by re-instantiating the TS object for the moment if you need to iterate all the regions
This seems particularly unusual since https://replit.com/@bertrandmartel/TableauCovidNewHampshire#main.py for NewHampshire works well with more than 10 setFilter
in a row
@jbadelson Everything is fixed in v0.1.10. The following get all regions:
from tableauscraper import TableauScraper as TS
url = 'https://analytics.la.gov/t/LDH/views/covid19_hosp_vent_reg/Hosp_vent_c'
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
ws = ts.getWorksheet('Hospitalization and Ventilator Usage')
regions = next(iter([
t["values"]
for t in ws.getFilters()
if t["column"] == "Region"
]))
print(regions)
for region in regions:
print(region)
wb = ws.setFilter('Region', region)
regionWs = wb.getWorksheet('Hospitalization and Ventilator Usage')
print(regionWs.data)
repl.it: https://replit.com/@bertrandmartel/TableauCovid19Louisiana
Note that I've added a warning in case the response doesn't return any data. In this case, it's normal behaviour but most of the time, it would mean that something is wrong in the input or that there is a bug
Works perfectly now, thank you so much!
With regard to the sessions issue: I'm guessing that's something specifically related to how this agency set up their server. I hadn't thought about this until just now, but this chart and others hosted on the same server will often reset themselves when switching between filters in a browser.
Thank you again for coming up with a solution for both issues and for doing it so quickly.
First, thanks for putting this together. It's likely this problem is either user error on my part or weirdness in how the worksheet I'm trying to scrape is set up.
I'm trying to get filtered data from the https://analytics.la.gov/t/LDH/views/covid19_hosp_vent_reg/Hosp_vent_c. The options listed on the dropdown on the visualization are "(All)" and then the nine regions. Using getFilters() returns just the nine regions and when using setFilters() the data returned is for the prior region in the index.
For example. This code:
Returns the results for '1 - New Orleans' which is in the prior index position.
Setting the filter to the first region ('1 - New Orleans') results in KeyError: 'dataDictionary' when trying to access its data.
The issue appears to be similar to one addressed in Issue #6, except with filters rather than selections. In this case, getSelectableItems() returns the data displayed on the chart when "(All)" is selected.
I'm using tableauscraper version 0.1.8.