bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License
126 stars 20 forks source link

How to set multiple filters #37

Closed mdgis closed 2 years ago

mdgis commented 2 years ago

I'm trying to pull down CBP data from this url. This works fine for pulling the data unfiltered or for a single filter. But i'm struggling to find a way to set up multiple filters at once.

When i try to set multiple filters i get:

2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response
2021-10-07 09:11:12,519 - tableauScraper - WARNING - no data dictionary present in response

The code I'm using:

url = (
    "https://publicstats.cbp.gov/t/PublicFacing/views/"
    "CBPSBOEnforcementActionsDashboardsAUGFY21/"
    "SBOEncounters8076?:isGuestRedirectFromVizportal=y&:embed=y"
)
ts = TS()
ts.loads(url)

ws = ts.getWorksheet("SBO Line Graph")
# Set First Filter 
wb = ws.setFilter("Citizenship Grouping", 'El Salvador')
# Grabbing Sheet Again 
ws = wb.getWorksheet("SBO Line Graph")
# Setting Second Filter 
wb = ws.setFilter("Demographic",'Single Adults')

#Grab the data
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

Appreciate any help here - many thanks

bertrandmartel commented 2 years ago

@mdgis You need to use filterDelta=True with this one, but it seems it fails to get the current selection so it returns all data

bertrandmartel commented 2 years ago

It seems it considers all fields to be selected if all property is set to true or if allChecked is true, regardless of the selected field in the filter table. working on a fix

mdgis commented 2 years ago

Whats really confusing is that this was working for me

from tableauscraper import TableauScraper as TS

url = (
    "https://publicstats.cbp.gov/t/PublicFacing/views/"
    "CBPSBOEnforcementActionsDashboardsAUGFY21/"
    "SBOEncounters8076?:isGuestRedirectFromVizportal=y&:embed=y"
)
ts = TS()
ts.loads(url)

ws = ts.getWorksheet("SBO Line Graph")
all_filters = ws.getFilters()
print(ws.getFilters()[1])

# Set First Filter 
wb = ws.setFilter("Citizenship Grouping", 'El Salvador')
# Grabbing Sheet Again 
ws = ts.getWorksheet("SBO Line Graph") # <- changed this to ts
all_filters = ws.getFilters()

# Setting Second Filter 
wb = ws.setFilter("Demographic",'Accompanied Minors')

#Grab the data
ws = wb.getWorksheet("SBO Table")
print(ws.data)

But then after running a few times it stops working and again returns the ...no data dictionary present in response

bertrandmartel commented 2 years ago

Another issue is the ordinal offset for the filter index. I thought the ordinal property was taken into account in the index calculation. But it seems not in this case. Issue is now the same as #36 . still investigating...

bertrandmartel commented 2 years ago

@mdgis This is now fixed in v0.1.25

from tableauscraper import TableauScraper as TS

ts = TS()
ts.loads("https://publicstats.cbp.gov/t/PublicFacing/views/CBPSBOEnforcementActionsDashboardsAUGFY21/SBOEncounters8076")
ws = ts.getWorksheet("SBO Line Graph")

print("filters: El Salvador")
wb = ws.setFilter("Citizenship Grouping", 'El Salvador', filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

print("filters: El Salvador / Single Adults")
wb = ws.setFilter("Demographic", 'Single Adults', filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

# note that you can also specify an array of filters
print("filters: [El Salvador,Guatemala] / Single Adults")
wb = ws.setFilter("Citizenship Grouping", [
                  'El Salvador', 'Guatemala'], filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

https://replit.com/@bertrandmartel/TableauPublicStatsUSCustoms

mdgis commented 2 years ago

Amazing - from couple of quick tests this looks like it solved it. Going to do an exhaustive run to get all the data just to double check - thanks so much !!!!

mdgis commented 2 years ago

Only issue i've found is that it can fail depending on the order of filters and whether a previous filter results in a single option in another filter.

For instance the sequence below will fail:

from tableauscraper import TableauScraper as TS

ts = TS()
ts.loads(
    "https://publicstats.cbp.gov/t/PublicFacing/views/CBPSBOEnforcementActionsDashboardsAUGFY21/SBOEncounters8076"
)
ws = ts.getWorksheet("SBO Line Graph")

print("filters: Component")
wb = ws.setFilter("Component", "Office of Field Operations", filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")

print("Demographic")
wb = ws.setFilter("Demographic", "FMUA", filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")

print("filters:  Title 8")
wb = ws.setFilter("Title of Authority", "Title 8", filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

This will not return any data - but if you use the actual tableau dash there is data. I think what happens is that when the FMUA filter is applied, then the only option for the rest of the data is Title 8 - but if you try to apply that Title 8 filter you won't get data back.

If we change the order of the filters to below, we do get back the proper data.


ts = TS()
ts.loads(
    "https://publicstats.cbp.gov/t/PublicFacing/views/CBPSBOEnforcementActionsDashboardsAUGFY21/SBOEncounters8076"
)
ws = ts.getWorksheet("SBO Line Graph")

print("filters: Component")
wb = ws.setFilter("Component", "Office of Field Operations", filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")

print("filters:  Title 8")
wb = ws.setFilter("Title of Authority", "Title 8", filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

print("Filters: Demographic")
wb = ws.setFilter("Demographic", "FMUA", filterDelta=True)
ws = wb.getWorksheet("SBO Line Graph")

Any thoughts on this ? Many thanks

bertrandmartel commented 2 years ago

@mdgis I see, this is an index issue again, but for the index of the current selection this time. For some reason when selecting Title8 it gives

filterAddIndices: [0]
filterRemoveIndices: [1]

[1] indicates there is more than 1 value although there is only Title 8 in the filter list (maybe "All" is considered a value). There is also the same indexing issue in Office of Field Operations and FMUA. working on it

bertrandmartel commented 2 years ago

@mdgis It seems the vizdata is not present in the response, this is why it doesn't get the data although the data dictionary is up to date. I will need to persist the data (column indices) for all zones that have existing id but don't have vizdata

mdgis commented 2 years ago

Awesome thanks again - I could also just search over the combinations until successful but would be great to get it the first time.

bertrandmartel commented 2 years ago

@mdgis This is now fixed in the latest release. This issue was also present in Tableau Covid dashboard with US county maps, when iterating the counties, those which had the same values (consecutively) had empty dataframe, just like your example

mdgis commented 2 years ago

Amazing - thank you so much ! - running tests now

mdgis commented 2 years ago

Yup looks good - this is awesome! Really appreciate the help.