bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License

Scraping Tableau data based on data filtered with dropdown boxes in non-existent worksheet columns #55

Open R4V3NDS opened 2 years ago

R4V3NDS commented 2 years ago

I'm trying to scrape data from a Tableau dashboard on this website.

The package itself works amazingly well, but I'd also like to filter the map data (which appears in the worksheet Mapa) by year (Año) and sex (Sexo), using the dropdown boxes on the dashboard. I suspect this can be done with the setFilter() function and the argument dashboardFilter=True, but I'm having trouble implementing it and don't fully understand how it works.

I think the problem is that the names of the filters on the dashboard differ from the names of the columns in the actual worksheets and, in some cases, do not exist at all in the specific worksheet I'm looking at. Is there a way around this? I would be very grateful for any assistance or insight.

EDIT: The issue is very similar to https://github.com/bertrandmartel/tableau-scraping/issues/7 but I'm struggling to find the right fields to filter the data I want (or I'm completely missing something).

Thank you

from tableauscraper import TableauScraper as TS
import pandas as pd

url = "https://public.tableau.com/views/DashboardRegional_15811027307400/DashboardRegional?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=no&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0"

ts = TS()
ts.loads(url)
wb=ts.getWorkbook()
sheetName = "Mapa"

ws = wb.getWorksheet(sheetName)
print(ws.data)
# I cannot filter by these categories (the dropdown boxes in the dashboard)
# wb = ws.setFilter("Sexo", "Hombres", dashboardFilter=True)
# ws = wb.getWorksheet(sheetName)
# wb = ws.setFilter("Año", 2020, dashboardFilter=True)
# ws = wb.getWorksheet(sheetName)
# print(ws.data)
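
Since the dropdown names on the dashboard may not match any column in Mapa, one way to narrow the problem down is to list the filters every worksheet exposes and search them for the wanted name. The sketch below is only an illustration of that idea, not the library's own method: `find_filter_candidates` is a hypothetical helper, the sample data (including the sheet name "Filtros" and the column "Sexo (grupo)") is invented, and it assumes each filter is a dict with a "column" key, which is how `getFilters()` appears to report them.

```python
from typing import Dict, List, Tuple


def find_filter_candidates(
    filters_by_sheet: Dict[str, List[dict]], wanted: str
) -> List[Tuple[str, str]]:
    """Return (sheet, column) pairs whose filter column contains `wanted`, ignoring case."""
    needle = wanted.casefold()
    matches = []
    for sheet, filters in filters_by_sheet.items():
        for f in filters:
            # Assumption: each filter dict carries its name under the "column" key.
            column = str(f.get("column", ""))
            if needle in column.casefold():
                matches.append((sheet, column))
    return matches


# Hypothetical sample data, invented for illustration only.
# With the library it could presumably be built as:
#   filters_by_sheet = {w.name: w.getFilters() for w in wb.worksheets}
sample = {
    "Mapa": [{"column": "Region", "values": ["Norte", "Sur"]}],
    "Filtros": [
        {"column": "Sexo (grupo)", "values": ["Hombres", "Mujeres"]},
        {"column": "Año", "values": [2019, 2020]},
    ],
}

print(find_filter_candidates(sample, "sexo"))
print(find_filter_candidates(sample, "año"))
```

If a match turns up on a worksheet other than Mapa, calling setFilter() on that worksheet with the exact column name reported there, and then re-reading Mapa from the returned workbook, would be the next thing to try. Whether the filter actually propagates to Mapa depends on how the dashboard is built.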