bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License
126 stars 20 forks source link

Filtering doesn't work when filter-delta property is specified #27

Closed galalH closed 2 years ago

galalH commented 2 years ago

Hi Bertrand,

First of all, thanks for the great work behind this package!

I'm trying to retrieve the data from the USCIS Southwest Land Border Encounters by looping over all possible filter combinations in the worksheet. The problem is that I keep running into "no data dictionary present in response" warning messages when trying to set the individual filters to the desired combination even though the dashboard displays valid responses for the same filters on the web page. After a few warning messages, the scraper just dies with KeyError: 'workbookPresModel'.

My python's a bit rusty, so I'm calling your package from R using reticulate. Here's the code to reproduce the issue:

library(tidyverse)
library(reticulate)
library(rvest)

# py_install("tableauscraper", pip = TRUE)
baseurl <- "https://publicstats.cbp.gov/t/PublicFacing/views/"
ts <- import("tableauscraper")$TableauScraper()

dashboards <-
  session("https://www.cbp.gov/newsroom/stats/southwest-land-border-encounters") |>
  html_elements("param[value*='CBPSBOEnforcementActionsDashboardsJULFY21']") |>
  html_attr("value")

encountersdb <- str_c(baseurl, dashboards[1], "?:embed=y&:showVizHome=no")

ts$loads(encountersdb)
wb <- ts$getWorkbook()
ws <- wb$getWorksheet("SBO Line Graph")

combs <- ws$getFilters() |> (\(x) set_names(map(x, ~.$values), map_chr(x, ~.$column)))() |> cross_df()

data <-
  combs |>
  mutate(data =
           pmap(combs,
                function(...) {
                  params <- list(...)
                  wb <-
                    reduce2(names(params), params,
                            function(x, column, value) {
                              x$getWorksheet("SBO Line Graph")$setFilter(column, value)
                            },
                            .init = wb)
                  wb$getWorksheet("SBO Line Graph")$data |> as_tibble()
                }))

Would be appreciated if you could take the time to look into it. Or let me know if there's another way to accomplish the same result.

bertrandmartel commented 2 years ago

@galalH Thanks, there seems to be something missing in the filter-by-index implementation

One example in python would be:

from tableauscraper import TableauScraper as TS

url = "https://publicstats.cbp.gov/t/PublicFacing/views/CBPSBOEnforcementActionsDashboardsJULFY21/SBOEncounters7139?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublicstats.cbp.gov%2F&:embed_code_version=3&:tabs=no&:toolbar=yes&:showAppBanner=false&:showShareOptions=%E2%80%99false%E2%80%99&:display_spinner=no&:loadOrderID=0"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

ws = ts.getWorksheet("SBO Line Graph")
print(ws.data)

# get filters columns and values
filters = ws.getFilters()
print(filters)

# set filter value
wb = ws.setFilter('Citizenship Grouping', 'El Salvador')

# show the new data for worksheet
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

But doesn't return any data

The current implementation doesn't deal with filter-delta type which uses these params:

filterUpdateType: filter-delta
filterAddIndices: [1]
filterRemoveIndices: [0]

The current implementation always set filterUpdateType to filter-replace:

("filterIndices", (None, json.dumps(selection))),
("filterUpdateType", (None, "filter-replace"))

I need to look at the json result to understand how to set the filterUpdateType accordinglyand how to figure out filterAddIndices and filterRemoveIndices

rodrigues-pedro commented 2 years ago

I randomly get the same Warning, like 1 out of 20 tries it just doesn't get the data. Same filters applied, most times it work, but sometimes it doesn't.

It's kinda frustrating since I don't know what could be the cause.... This is the code right now

  url = "https://public.tableau.com/views/ExcessoObitos-confernciadedados/ExcessodebitosRevisado"
  ts = TS()

  i = 0
  df = pd.DataFrame()

  while i<=5 & df.empty:
      ts.loads(url)
      wb = ts.getWorkbook()
      ws = wb.getWorksheet("Curva UF")

      df = ws.data
      i += 1

      if df.empty:
          time.sleep(5*60)

Am I doing something wrong? Do you think is a problem with the tableau panel itself?

To contextualize a little bit, it's part of one indicator on a automatically generated report that runs every sunday. It's important that it is generated on sunday due to some indicators that relate to epidemiologic week.

bertrandmartel commented 2 years ago

@rodrigues-pedro could you create another issue about it ?

I've just implemented the filter-delta that checks the selected option in the filter configuration (if any selection exists) and removes those selection index and add the filter index:

selected

I've also added the possibility to input an array of filter to setFilter, it will correctly deal with multiple filter index when using filter-replace or filter delta

Example:

from tableauscraper import TableauScraper as TS

url = "https://publicstats.cbp.gov/t/PublicFacing/views/CBPSBOEnforcementActionsDashboardsJULFY21/SBOEncounters7139"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

ws = ts.getWorksheet("SBO Line Graph")
print(ws.data)

# get filters columns and values
filters = ws.getFilters()
print(filters)

# set filter value
wb = ws.setFilter('Citizenship Grouping', 'El Salvador', filterDelta=True)

# show the new data for worksheet
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

wb = ws.setFilter('Citizenship Grouping', 'Guatemala', filterDelta=True)

# show the new data for worksheet
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

wb = ws.setFilter('Citizenship Grouping', [
                  'Guatemala', 'El Salvador'], filterDelta=True)

# show the new data for worksheet
ws = wb.getWorksheet("SBO Line Graph")
print(ws.data)

Try this on repl.it

released in v0.1.21