bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License
126 stars 20 forks source link

setFilter() returning unexpected data #36

Closed CoderPJM closed 2 years ago

CoderPJM commented 2 years ago

Hi Bertrand

I am continuing with the scrape you kindly helped with on stack overflow but have come across some strange behavior. The following code sets parameters and a filter to display hourly generation for a power plant named 'ABAÚNA' but after setting the filter the data for 'AIMORÉS' is returned. Do I have to load an intermediate worksheet before applying the filter?

from tableauscraper import TableauScraper as TS

url = "https://tableau.ons.org.br/t/ONS_Publico/views/GeraodeEnergia/HistricoGeraodeEnergia"
ts = TS()

ts.loads(url)
wb = ts.getWorkbook()
ws = wb.getWorksheet("Simples Geração de Energia Barra Semana")

# Set resolution
print(f"Set resolution to Hora")
wb.setParameter("Escala de Tempo GE Simp 4", "Hora")

# Set the start date
print(f"Set start date to 01/01/2016")
wb.setParameter("Início Primeiro Período GE Simp 4",f"01/01/2016")

# Set the end date
print(f"Set end date to 01/01/2017")
wb = wb.setParameter("Fim Primeiro Período GE Simp 4",f"01/01/2017")

# Get plant names
usina = [
    t["values"]
    for t in ws.getFilters()
    if t["column"] == "USINACEG"
][0]

# Set plant name to 'ABAÚNA'
plantName=" ABAÚNA (CEG: CGH.PH.RS.000015-9.01)"
print(f"Set plant to {plantName}")
wb = ws.setFilter("USINACEG",plantName,filterDelta=True)

# Retrieve daily worksheet
ws = wb.getWorksheet("Simples Geração de Energia Dia")

# Show plants
print(ws.data)  # Contains data for 'AIMORÉS'
bertrandmartel commented 2 years ago

@CoderPJM Hello, it seems ABAÚNA (CEG: CGH.PH.RS.000015-9.01) filter is not available when Escala de Tempo GE Simp 4 filter is set to hora, dia and semana.

It seems that only mes and ano are available for ABAÚNA (CEG: CGH.PH.RS.000015-9.01)

hora, dia and semana: dia

ano and mes: ano

CoderPJM commented 2 years ago

Okay, so the list returned by getFilters() is not refreshed even though I have set the parameters? Am I missing a step to force a refresh of the plant names between # Set the end date and # Get plant names ?

bertrandmartel commented 2 years ago

@CoderPJM This is fixed in the latest release. The following code checks that the target filter is not present after the setParameter:

from tableauscraper import TableauScraper as TS

url = "https://tableau.ons.org.br/t/ONS_Publico/views/GeraodeEnergia/HistricoGeraodeEnergia"
ts = TS()

ts.loads(url)
wb = ts.getWorkbook()
ws = wb.getWorksheet("Simples Geração de Energia Barra Semana")

# Get plant names
usina = [
    t["values"]
    for t in ws.getFilters()
    if t["column"] == "USINACEG"
][0]
print(" ABAÚNA (CEG: CGH.PH.RS.000015-9.01)" in usina)

# Set resolution
print(f"Set resolution to Hora")
wb.setParameter("Escala de Tempo GE Simp 4", "Hora")

# Get plant names with filter refreshed
usina = [
    t["values"]
    for t in ws.getFilters()
    if t["column"] == "USINACEG"
][0]
print(" ABAÚNA (CEG: CGH.PH.RS.000015-9.01)" in usina)

print(f"Set resolution to Ano")
wb.setParameter("Escala de Tempo GE Simp 4", "Ano")

# Get plant names with filter refreshed
usina = [
    t["values"]
    for t in ws.getFilters()
    if t["column"] == "USINACEG"
][0]
print(" ABAÚNA (CEG: CGH.PH.RS.000015-9.01)" in usina)

Output

True
Set resolution to Hora
False
Set resolution to Ano
True
CoderPJM commented 2 years ago

Thank you Bertrand, however the setFilter() is still not returning the expected data. The revised code below (using version 0.1.24) still returns data for a different plant, even after setting a plant that is in the getFilter() list:

from` tableauscraper import TableauScraper as TS

url = "https://tableau.ons.org.br/t/ONS_Publico/views/GeraodeEnergia/HistricoGeraodeEnergia"
ts = TS()

ts.loads(url)
wb = ts.getWorkbook()
ws = wb.getWorksheet("Simples Geração de Energia Barra Semana")

# Set resolution
print(f"Set resolution to Hora")
wb.setParameter("Escala de Tempo GE Simp 4", "Hora")

# Set the start date
print(f"Set start date to 01/01/2016")
wb.setParameter("Início Primeiro Período GE Simp 4",f"01/01/2016")

# Set the end date
print(f"Set end date to 01/01/2017")
wb = wb.setParameter("Fim Primeiro Período GE Simp 4",f"01/01/2017")

# Get plant names
usina = [
    t["values"]
    for t in ws.getFilters()
    if t["column"] == "USINACEG"
][0]

# Set plant name to second plant in list (currently ' AIMORÉS (CEG: UHE.PH.MG.000042-6.01)')
plantName=usina[1]
print(f"Set plant to {plantName}")
wb = ws.setFilter("USINACEG",plantName)

# Retrieve daily worksheet
ws = wb.getWorksheet("Simples Geração de Energia Dia")

# Show plants
print(ws.data)  # Contains data for 'PAU FERRO I'

Sorry to be a pain!

bertrandmartel commented 2 years ago

@CoderPJM The issue was the ordinal property that was used to build the filter index. Now, it only uses the position of the filter in the array as filter index. This is released in v0.1.25

CoderPJM commented 2 years ago

Amazing, thanks again for such a fast response. Swapped to the latest version and it worked perfectly.