bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License
126 stars 20 forks source link

Issue getting filters, loading some sheets with server side rendering #25

Closed adamsolMIT closed 2 years ago

adamsolMIT commented 2 years ago

I am attempting to get the underlying data (if possible) of which summary tables and graphs are presented in the workbook [here](https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/3_PLTDuration?%3AisGuestRedirectFromVizportal=y&%3Aembed=y).

Loosely following your TableauCIESFootball code, I start like this:

from tableauscraper import TableauScraper as TS

url='https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/DataTable3?%3AisGuestRedirectFromVizportal=y&%3Aembed=y'
wb = TS()
wb.loads(url)

workbook = wb.getWorkbook()
sheets = workbook.getSheets()
print(sheets)

#data2=s.goToSheet("Data Table 2")
data3=workbook.goToSheet("3. PLT Duration")
filters3 = data3.getFilters() 

data3V2=workbook.getWorksheet("3. PLT Duration")
filters3V2 = data3V2.getFilters() 

Issues:

  1. The line: #data2=s.goToSheet("Data Table 2") doesn't work, despite there being a worksheet with exactly that name.
  2. I can run the .goToSheet("3. PLT Duration") command but no the getWorksheet("3. PLT Duration").
  3. After running goToSheet, the getFilters() returns an error.

So I haven't been able to even get started with the data. But my overall objective to somehow get the granular underlying data of which the worksheets present amalgamated summaries.

Thanks for your help!

bertrandmartel commented 2 years ago

For the series of issues:

The line: #data2=s.goToSheet("Data Table 2") doesn't work, despite there being a worksheet with exactly that name.

It triggered an error because dataSegments field was missing in the response, which is normal since it's using server side rendering for this sheet. I've fixed this error

After running goToSheet, the getFilters() returns an error.

The filters are in fact parameters (at the top level whereas filters are specific to sheets):

from tableauscraper import TableauScraper as TS

url = "https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/DataTable3"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

for t in wb.worksheets:
    print(t.name)
    print(t.data)

wb = wb.goToSheet("Data Table 2")

print(wb.getParameters())

Note that you can use the sheet name in the url:

https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/DataTable2

and you get the worksheet directly without goToSheet

bertrandmartel commented 2 years ago

For getting the data, it seems there is no filter (parameter) that is cleared by default (even the last one seems to server side render some values). But, it's still possible to get the data via the download button as described :

data

The above can be done programatically but it's not implemented for the "crosstab" only the "data" button is implemented. That would be a new feature to implement with the api:

This way you would reproduce the same flow as described above without using any filtering trick

adamsolMIT commented 2 years ago

Thanks for this. I think I follow. I was hoping there is some way to get at the more granular data (that is rendered and amalgamated server-side). Is this simply impossible?

bertrandmartel commented 2 years ago

@adamsolMIT it depends on the filter you want to set, some filters result in data rendered client side and other are server side rendered.

The crosstab csv result seems to be what you would expect isn't it ?

adamsolMIT commented 2 years ago

@bertrandmartel when each row contains exposures of the order of 10,000 or 100,000 it means that the raw data contains 10,000/100,000 entries that are being summed to produce this table. Nowhere on this workbook are the individual entries listed, but I was hoping it could be accessed somehow.

bertrandmartel commented 2 years ago

You can only extract data that is displayed on the webview. If the filters don't allow you to unwind/extract all those individual entries, it will not be possible to extract it programatically. I can help you extract only data that is visible in the webview. If it's not possible via filters, it could have been possible (and that's not certain) to get it via the "download" feature which is disabled for this dashboard and require authentication it seems

bertrandmartel commented 2 years ago

@adamsolMIT The modifications above are available in v0.1.19. If this is of interest, you can get the data (only the visible data) using:

from tableauscraper import TableauScraper as TS

url = "https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/DataTable2"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

wb = wb.setParameter(inputName="Count or Amount", value="Amount")

data = wb.getCrossTabData(sheetName="Data Table 2 - Premium Jump & PLT Duration")

print(data)
adamsolMIT commented 2 years ago

Thanks @bertrandmartel !