bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License

How to deal with server-side rendered Tableau data? #13

Closed: cg86x closed this issue 3 years ago

cg86x commented 3 years ago

Thanks for putting the R and Python scripts together for this. I'm relatively experienced in R but less so in Python, so apologies if the following is a simple user error.

OS: Mac 11.0.1

Target dashboard: https://public.tableau.com/profile/football.observatory#!/vizhome/InstatIndexRanking/Instatindex

Alternative URL (redirects to the above, but is similar in structure to the examples you provided and gave different results in R):

https://public.tableau.com/views/InstatIndexRanking/Instatindex

Issues in R:

Using the primary URL:

data <- body %>%
    html_nodes("textarea#tsConfigContainer") %>%
    html_text()

returns

character(0)

and nothing below that works as a result.

Using the alternate URL above, step by step the script seems to work ok until:

data <- fromJSON(extract[1,3])

Which results in:

> data
$secondaryInfo
list()

FWIW, data <- fromJSON(extract[1,2]) has tons of info in it (e.g. worksheet names, IDs, etc.), but I couldn't find anything in it that satisfies what the later steps of the script need.

In Python, unfortunately I can't offer much in the way of debugging, but with the alternate URL I get the below error.

Traceback (most recent call last):
  File "/Users/chris/Documents/tableau-scraping-master/scripts/tableau_specific_sheet.py", line 6, in <module>
    ts.loads(url)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/tableauscraper/TableauScraper.py", line 81, in loads
    presModelMap = self.data["secondaryInfo"]["presModelMap"]
KeyError: 'presModelMap'

Thanks so much for any insight you can provide.

bertrandmartel commented 3 years ago

Hello,

In this case, it seems client-side rendering has been disabled and the viz only uses server-side rendering: https://help.tableau.com/current/server/en-us/browser_rendering.htm

You can't get the data with server-side rendering, since the server only sends rendered images of the data and clicks are processed using pixel coordinates in the JS.
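To see what that looks like in the raw response, a minimal sketch follows. It assumes, as the R script's `extract[1,2]` / `extract[1,3]` indexing suggests, that the bootstrap response is two JSON objects each preceded by a numeric prefix and `;` (`<n>;{...}<n>;{...}`). With a server-side rendered viz, the second object comes back with an empty `secondaryInfo`, matching the R output above. `split_bootstrap_payload` is a hypothetical helper, not part of tableauscraper.

```python
import json
import re

# Hypothetical helper, not part of tableauscraper. Assumes the response
# looks like "<n>;{...}<n>;{...}": a numeric prefix, ';', then a JSON object.
_PREFIX = re.compile(r"\s*(\d+);")


def split_bootstrap_payload(text):
    """Return the list of JSON objects found in a bootstrap response."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while True:
        match = _PREFIX.match(text, pos)
        if match is None:
            break
        # raw_decode returns the object and the index just past its end,
        # so the numeric prefix is only used as a separator here.
        obj, pos = decoder.raw_decode(text, match.end())
        objects.append(obj)
    return objects


# Illustrative input shaped like a server-side rendered response.
sample = '17;{"worksheets":[]}20;{"secondaryInfo":{}}'
info, secondary = split_bootstrap_payload(sample)
print(secondary)  # {'secondaryInfo': {}}
```

Using `raw_decode` rather than a greedy regex avoids having to decide whether the prefix counts characters or bytes.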

This is quite frustrating, since the documentation (link above) says we can switch to client-side rendering using :render=true, but it doesn't work. I've tested the following but failed to enable client-side rendering:

In addition to your Tableau URL, the following ones also use server-side rendering (so you can't get the data either):

For the last one, it's possible to get some data from the tooltip call, which returns real data: https://stackoverflow.com/questions/63201650/how-can-i-extract-values-from-tableau-on-this-webpage/64191030#64191030 Maybe some of the data in your case could also be retrieved through actions (like a filter or a selection).

cg86x commented 3 years ago

Thank you so much for looking into this, and so quickly.

Sounds like I might be out of luck with this particular dashboard. Is there any clear way to tell which tableau dashboards are using server side rendering?

bertrandmartel commented 3 years ago

@afc04 If you open the Chrome DevTools Network tab, then with server-side rendering each time you hover over something the viz sends a render-tooltip-server API call. This call is sent on every mouse hover event, whether or not there is a tooltip to show.
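To check this from a saved capture rather than by watching the panel, Chrome's Network tab can export the session as a HAR file, which is JSON with one `log.entries[].request.url` per request. A minimal sketch, assuming the command name appears in the request URL, counts the `render-tooltip-server` calls; both function names are hypothetical, not part of tableauscraper.

```python
import json


def load_har(path):
    """Load a HAR file exported from the DevTools Network tab."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_tooltip_server_calls(har):
    """Count requests whose URL contains 'render-tooltip-server'.

    Assumption: the Tableau command name is part of the request URL.
    A non-zero count after hovering over marks suggests the viz is
    rendered server-side.
    """
    entries = har.get("log", {}).get("entries", [])
    return sum(
        1
        for entry in entries
        if "render-tooltip-server" in entry.get("request", {}).get("url", "")
    )
```

Usage: hover over a few marks, export the capture (the file name `session.har` is just an example), then call `count_tooltip_server_calls(load_har("session.har"))`.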

[Screenshot: tooltip]

bertrandmartel commented 3 years ago

For your dashboard, the player stats returned by the player filter can be accessed:

[Screenshot: player]

But the table data can't be retrieved as is.

It seems the player filter doesn't update the table either, so only the player stats can be retrieved, if you're interested.

bertrandmartel commented 3 years ago

@afc04 It seems there is a way, though a very hacky one, to get the data back using the filters in the bottom right. When playing a lot with the filters, sometimes the response uses server-side rendering and sometimes client-side rendering. I have no idea how the server decides when to switch to client-side rendering. It's similar to the method in the Stack Overflow link above but with different filters: iterate over all the filter values and you'll have the full data.

For example, when you filter on a team, the result is rendered client-side. But when you clear the filter, it doesn't give you the data back (for all teams) because the data is already cached. So we can iterate over all teams and get the whole data, as I do with the tooltip in https://stackoverflow.com/questions/63201650/how-can-i-extract-values-from-tableau-on-this-webpage/64191030#64191030
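The iteration described here boils down to: apply one filter value at a time, keep whatever rows come back, then concatenate. A library-agnostic sketch follows, where `fetch_rows` is a hypothetical callable standing in for "set the filter to this value and read the worksheet data". Values that come back empty (for example, if that particular response was rendered server-side) are collected so they can be retried, rather than silently dropped.

```python
import pandas as pd


def collect_by_filter(values, fetch_rows):
    """Fetch one DataFrame per filter value and concatenate them.

    `fetch_rows(value)` is a placeholder (hypothetical) for whatever applies
    the filter and returns that value's rows as a DataFrame.
    Returns (combined_frame, values_that_returned_nothing).
    """
    frames = []
    empty = []
    for value in values:
        df = fetch_rows(value)
        if df is None or df.empty:
            empty.append(value)
            continue
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, empty


# Example with a stub fetcher in place of a real Tableau request.
stub = {"A": pd.DataFrame({"team": ["A"], "value": [1]}), "B": pd.DataFrame()}
result, missing = collect_by_filter(["A", "B"], lambda v: stub[v])
print(missing)  # ['B']
```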

bertrandmartel commented 3 years ago

I think the first step would be to handle the case where the secondaryInfo JSON object is empty.
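A minimal sketch of that first step, assuming the parsed second payload is a dict shaped like the one the traceback indexes (`data["secondaryInfo"]["presModelMap"]`): instead of a bare `KeyError`, raise an error that says what probably happened. `ServerSideRenderedError` and `get_pres_model_map` are hypothetical names, not necessarily how the library ended up handling it.

```python
class ServerSideRenderedError(Exception):
    """Hypothetical error: the payload carries no client-side data."""


def get_pres_model_map(data):
    """Return data["secondaryInfo"]["presModelMap"] or raise a clear error.

    An empty secondaryInfo is what the server-side rendered viz in this
    issue returned, so it is reported as such rather than as a KeyError.
    """
    secondary_info = data.get("secondaryInfo") or {}
    if "presModelMap" not in secondary_info:
        raise ServerSideRenderedError(
            "secondaryInfo has no presModelMap; the viz is probably "
            "rendered server-side, so the initial payload has no data"
        )
    return secondary_info["presModelMap"]
```

A caller can catch `ServerSideRenderedError` and fall back to a filter-based approach instead of crashing on load.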

bertrandmartel commented 3 years ago

I would also need to check this Tableau URL, which behaves similarly (some filters use client-side rendering, whereas the default is server-side rendering): https://public.tableau.com/profile/decision.theater#!/vizhome/v_7_14_2020/COVID-19TestingCommons For the 3 other Tableau URLs, this trick doesn't work, so for now I'm taking it with a grain of salt. It may work in some cases and fail in others because of server-side rendering.

cg86x commented 3 years ago


Wow, thanks for continuing to work on this. The tooltip option, with one request per team, seems viable, especially since my current ultra-hacky workaround is to take screenshots, run them through OCR, drop the results into a CSV and then run a fairly lengthy R script (which I wrote this evening) to clean it up. So I've got plenty of manual work ahead if there is no scraping solution; in other words, hacky solutions are certainly fine.

Do you have workable code that can grab it one team at a time?

bertrandmartel commented 3 years ago

@afc04 I was able to iterate over all the teams (524 teams, including None) and grab all the data for each one. I'm working on a release in the next hour. I still have a few things to check with the modifications I've made and with the other Tableau URL I want to test.

bertrandmartel commented 3 years ago

@afc04 I've released v0.1.11. You can use the following code to get all the teams' data and merge the results into a pandas DataFrame:

from tableauscraper import TableauScraper as TS
import pandas as pd

url = "https://public.tableau.com/views/InstatIndexRanking/Instatindex"

ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()

ws = workbook.getWorksheet("ranking")

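# values of the "Team" filter on the ranking worksheet
teams = [
    t["values"]
    for t in ws.getFilters()
    if t["column"] == "Team"
][0]

pdList = []
for team in teams:
    print(f"team: {team}")
    # filtering on a single team returns that team's rows (client-side rendered)
    teamResultWb = ws.setFilter("Team", team)
    df = teamResultWb.getWorksheet("ranking").data
    pdList.append(df)
    print(df)

# merge the per-team results into one dataframe
result = pd.concat(pdList, ignore_index=True)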
print(result)

repl.it: https://replit.com/@bertrandmartel/TableauCIESFootball

bertrandmartel commented 3 years ago

I've just answered this question using the same technique, except that it has a filter with only 7 values, which is quicker to iterate over than the 524 team values in your filter.

In your case, the "select competition" filter could also have been used, but it's a "filter-delta" type filter, which means the values are already cached by the server (it works by excluding one of the competitions). Since the values are already cached, you can't get any data back when you deselect the filter. The "select position" and "select range" filters are server-side rendered, so only the team filter remains.
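When a worksheet has several usable filters, the cheapest one to iterate is the one with the fewest values (7 versus 524 in the two cases above). A minimal sketch, assuming filters come as a list of dicts with `"column"` and `"values"` keys (the shape the snippet above reads from `ws.getFilters()`); `pick_iteration_filter` is hypothetical, and which columns belong in `skip` (filter-delta or server-side rendered filters) still has to be found by trial, as described above.

```python
def pick_iteration_filter(filters, skip=()):
    """Return the column of the filter with the fewest values, or None.

    `filters`: list of dicts with "column" and "values" keys (assumed shape).
    `skip`: column names known not to work, e.g. filter-delta filters or
    filters whose responses are rendered server-side.
    """
    candidates = [f for f in filters if f["column"] not in skip]
    if not candidates:
        return None
    # fewer values means fewer requests to the server
    best = min(candidates, key=lambda f: len(f["values"]))
    return best["column"]
```

Usage sketch: `pick_iteration_filter(ws.getFilters(), skip={...})`, then iterate that column's values with `setFilter` as in the code above.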

cg86x commented 3 years ago

This is amazing - thanks so much for the time and effort to figure it out.

I'll run it when I'm back at my home computer. I'll come back to this thread if anything unusual pops up today or when running it in the future.

Thanks again - huge time saver and learning experience for me.