bertrandmartel / tableau-scraping

Tableau scraper Python library. R and Python scripts to scrape data from Tableau viz
MIT License
131 stars, 21 forks

forcing client rendering with :render=true #16

sethvincent closed this 3 years ago

sethvincent commented 3 years ago

:wave:

It looks like the Tableau client application detects the :render=true query parameter and then sends requests to a different command endpoint (select-region-no-return-server), which returns the data in a format suitable for client-side rendering.

I'm not sure how best to incorporate this, but one possibility is a clientRender option in TableauScraper that adds the query parameter to the initial request and then uses select-region-no-return-server instead of select.
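A minimal sketch of the URL half of the proposed clientRender option, assuming the option would simply append `:render=true` to the viz URL before the initial request. `with_client_render` is a hypothetical helper, not part of the TableauScraper API, and switching the select command to select-region-no-return-server is not covered here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_client_render(viz_url: str) -> str:
    """Return viz_url with the :render=true query parameter set.

    Hypothetical helper mirroring the proposed clientRender option;
    it is not part of the tableau-scraping library.
    """
    parts = urlsplit(viz_url)
    # Keep existing parameters (e.g. :showVizHome=no) but drop any earlier :render value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != ":render"]
    query.append((":render", "true"))
    # safe=":" keeps Tableau's colon-prefixed parameter names unescaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


print(with_client_render("https://public.tableau.com/views/CMI-2_0/CMI?:showVizHome=no"))
# → https://public.tableau.com/views/CMI-2_0/CMI?:showVizHome=no&:render=true
```

Rebuilding the query instead of concatenating a string avoids producing a duplicate `:render` parameter when the URL already carries one.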

Example URL:

https://dashboards.doh.nj.gov/vizql/w/DailyConfirmedCaseSummary7_22_2020/v/ConfirmedCases/sessions/98290EE86BE74F8A99259334C27502E2-1:0/commands/tabsrv/select-region-no-return-server
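The example URL follows a recognizable layout: host, `vizql/w/<workbook>/v/<view>`, a session id, then `commands/<namespace>/<command>`. The sketch below composes such a URL from its parts. The layout is inferred from this single URL, so other servers or commands may differ; `build_command_url` is a hypothetical helper, and the session id is assumed to come from the session opened by the initial page load.

```python
def build_command_url(host: str, workbook: str, view: str, session_id: str,
                      command: str, namespace: str = "tabsrv") -> str:
    """Compose a vizql command URL using the layout of the example URL.

    Hypothetical helper; the "tabsrv" default namespace is only what the
    select-region-no-return-server example uses.
    """
    return (
        f"{host.rstrip('/')}/vizql/w/{workbook}/v/{view}"
        f"/sessions/{session_id}/commands/{namespace}/{command}"
    )


url = build_command_url(
    "https://dashboards.doh.nj.gov",
    "DailyConfirmedCaseSummary7_22_2020",
    "ConfirmedCases",
    "98290EE86BE74F8A99259334C27502E2-1:0",
    "select-region-no-return-server",
)
print(url)
```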

I found this while toying with a Node.js port of this library and using some of its functions with Playwright to observe how things work. I'd be happy to collaborate! Getting a scraper working for the dashboard I'm currently focused on has been a wild ride, and I'm quite grateful for your work on this project.

bertrandmartel commented 3 years ago

@sethvincent Hello, sorry for the late reply. I didn't notice any server-side rendered data when I visited the dashboard: https://dashboards.doh.nj.gov/#/views/DailyConfirmedCaseSummary7_22_2020/ConfirmedCases.

Which selectable data do you want to retrieve?

bertrandmartel commented 3 years ago

@sethvincent I think I've already encountered this call. It appears in server-side rendered Tableau URLs such as this one: https://public.tableau.com/views/CMI-2_0/CMI?:showVizHome=no when you click on the map or in the table.

The catch is that you need to send the mouse coordinates in the request body, like this:

worksheet: US Map - State - CMI
dashboard: CMI
vizRegionRect: {"x":620,"y":85,"w":0,"h":0,"r":"viz"}
mouseAction: simple
zoneId: 248
zoneSelectionType: replace
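For illustration, a sketch that builds these fields for an arbitrary click position. `select_region_payload` is a hypothetical helper; the field names and fixed values (`"simple"`, `"replace"`, `w`/`h` of 0, `"r": "viz"`) are copied from the single request above and may not hold for every viz. How the fields are encoded on the wire (URL-encoded form or multipart) is not shown in this thread, so no request is sent here.

```python
import json


def select_region_payload(worksheet: str, dashboard: str, x: int, y: int,
                          zone_id: int, selection_type: str = "replace") -> dict:
    """Build the select-region fields for a click at viz coordinates (x, y).

    Hypothetical helper modeled on one observed request.
    """
    # A zero-width, zero-height rectangle represents a single click point.
    rect = {"x": x, "y": y, "w": 0, "h": 0, "r": "viz"}
    return {
        "worksheet": worksheet,
        "dashboard": dashboard,
        # Compact separators reproduce the observed JSON string exactly.
        "vizRegionRect": json.dumps(rect, separators=(",", ":")),
        "mouseAction": "simple",
        "zoneId": str(zone_id),
        "zoneSelectionType": selection_type,
    }


payload = select_region_payload("US Map - State - CMI", "CMI", 620, 85, 248)
print(payload["vizRegionRect"])  # → {"x":620,"y":85,"w":0,"h":0,"r":"viz"}
```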

Some parts of the viz are rendered as images in the HTML, like this:

[image]

In this case, it would be difficult (though not impossible) to iterate over all the states by their respective x/y positions.

The same applies to the tables, which are rendered as images like these:

[image] [image]

We would need to guess the x/y position.
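One brute-force way to guess positions, for both the map and the image-rendered tables, is to sample a grid of viz coordinates, issue one selection per point, and keep the first point that produces each distinct result. This is a sketch under stated assumptions: `select_at` is a hypothetical callable (for example, one that posts the select-region request for a click at (x, y) and returns the selected state or row label, or None), and the scan bounds and step size are placeholders that would have to be chosen per viz.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

Point = Tuple[int, int]


def grid_points(x0: int, y0: int, width: int, height: int, step: int) -> Iterator[Point]:
    """Yield (x, y) sample points row by row across a rectangle of the viz."""
    for y in range(y0, y0 + height, step):
        for x in range(x0, x0 + width, step):
            yield x, y


def scan_regions(select_at: Callable[[int, int], Optional[str]],
                 x0: int, y0: int, width: int, height: int,
                 step: int) -> Dict[str, Point]:
    """Map each distinct selection result to the first point that produced it.

    select_at is a hypothetical callable that performs one click/selection
    and returns a label, or None when nothing is selected.
    """
    found: Dict[str, Point] = {}
    for x, y in grid_points(x0, y0, width, height, step):
        label = select_at(x, y)
        if label is not None and label not in found:
            found[label] = (x, y)
    return found
```

The step size trades request count against coverage: a coarse grid sends fewer requests but can miss small states or thin table rows.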

For now, the safest (and least hacky) way to get data from a server-side rendered Tableau URL is to look for filter requests that actually return client-side rendered data, as in #13. This is also documented here. Another good example is this one: https://stackoverflow.com/questions/66238702/how-can-i-extract-values-from-tableau-dashboard/67222561#67222561
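When inspecting filter requests in the browser's network tab, a quick check can help tell which responses carry client-side data. The sketch below assumes that client-side rendered responses include a `dataSegments` object somewhere in their JSON (the data dictionary that tableau-scraping parses), while server-side rendered ones do not. That key is an assumption to verify against real responses, and `looks_client_rendered` is a hypothetical helper.

```python
from typing import Any


def contains_key(node: Any, key: str) -> bool:
    """Recursively search the dicts and lists of a decoded JSON value for key."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(item, key) for item in node)
    return False


def looks_client_rendered(response: Any) -> bool:
    """Heuristic (assumption): client-side rendered responses carry "dataSegments"."""
    return contains_key(response, "dataSegments")
```

Searching recursively rather than following a fixed path keeps the check independent of exactly where the data dictionary sits in a given response.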

But sometimes there is no filter set by default, or the filtered result is always server-side rendered. In that case, I don't know how we could get the data without OCR or other image analysis techniques.

In your case, I'm fairly confident you can access the data directly. Let me know if you get stuck with a server-side rendered URL.