Python library to scrape data from Tableau viz
R library is under development but a script is available to get the worksheets, see this
pip install TableauScraper
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
for t in workbook.worksheets:
print(f"worksheet name : {t.name}") #show worksheet name
print(t.data) #show dataframe for this worksheet
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"
ts = TS()
ts.loads(url)
ws = ts.getWorksheet("ATT MID CREATIVE COMP")
print(ws.data)
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"
ts = TS()
ts.loads(url)
ws = ts.getWorksheet("ATT MID CREATIVE COMP")
# show selectable values
selections = ws.getSelectableItems()
print(selections)
# select that value
dashboard = ws.select("ATTR(Player)", "Vinicius Júnior")
# display worksheets
for t in dashboard.worksheets:
print(t.data)
Get list of parameters with workbook.getParameters()
and set parameter value using workbook.setParameter("column_name", "value")
:
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
# show parameters values / column
parameters = workbook.getParameters()
print(parameters)
# set parameters column / value
workbook = workbook.setParameter("P.League 2", "Ligue 1")
# display worksheets
for t in workbook.worksheets:
print(t.data)
It's possible to override the parameter name used in the API requests using inputParameter
, which is different from the input name:
wb = wb.setParameter(inputName=None, value="Ligue 1",
inputParameter="[Parameters].[P.League (copy)_1642969456470679625]")
Get list of filters with worksheet.getFilters
and set filter value using worksheet.setFilter("column_name", "value")
:
from tableauscraper import TableauScraper as TS
url = 'https://public.tableau.com/views/WomenInOlympics/Dashboard1'
ts = TS()
ts.loads(url)
# show original data for worksheet
ws = ts.getWorksheet("Bar Chart")
print(ws.data)
# get filters columns and values
filters = ws.getFilters()
print(filters)
# set filter value
wb = ws.setFilter('Olympics', 'Winter')
# show the new data for worksheet
countyWs = wb.getWorksheet("Bar Chart")
print(countyWs.data)
You can specify dashboardFilter=True
in order to use dashboard-categorical-filter
API instead of categorical-filter-by-index
API (related)
When using dashboardFilter=True
you can skip the filter value check usin noCheck=True
(related)
You can discard membershipTarget
property from being sent in setFilter
using setFilter('COLUMN','VALUE', membershipTarget=False)
(related)
You can specify multiple filters for filters that enable that feature using setFilter('COLUMN', ['VALUE1','VALUE2'])
You can specify a "filter-delta" filter type adding the parameter filterDelta=True
like the following setFilter('COLUMN','VALUE', filterDelta=True)
. This will discard all filters and add the one corresponding to ['VALUE']
in this case. This is helpful when all or some filters are selected by default, and you want to unselect them. The default behaviour (filterDelta=False
) is filter-replace
which sometimes doesn't work when filter multi-selection is possible in the dashboard. example
In last recourse, you can use indexValues
property to directly specify the indices (if there is a bug in the library or anything comes up): setFilter('COLUMN', [], indexValues=[0,1,2])
Some Tableau dashboard have storypoints where you can navigate. To list the storypoints and go to a specific storypoints:
from tableauscraper import TableauScraper as TS
url = 'https://public.tableau.com/views/EarthquakeTrendStory2/Finished-Earthquakestory'
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
print(wb.getStoryPoints())
print("go to specific storypoint")
sp = wb.goToStoryPoint(storyPointId=10)
print(sp.getWorksheetNames())
print(sp.getWorksheet("Timeline").data)
On some graph/table, there is a drill up/down feature used to zoom in or out data like this
from tableauscraper import TableauScraper as TS
url = 'https://tableau.azdhs.gov/views/ELRv2testlevelandpeopletested/PeopleTested'
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
sheetName = "P1 - Tests by Day W/ % Positivity (Both) (2)"
drillDown1 = wb.getWorksheet(sheetName).levelDrill(drillDown=True, position=1)
drillDown2 = drillDown1.getWorksheet(sheetName).levelDrill(drillDown=True, position=1)
drillDown3 = drillDown2.getWorksheet(sheetName).levelDrill(drillDown=True, position=1)
print(drillDown1.getWorksheet(sheetName).data)
print(drillDown2.getWorksheet(sheetName).data)
print(drillDown3.getWorksheet(sheetName).data)
The position
parameter is default to 0
. It doesn't seem to be present in the json configuration. If the default is not working try incrementing it or checkout the network tabs using Chrome devtools.
For Tableau URL that have the download feature enabled, you can download full data using:
from tableauscraper import TableauScraper as TS
url = 'https://public.tableau.com/views/WYCOVID-19Dashboard/WyomingCOVID-19CaseDashboard'
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
data = wb.getCsvData(sheetName='case map')
print(data)
Note that in some Tableau server, the prefix used in the API url is different. As it's set in the javascript, it must be set manually if it's not the same as public.tableau.com like:
wb.getCsvData(sheetName='worksheet1', prefix="vud")
The prefix values, I've encountered are: vud
and vudcsv
. The default is vudcsv
.
For Tableau URL that have the crosstab feature enabled, you can download the crosstab using:
from tableauscraper import TableauScraper as TS
url = "https://tableau.soa.org/t/soa-public/views/USPostLevelTermMortalityExperienceInteractiveTool/DataTable2"
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()
wb.setParameter(inputName="Count or Amount", value="Amount")
data = wb.getCrossTabData(
sheetName="Data Table 2 - Premium Jump & PLT Duration")
print(data)
Get list of all sheets with subsheets visible or invisible, ability to send a go-to-sheet command (dashboar button) :
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/COVID-19VaccineTrackerDashboard_16153822244270/Dosesadministered"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
sheets = workbook.getSheets()
print(sheets)
nycAdults = workbook.goToSheet("NYC Adults")
for t in nycAdults.worksheets:
print(f"worksheet name : {t.name}") # show worksheet name
print(t.data) # show dataframe for this worksheet
Get the tooltip html output when render-tooltip-server
API is called. This is particularly useful when dealing with server side rendering dashboard:
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/CMI-2_0/CMI"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
ws = workbook.getWorksheet("US Map - State - CMI")
tooltipHtml = ws.renderTooltip(x=387, y=196)
print(tooltipHtml)
If the tableau url you're working on is using server side rendering, data can't be extracted as is.
You can checkout if your tableau url is using server side rendering by opening chrome development console / network tab. You would notice the API calls have renderMode
properties set to render-mode-server
:
Server side rendering means that no data is sent to the browser. Instead, the server is rendering the tableau chart using images only and detects selection using mouse coordinates.
To extract the data, one thing that has worked with some tableau url was to trigger a specific filter that is not server-side-rendered. You can checkout the network tab on Chrome development console to check if the filter call is using or not server-side rendering or client-side-rendering with renderMode
:
If the filter is only using client side rendering, you can list all filters and perform the filter for each value. This technique only works if the tableau data has "cleared" the filter by default otherwise the data is already cached when the tableau data is loaded, and since it's using server side rendering you can't access this data
Checkout the following repl.it for examples with tableau url using server side rendering:
To discover all worksheets, selectable columns and dropdowns, run prompt.py
script under scripts
directory :
git clone git@github.com:bertrandmartel/tableau-scraping.git
cd tableau-scraping/scripts
#get worksheets data
python3 prompt.py -get workbook -url "https://public.tableau.com/views/PlayerStats-Top5Leagues20192020/OnePlayerSummary"
#select a selectable item
python3 prompt.py -get select -url "https://public.tableau.com/views/MKTScoredeisolamentosocial/VisoGeral"
#set a parameter
python3 prompt.py -get parameter -url "https://public.tableau.com/views/COVID-19DailyDashboard_15960160643010/Casesbyneighbourhood"
TableauScraper
class has the following optional parameters :
Parameters | default value | description |
---|---|---|
logLevel | logging.INFO | log level |
delayMs | 500 | minimum delay in millis between api calls |
under R
directory :
Rscript tableau.R
R library is under development