bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License
126 stars 20 forks source link

scraping workbook, NOT worksheet with selectables #62

Open federicoscabbia opened 2 years ago

federicoscabbia commented 2 years ago

Hi everyone, I think i have found a tableau that has selectables not in the specific worksheet but in the workbook instead and i do not know how to scrape all the years. I tried this code to select the different years but when i look for the filters it gives back an empty list. Probably the the filtes are set at the workbook level and not at the worksheet level, however 'TableauWorkbook' object has no attribute 'getFilters'.

Could you please help me with that?

This is the table i'm referring to: https://public.tableau.com/views/SocialistPartyScandinavianFederation/Story1

And this is the code I used to scrape the data (as you can see i can only scrape the data from 1916 as it is set as the default year in the table).

from tableauscraper import TableauScraper as TS

url= "https://public.tableau.com/views/SocialistPartyScandinavianFederation/Story1"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

for t in workbook.worksheets:
    print(f"worksheet name : {t.name}") #show worksheet name
    print(t.data) #show dataframe for this worksheet

ws = ts.getWorksheet("tab1")

# show selectable values
print(ws.getSelectableItems())  # does not show years
print(ws.getFilters()) #empty
print(wb.getParameters())  #empty
djay commented 1 year ago

@federicoscabbia I have the same problem. I wrote this code to get around it. It handles both the case where there is no worksheet (dashboardFilter) and when the filter doesn't list the values but still accepts them.

def force_setFilter(wb, ws_name, columnName, values):
    "setFilter but ignore the listed filter options. also gets around wrong ordinal value which makes index value incorrect"
    # TODO: remove if they fix https://github.com/bertrandmartel/tableau-scraping/issues/50

    scraper = wb._scraper
    tableauscraper.api.delayExecution(scraper)
    ws = next((ws for ws in wb.worksheets if ws.name == ws_name), None)

    filter = next((
        {
            "globalFieldName": t["globalFieldName"],
        }
        for t in (ws.getFilters() if ws is not None else [])
        if t["column"] == columnName
    ), None
    )

    try:
        if filter is None:
            resp =  tableauscraper.api.dashboardFilter(scraper, columnName, [values] if not isinstance(values, list) else values)
        else:
            payload = (
                ("dashboard", scraper.dashboard),
                ("globalFieldName", (None, filter["globalFieldName"])),
                ("qualifiedFieldCaption", (None, columnName)),
                ("membershipTarget", (None, "filter")),
                ("exclude", (None, "false")),
                ("filterValues", (None, json.dumps(values))),
                ("filterUpdateType", (None, "filter-replace"))
            )
            r = scraper.session.post(
                f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/dashboard-categorical-filter',
                files=payload,
                verify=scraper.verify
            )
            scraper.lastActionTime = time.time()

            if r.status_code >= 400:
                raise requests.exceptions.RequestException(r.content)
            resp = r.json()
            errors = [
                res['commandReturn']['commandValidationPresModel']['errorMessage']
                for res in resp['vqlCmdResponse']['cmdResultList']
                if not res['commandReturn'].get('commandValidationPresModel', {}).get('valid', True)
            ]
            if errors:
                wb._scraper.logger.error(str(", ".join(errors)))
                raise tableauscraper.api.APIResponseException(", ".join(errors))

        wb.updateFullData(resp)
        return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp)
    except ValueError as e:
        scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )
    except tableauscraper.api.APIResponseException as e:
        wb._scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )
david-hammond commented 1 year ago

@djay This is perfect, can you help me with this? I tried your function but I think I am using it wrong

from tableauscraper import TableauScraper as TS
import pandas as pd

#copied and pasted you function in

def force_setFilter(wb, ws_name, columnName, values): "setFilter but ignore the listed filter options. also gets around wrong ordinal value which makes index value incorrect" # TODO: remove if they fix https://github.com/bertrandmartel/tableau-scraping/issues/50 scraper = wb._scraper tableauscraper.api.delayExecution(scraper) ws = next((ws for ws in wb.worksheets if ws.name == ws_name), None) filter = next(( { "globalFieldName": t["globalFieldName"], } for t in (ws.getFilters() if ws is not None else []) if t["column"] == columnName ), None ) try: if filter is None: resp = tableauscraper.api.dashboardFilter(scraper, columnName, [values] if not isinstance(values, list) else values) else: payload = ( ("dashboard", scraper.dashboard), ("globalFieldName", (None, filter["globalFieldName"])), ("qualifiedFieldCaption", (None, columnName)), ("membershipTarget", (None, "filter")), ("exclude", (None, "false")), ("filterValues", (None, json.dumps(values))), ("filterUpdateType", (None, "filter-replace")) ) r = scraper.session.post( f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/dashboard-categorical-filter', files=payload, verify=scraper.verify ) scraper.lastActionTime = time.time() if r.status_code >= 400: raise requests.exceptions.RequestException(r.content) resp = r.json() errors = [ res['commandReturn']['commandValidationPresModel']['errorMessage'] for res in resp['vqlCmdResponse']['cmdResultList'] if not res['commandReturn'].get('commandValidationPresModel', {}).get('valid', True) ] if errors: wb._scraper.logger.error(str(", ".join(errors))) raise tableauscraper.api.APIResponseException(", ".join(errors)) wb.updateFullData(resp) return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp) except ValueError as e: scraper.logger.error(str(e)) return tableauscraper.TableauWorkbook( scraper=scraper, originalData={}, originalInfo={}, data=[] ) except tableauscraper.api.APIResponseException as e: wb._scraper.logger.error(str(e)) return tableauscraper.TableauWorkbook( scraper=scraper, originalData={}, originalInfo={}, data=[] )

#then tried to use it in this way for this site

url = "https://public.tableau.com/views/DashboardRegional_15811027307400/DashboardRegional?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=no&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0"

ts = TS()
ts.loads(url)
wb=ts.getWorkbook()
sheetName = "Mapa"

#I tried this using your function
wb = force_setFilter(wb, "Mapa", "Sexo", "Hombre")
ws = wb.getWorksheet(sheetName)
djay commented 1 year ago

I haven't tried it yet but I think you can actually pick any reandom worksheet and do serFilter with dashboardFilter=True and check=False

djay commented 1 year ago

For my function you pass None for worksheet name to use global filter

On Wed, 12 Oct 2022, 6:25 pm david-hammond, @.***> wrote:

@djay https://github.com/djay This is perfect, can you provide a working example for this site?

from tableauscraper import TableauScraper as TS import pandas as pd

url = "https://public.tableau.com/views/DashboardRegional_15811027307400/DashboardRegional?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=no&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0"

ts = TS() ts.loads(url) wb=ts.getWorkbook() sheetName = "Mapa"

I tried this using your function

wb = force_setFilter(wb, "Mapa", "Sexo", "Hombre") ws = wb.getWorksheet(sheetName)

— Reply to this email directly, view it on GitHub https://github.com/bertrandmartel/tableau-scraping/issues/62#issuecomment-1276004912, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAKFZDZESLN4DVL3OYND6TWC2NZJANCNFSM5VNRRVEQ . You are receiving this because you were mentioned.Message ID: @.***>

david-hammond commented 1 year ago

@djay Brilliant, got it working, thanks so much!

miriamventurini commented 1 year ago

Did anyone try this solution on the tableau from the original question by @federicoscabbia ?