bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License
126 stars 20 forks source link

Filter data from dropdown #26

Closed m2studio closed 2 years ago

m2studio commented 2 years ago

Actually I'm not sure what should I name the topic, but this is what I want, I would like to scrape covid19 data from Thailand from this website : https://ddc.moph.go.th/covid19-dashboard/?dashboard=province

You might see on the website that host the embedded Tableau dashboard can filter both date (calendar one) and province (เลือกจังหวัด << select province in Thai) but I cannot filter it by using Python code.

I use this code in order to scrape see the vaccinated number from a specific dat but I have no idea how can I specify the province. Once I call wb.getParameters() , there is no province option.

I might not know your library well. Is there a way to specify both province and date ? Please hlep

This is my code you can try

from tableauscraper import TableauScraper as TS
province = 'กรุงเทพมหานคร' # this is Bangkok in Thai
url = "https://public.tableau.com/views/SATCOVIDDashboard/2-dash-tiles-province?:showVizHome=no&province=" + province
#url = "https://ddc.moph.go.th/covid19-dashboard/?dashboard=province"
print(url)
print(f"getting data from province : {province}")
ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

parameters = wb.getParameters()
print(parameters)

sheetName = 'D2_Vac1Today'  # sheet that represent vaccinated people on the specific day

# show dataframe with yearly data
ws = wb.getWorksheet(sheetName)
print(ws.data)

# change date
wb = wb.setParameter('param_date', '2021-08-16')
ws = wb.getWorksheet(sheetName)
print(ws.data)

# change date
wb = wb.setParameter('param_date', '2021-08-15')

ws = wb.getWorksheet(sheetName)
print(ws.data)
m2studio commented 2 years ago

I did not look at your code yet, but I guess the setParameter method will send POST request to this endpoint : https://public.tableau.com{vizql_root}/sessions/{sessionid}/commands/tabdoc/set-parameter-value

I also did try to filter the province by sending POST request to this endpoint : https://public.tableau.com{vizql_root}/sessions/{sessionid}/commands/tabdoc/dashboard-categorical-filter

However sometimes I get the data I want, sometimes I don't get it (without error in both http status and response command in the JSON), I have no idea how Tableau can update the dashboard without showing the data (they might hide it somewhere or I might be not smart enough)

bertrandmartel commented 2 years ago

@m2studio sorry for the late response. You are correct, the /commands/tabdoc/dashboard-categorical-filter api is not implemented currently. I will look at this asap

m2studio commented 2 years ago

@m2studio sorry for the late response. You are correct, the /commands/tabdoc/dashboard-categorical-filter api is not implemented currently. I will look at this asap

thanks, btw I really like your library, it's easy to use and has a lot of examples to follow. Thumps up!

djay commented 2 years ago

I have code for this

def force_setFilter(wb, ws_name, columnName, values):
    "setFilter but ignore the listed filter options. also gets around wrong ordinal value which makes index value incorrect"

    scraper = wb._scraper
    tableauscraper.api.delayExecution(scraper)
    ws = next(ws for ws in wb.worksheets if ws.name == ws_name)

    filter = next(
        {
            "globalFieldName": t["globalFieldName"],
        }
        for t in ws.getFilters()
        if t["column"] == columnName
    )

    payload = (
        ("dashboard", scraper.dashboard),
        ("globalFieldName", (None, filter["globalFieldName"])),
        ("qualifiedFieldCaption", (None, columnName)),
        ("membershipTarget", (None, "filter")),
        ("exclude", (None, "false")),
        ("filterValues", (None, json.dumps(values))),
        ("filterUpdateType", (None, "filter-replace"))
    )
    try:
        r = scraper.session.post(
            f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/dashboard-categorical-filter',
            files=payload,
            verify=scraper.verify
        )
        scraper.lastActionTime = time.time()

        if r.status_code >= 400:
            raise requests.exceptions.RequestException(r.content)
        resp = r.json()
        errors = [
            res['commandReturn']['commandValidationPresModel']['errorMessage']
            for res in resp['vqlCmdResponse']['cmdResultList']
            if not res['commandReturn'].get('commandValidationPresModel', {}).get('valid', True)
        ]
        if errors:
            wb._scraper.logger.error(str(", ".join(errors)))
            raise tableauscraper.api.APIResponseException(", ".join(errors))

        wb.updateFullData(resp)
        return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp)
    except ValueError as e:
        scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )
    except tableauscraper.api.APIResponseException as e:
        wb._scraper.logger.error(str(e))
        return tableauscraper.TableauWorkbook(
            scraper=scraper, originalData={}, originalInfo={}, data=[]
        )

similar code to setParameter but ignore if the workbook defined values

def force_setParameter(wb, parameterName, value):
    scraper = wb._scraper
    tableauscraper.api.delayExecution(scraper)
    payload = (
        ("fieldCaption", (None, parameterName)),
        ("valueString", (None, value)),
    )
    r = scraper.session.post(
        f'{scraper.host}{scraper.tableauData["vizql_root"]}/sessions/{scraper.tableauData["sessionid"]}/commands/tabdoc/set-parameter-value',
        files=payload,
        verify=scraper.verify
    )
    scraper.lastActionTime = time.time()
    if r.status_code >= 400:
        raise requests.exceptions.RequestException(r.content)
    resp = r.json()

    wb.updateFullData(resp)
    return tableauscraper.dashboard.getWorksheetsCmdResponse(scraper, resp)
bertrandmartel commented 2 years ago

In fact province and region are filters and can be retrieved using getFilters(). I've implemented the dashboard categorical filter API, but it seems the original setFilter also works. You can set the province, using setFilter, then set the date using setParameter

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/SATCOVIDDashboard/2-dash-tiles-province"

ts = TS()
ts.loads(url)
wb = ts.getWorkbook()

sheetName = 'D2_New'

ws = wb.getWorksheet(sheetName)
wb = ws.setFilter("province", "กรุงเทพมหานคร")
ws = wb.getWorksheet(sheetName)
print(ws.data)
# change date
wb = wb.setParameter('param_date', '2021-08-16')
ws = wb.getWorksheet(sheetName)
print(ws.data)

wb = wb.setParameter('param_date', '2021-08-15')
ws = wb.getWorksheet(sheetName)
print(ws.data)

https://replit.com/@bertrandmartel/TableauCovidDDCMOPH

You can use the dashboard-categorical-filter API using:

ws.setFilter("province", "กรุงเทพมหานคร", dashboardFilter=True)

but I didn't found any difference between both call in this case

djay commented 1 year ago

@bertrandmartel in the particular case we are using it for they don't have the filter on any of the worksheets but you can still send this filter as a dashboardFilter and it works. Maybe there should be a concept of a workbook.setFilter?