bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
MIT License

data coming back different than selected item #6

Closed az-data-guru closed 3 years ago

az-data-guru commented 3 years ago

Hi,

I am using this library to scrape a state Tableau site for COVID-19 vaccine data. My goal is to eventually obtain all of the relevant county-level data. However, when I select a county, a different county's data comes back, and for one county, "Santa Cruz", nothing is selected at all. This may be an issue with the worksheet.

If it is an issue with the worksheet itself, is there any workaround I can use to select the county via the map?

Thank you!

from tableauscraper import TableauScraper as TS
import pandas as pd
import time

counties = ["Apache", "Coconino", "Cochise", "Graham", "Greenlee", "Gila", "La Paz", "Maricopa", "Mohave", "Navajo", "Pinal", "Pima",  "Yavapai", "Yuma"]

for county in counties:
    print(county)

    while True:
        try:
            data_list = []

            url = "https://tableau.azdhs.gov/views/VaccineDashboard/Vaccineadministrationdata?%3Aembed=y&"

            #initialize scraper

            ts = TS()
            ts.loads(url)

            #select that value
            dashboard = ts.getWorksheet(" County using admin county").select("Admin Address County W/ State Pod", county)

            for t in dashboard.worksheets:
                data_list.append(t.data)

            res = [int(i) for i in str(data_list[0]).split() if i.isdigit()]
            one_dose = res[1]
            print(one_dose)
            print(data_list[0])  
        except:
            continue
        break

    time.sleep(30)

bertrandmartel commented 3 years ago

@az-data-guru Are you using version v0.0.8? I released it today; it fixes some bugs with select and dropDown.

az-data-guru commented 3 years ago

I don't. I will update now and let you know. Thanks!

az-data-guru commented 3 years ago

It is still showing the wrong county.

bertrandmartel commented 3 years ago

@az-data-guru I see, there is a bug related to the index of the values in the array. In all the other Tableau dashboards I've seen, the values were indexed from 1. In this case, however, the index seems to start from 2, with Apache being the last with value 16 while there are only 15 values.

Also, you can move ts = TS() outside of the for loop (so that a new session is not loaded each time). The library already applies a 500 ms delay between select/dropdown calls; it can be changed with ts = TS(delayMs=500)

from tableauscraper import TableauScraper as TS

counties = ["Apache", "Coconino"]

url = "https://tableau.azdhs.gov/views/VaccineDashboard/Vaccineadministrationdata"

ts = TS()
ts.loads(url)
worksheet = ts.getWorksheet(" County using admin county")

values = ts.getWorksheet(" County using admin county").getValues("Admin Address County W/ State Pod")
print(values)

for county in counties:
    print(county)
    dashboard = worksheet.select("Admin Address County W/ State Pod", county)
    print(dashboard.getWorksheet("Number of People").data)

I will look at the raw data to see how to handle that starting-index issue.

bertrandmartel commented 3 years ago

It seems there is an offset in the index values: the index starts from 1, but from the 3rd value onward it is shifted:

"Yuma" 1
"Yavapai" 2
"Santa Cruz" 4
"Pinal" 5

az-data-guru commented 3 years ago

Thank you for catching that. Is this something I would need to report to the Tableau developer in order for it to work?

Thank you!


bertrandmartel commented 3 years ago

@az-data-guru I don't know. This is the first time I have noticed an offset in the selection index; there must be something in the data relating to those indices, but maybe I'm missing something. There seems to be no problem with the data itself, since the Tableau website can successfully fetch it. The website simply knows that Santa Cruz is index 4 (and not 3), whereas I have no idea where this information is located in the JSON API result.

I will continue to investigate tomorrow

az-data-guru commented 3 years ago

Thank you!


bertrandmartel commented 3 years ago

@az-data-guru I've identified that there is a column [system:visual].[tuple_id] which points to:

{
    "tupleIds": [
        1,
        2,
        4,
        5,
        6,
        7,
        8,
        9,
        10,
        11,
        12,
        13,
        14,
        15,
        16
    ],
    "valueIndices": [],
    "aliasIndices": [],
    "formatstrIndices": []
},

This seems to be the way the index is built (called tupleId)
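
To illustrate why the gap in tupleIds matters, here is a minimal sketch in plain Python. It uses only the four value/index pairs quoted earlier in this thread; the variable names are illustrative and this is not the library's internal code. It compares a naive 1-based positional index with a lookup through the tuple IDs:

```python
# The four values and tuple IDs quoted in this thread (not the full list).
values = ["Yuma", "Yavapai", "Santa Cruz", "Pinal"]
tuple_ids = [1, 2, 4, 5]

# Naive assumption: the selection index is the value's 1-based position.
positional = {v: i for i, v in enumerate(values, start=1)}

# Alternative: pair each value with the tuple ID at the same position.
by_tuple_id = dict(zip(values, tuple_ids))

# Values for which the two approaches disagree.
mismatched = [v for v in values if positional[v] != by_tuple_id[v]]
print(mismatched)
```

Under the positional assumption, "Santa Cruz" would be sent as 3, an ID that does not appear in tupleIds, and "Pinal" would be sent as 4, which is the ID of "Santa Cruz". That would be consistent with the symptoms reported at the top of this issue.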

bertrandmartel commented 3 years ago

There is another bug related to #5. I need to look further into records of data dictionary chunks between calls

bertrandmartel commented 3 years ago

This is now working in v0.0.9

The following gets all the counties:

https://replit.com/@bertrandmartel/TableauCovidArizona

from tableauscraper import TableauScraper as TS

url = "https://tableau.azdhs.gov/views/VaccineDashboard/Vaccineadministrationdata"

ts = TS()
ts.loads(url)
worksheet = ts.getWorksheet(" County using admin county")

counties = ts.getWorksheet(" County using admin county").getValues(
    "Admin Address County W/ State Pod")

for county in counties:
    print(county)
    dashboard = worksheet.select("Admin Address County W/ State Pod", county)
    print(dashboard.getWorksheet("Number of People").data)

az-data-guru commented 3 years ago

Thank you. It works now.


bertrandmartel commented 3 years ago

@az-data-guru Hello, there have been some improvements in the way the tuple index is used. If you want, you can move to v0.1.0; the code would then be:

from tableauscraper import TableauScraper as TS

url = "https://tableau.azdhs.gov/views/VaccineDashboard/Vaccineadministrationdata"

ts = TS()
ts.loads(url)
ws = ts.getWorksheet(" County using admin county")

counties = [
  t["values"]
  for t in ws.getSelectableItems()
  if t["column"] == "Admin Address County W/ State Pod"
][0]

for county in counties:
    print(county)
    wb = ws.select("Admin Address County W/ State Pod", county)
    print(wb.getWorksheet("Number of People").data)
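
One caveat with the list comprehension followed by [0] in the snippet above: it raises an IndexError if no item matches the column name. A possible variant, sketched here under the assumption that the selectable items are dicts with "column" and "values" keys as in that snippet, falls back to an empty list instead. The helper name values_for_column and the sample data are hypothetical, not part of the library and not real dashboard output:

```python
def values_for_column(items, column):
    """Return the selectable values for `column`, or [] if it is absent."""
    return next((t["values"] for t in items if t["column"] == column), [])


# Illustrative stand-in for the result of ws.getSelectableItems().
sample_items = [
    {
        "column": "Admin Address County W/ State Pod",
        "values": ["Yuma", "Yavapai", "Santa Cruz", "Pinal"],
    },
    {"column": "Some Other Column", "values": ["a", "b"]},
]

counties = values_for_column(sample_items, "Admin Address County W/ State Pod")
missing = values_for_column(sample_items, "No Such Column")
print(counties)
print(missing)
```

With a real worksheet, the first argument would be ws.getSelectableItems(), and an empty result would simply skip the selection loop rather than fail.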