Closed az-data-guru closed 3 years ago
@az-data-guru You could format the data like this to get the percentage value:
from tableauscraper import TableauScraper as TS
import pandas as pd
url = "https://tableau.azdhs.gov/views/EMResourceBeds/InpatientBedUsageAvailability"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
for t in workbook.worksheets:
df = t.data
if "SUM(Number)-alias" in df:
df['SUM(Number)-alias'] = pd.Series([
"{0:.2f}%".format(val * 100)
for val in df['SUM(Number)-alias']
], index=df.index)
df['SUM(Number)-value'] = pd.Series([
"{0:.2f}%".format(val * 100)
for val in df['SUM(Number)-value']
], index=df.index)
print(df)
which gives:
Date Pulled-value Date Pulled-alias Med Surgbed-alias SUM(Number)-alias SUM(Number)-value
0 2021-07-18 7/18/2021 Inpatient Beds In Use by COVID Patients 9.40% 9.40%
1 2021-07-17 7/17/2021 Inpatient Beds In Use by COVID Patients 8.95% 8.95%
2 2021-07-16 7/16/2021 Inpatient Beds In Use by COVID Patients 8.60% 8.60%
3 2021-07-15 7/15/2021 Inpatient Beds In Use by COVID Patients 8.29% 8.29%
4 2021-07-14 7/14/2021 Inpatient Beds In Use by COVID Patients 7.97% 7.97%
... ... ... ... ... ...
1419 2020-03-30 3/30/2020 Inpatient Beds Available 33.90% 33.90%
1420 2020-03-29 3/29/2020 Inpatient Beds Available 35.58% 35.58%
1421 2020-03-28 3/28/2020 Inpatient Beds Available 33.13% 33.13%
1422 2020-03-27 3/27/2020 Inpatient Beds Available 32.07% 32.07%
1423 2020-03-26 3/26/2020 Inpatient Beds Available 32.39% 32.39%
In fact, the integer values are calculated in Javascript, the real values are the raw data retrieved from the API. In some dashboard, the worksheet also contains the integer value but it depends on the dashboard
However, the format is specified in the json data but it's not straightforward to parse:
"vizDataColumns": [{
"fn": "[system:visual].[tuple_id]",
"fnDisagg": "",
"formatStrings": [],
"isAutoSelect": false,
"paneIndices": [0],
"columnIndices": [0]
}, {
"fn": "[federated.1w7kli510gv45110nyi201m2xlpg].[none:DatePulled:qk]",
"fnDisagg": "",
"localBaseColumnName": "[DatePulled]",
"baseColumnName": "[federated.1w7kli510gv45110nyi201m2xlpg].[DatePulled]",
"fieldCaption": "Date Pulled",
"formatStrings": ["*dd/MM/yyyy", "fr_FR", "", "HH:mm:ss", "0", "1", "4", "0"],
"datasourceCaption": "federated.1w7kli510gv45110nyi201m2xlpg",
"dataType": "date",
"fieldRole": "dimension",
"isAutoSelect": false,
"paneIndices": [0],
"columnIndices": [1]
}, {
"fn": "[federated.1w7kli510gv45110nyi201m2xlpg].[none:MED_SURGBED:nk]",
"fnDisagg": "",
"localBaseColumnName": "[MED_SURGBED]",
"baseColumnName": "[federated.1w7kli510gv45110nyi201m2xlpg].[MED_SURGBED]",
"fieldCaption": "Med Surgbed",
"formatStrings": [],
"datasourceCaption": "federated.1w7kli510gv45110nyi201m2xlpg",
"dataType": "cstring",
"stringCollation": {
"name": "LEN_RUS_S2",
"charsetId": 0
},
"fieldRole": "dimension",
"isAutoSelect": true,
"paneIndices": [0],
"columnIndices": [2]
}, {
"fn": "[federated.1w7kli510gv45110nyi201m2xlpg].[sum:Number:qk]",
"fnDisagg": "",
"localBaseColumnName": "[Number]",
"baseColumnName": "[federated.1w7kli510gv45110nyi201m2xlpg].[Number]",
"fieldCaption": "SOMME(Number)",
"formatStrings": ["", "-1", "t", "", "", "", "0", ",", " ", "3;0"],
"datasourceCaption": "federated.1w7kli510gv45110nyi201m2xlpg",
"dataType": "integer",
"aggregation": "sum",
"fieldRole": "measure",
"isAutoSelect": false,
"paneIndices": [0],
"columnIndices": [3]
}, {
"fn": "[federated.1w7kli510gv45110nyi201m2xlpg].[pcto:sum:Number:qk:2]",
"fnDisagg": "",
"localBaseColumnName": "[Number]",
"baseColumnName": "[federated.1w7kli510gv45110nyi201m2xlpg].[Number]",
"fieldCaption": "SOMME(Number)",
"formatStrings": ["p", "0", "f", "", "", "", "2", ",", " ", "3;0"],
"datasourceCaption": "federated.1w7kli510gv45110nyi201m2xlpg",
"dataType": "real",
"aggregation": "sum",
"fieldRole": "measure",
"isAutoSelect": false,
"paneIndices": [0],
"columnIndices": [4]
}]
I think it would need to understand how the formatStrings
is interpreted in Javascript:
"formatStrings": ["", "-1", "t", "", "", "", "0", ",", " ", "3;0"]
But if the code above is returning correct % value as expected, it may be sufficient for your needs
I may not have been clear, instead of the percentage of inpatient I need the actual bed count. It exists in the data_dictionary.
For example, the 2021-07-18 entry in the dictionary has an integer value of 814 and a real value of .09404. I need the 814.
@az-data-guru this is fixed in v0.1.18. There was multiple columns with the same caption text. Thus, when the column name already exists, I've appended the function name to it.
from tableauscraper import TableauScraper as TS
url = "https://tableau.azdhs.gov/views/EMResourceBeds/InpatientBedUsageAvailability"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
for t in workbook.worksheets:
print(t.name)
print(t.data)
Output
Date Pulled-value Date Pulled-alias Med Surgbed-alias SUM(Number)-alias SUM(Number)-value SUM(Number)-[federated.1w7kli510gv45110nyi201m2xlpg].[pcto:sum:Number:qk:2]-alias
0 2021-07-18 7/18/2021 Inpatient Beds In Use by COVID Patients 814 0.0940497 0.0940497
1 2021-07-17 7/17/2021 Inpatient Beds In Use by COVID Patients 775 0.0895023 0.0895023
2 2021-07-16 7/16/2021 Inpatient Beds In Use by COVID Patients 745 0.0860277 0.0860277
3 2021-07-15 7/15/2021 Inpatient Beds In Use by COVID Patients 719 0.0829201 0.0829201
4 2021-07-14 7/14/2021 Inpatient Beds In Use by COVID Patients 689 0.0796716 0.0796716
.. ... ... ... ... ... ...
232 2021-05-05 5/5/2021 Inpatient Beds Available 915 0.106457 0.106457
233 2021-05-04 5/4/2021 Inpatient Beds Available 1061 0.123473 0.123473
234 2021-05-03 5/3/2021 Inpatient Beds Available 1120 0.130354 0.130354
235 2021-05-02 5/2/2021 Inpatient Beds Available 1093 0.127464 0.127464
236 2021-05-01 5/1/2021 Inpatient Beds Available 1034 0.120471 0.120471
[237 rows x 6 columns]
Thank you!
Considering this code block: `from tableauscraper import TableauScraper as TS import pandas as pd import time
data_list = [] url = "https://tableau.azdhs.gov/views/EMResourceBeds/InpatientBedUsageAvailability?%3Aembed=y&"
ts = TS() ts.loads(url) workbook = ts.getWorkbook()
for t in workbook.worksheets: data_list.append(t.data) print(data_list)`
TableauScraper is only offering the real (percentage in this case) number. How can I get the integer instead?
Thanks as always BM