Open stepa8 opened 2 years ago
I was running into a similar problem and this issue sent me in the right direction.
https://github.com/bertrandmartel/tableau-scraping/issues/30
It seems the library needs a URL other than the public-facing one. Open Chrome DevTools, switch to the Network tab, and find the URL that starts with https://public.tableau.com/views....
I tried looking up the one you were interested in and couldn't find the exact Tableau worksheet. The only one published by epidemiology.immunization.services.branch was this one: https://public.tableau.com/app/profile/epidemiology.immunization.services.branch/viz/COVID-19DemographicsTEST_16498711218660/DailyCounts
If you watch the Network tab while that page loads, this URL shows up:
https://public.tableau.com/views/COVID-19DemographicsTEST_16498711218660/DailyCounts
I did a quick test and this URL seems to work. Someone more knowledgeable might be able to explain the difference between the two URLs, but it might be helpful to note in the documentation that the public-facing URL is not exactly the URL needed to make this work.
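For reference, here is a minimal sketch of the rewrite from the public-facing URL to the /views URL. The helper name to_views_url is hypothetical, and the path pattern (/app/profile/<user>/viz/<workbook>/<sheet>) is only inferred from the single pair of URLs above, so it may not hold for every dashboard; checking the Network tab is still the safer route.

```python
from urllib.parse import urlsplit


def to_views_url(public_url: str) -> str:
    """Rewrite .../app/profile/<user>/viz/<workbook>/<sheet> as .../views/<workbook>/<sheet>.

    Assumption: the path has exactly the shape seen in this thread.
    """
    parts = urlsplit(public_url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected shape: ["app", "profile", <user>, "viz", <workbook>, <sheet>]
    if len(segments) != 6 or segments[:2] != ["app", "profile"] or segments[3] != "viz":
        raise ValueError(f"not a public profile URL: {public_url}")
    workbook, sheet = segments[4], segments[5]
    return f"{parts.scheme}://{parts.netloc}/views/{workbook}/{sheet}"


public_url = (
    "https://public.tableau.com/app/profile/"
    "epidemiology.immunization.services.branch/viz/"
    "COVID-19DemographicsTEST_16498711218660/DailyCounts"
)
print(to_views_url(public_url))
```

The result can then be passed to ts.loads() in place of the public-facing URL.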
Hello, thank you for this amazing library.
I am facing a similar issue. I found the public.tableau.com/views URL, but it returns an empty DataFrame.
Here is the URL: 'https://public.tableau.com/views/DB_FISCA_01/Fisca_DS_RankingPeliculas'
I went through the source code, and the cause seems to be that data['secondaryInfo'] is empty.
Here is my code, which I took from here:
import json
import re

import requests
from bs4 import BeautifulSoup

url = "https://public.tableau.com/views/DB_FISCA_01/Fisca_DS_RankingPeliculas"

# Load the embed page; its tsConfigContainer textarea holds the session config.
r = requests.get(
    url,
    params={
        ":display_static_image": "y",
        ":bootstrapWhenNotified": "true",
        ":embed": "y",
        ":language": "es-ES",
        ":showVizHome": "n",
        ":apiID": "host0",
    },
)
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea", {"id": "tsConfigContainer"}).text)

# Request the bootstrap session, which should carry the worksheet data.
dataUrl = f'https://public.tableau.com{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data={
    "sheet_id": tableauData["sheetId"],
})

# The response is two length-prefixed JSON chunks: "<n>;{...}<m>;{...}".
dataReg = re.search(r'\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))
Then print(data) returns {'secondaryInfo': {}}
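To make the parsing step easier to check in isolation, here is a small sketch of how that regular expression splits the response into two JSON chunks. The sample string is a made-up, heavily shortened stand-in for r.text, and the helper names split_bootstrap and has_secondary_info are hypothetical. Treating an empty secondaryInfo as "no data came back" is an assumption based on the behavior described above, not something confirmed from the library.

```python
import json
import re

# Hypothetical stand-in for r.text; a real response is far larger.
sample = '14;{"sheetName":"A"}21;{"secondaryInfo":{}}'


def split_bootstrap(text: str):
    """Return (info, data) parsed from '<n>;{...}<m>;{...}'."""
    match = re.search(r"\d+;({.*})\d+;({.*})", text, re.MULTILINE)
    if match is None:
        raise ValueError("response does not contain two prefixed JSON chunks")
    return json.loads(match.group(1)), json.loads(match.group(2))


def has_secondary_info(data: dict) -> bool:
    """True when the second chunk carries a non-empty 'secondaryInfo'."""
    return bool(data.get("secondaryInfo"))


info, data = split_bootstrap(sample)
print(info)
print(data)
print(has_secondary_info(data))
```

Checking has_secondary_info(data) before building a DataFrame at least separates "the request failed to parse" from "the request parsed but carried nothing".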
I ran this under WSL on Windows 10, with an Ubuntu distribution.
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/app/profile/epidemiology.immunization.services.branch/viz/COVID-19DailyHighlights/DailyHighlights"
ts = TS()
ts.loads(url)
Then we see this error:
python scrape_tableau.py
Traceback (most recent call last):
  File "scrape_tableau.py", line 9, in
    ts.loads(url)
  File "/mnt/c/Users/stepa8/Projects/tableau-scraping/tab-env/lib/python3.8/site-packages/tableauscraper/TableauScraper.py", line 80, in loads
    soup.find("textarea", {"id": "tsConfigContainer"}).text
AttributeError: 'NoneType' object has no attribute 'text'
It appears soup.find cannot find an element matching "textarea", {"id": "tsConfigContainer"}, so it returns None.
Is there a workaround?
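One way to get a clearer failure than the AttributeError is to check for the textarea before reading it. The sketch below is a stdlib-only illustration that swaps BeautifulSoup for html.parser so it runs without extra dependencies; it is not the library's code. The names read_ts_config and _TextareaGrabber and both HTML snippets are hypothetical.

```python
import json
from html.parser import HTMLParser


class _TextareaGrabber(HTMLParser):
    """Collects the text inside <textarea id="tsConfigContainer">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("id") == "tsConfigContainer":
            self._inside = True
            self.found = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def read_ts_config(html: str) -> dict:
    """Return the parsed config, or raise ValueError if the textarea is missing."""
    parser = _TextareaGrabber()
    parser.feed(html)
    parser.close()
    if not parser.found:
        raise ValueError(
            "tsConfigContainer not found; the URL may be the public-facing "
            "page rather than the /views URL"
        )
    return json.loads("".join(parser.chunks))


# Hypothetical page bodies, for illustration only.
good = '<html><body><textarea id="tsConfigContainer">{"sessionid": "abc"}</textarea></body></html>'
bad = "<html><body><p>no config here</p></body></html>"

print(read_ts_config(good))
```

With BeautifulSoup the same idea is a None check on the result of soup.find(...) before accessing .text.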