soup.find fails to find Tableau data

bertrandmartel / tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz

MIT License


soup.find fails to find Tableau data #58

Open stepa8 opened 2 years ago

stepa8 commented 2 years ago

I ran this on WSL (Ubuntu) on Windows 10:

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/app/profile/epidemiology.immunization.services.branch/viz/COVID-19DailyHighlights/DailyHighlights"
ts = TS()
ts.loads(url)

Then we see this error:

python scrape_tableau.py
Traceback (most recent call last):
  File "scrape_tableau.py", line 9, in <module>
    ts.loads(url)
  File "/mnt/c/Users/stepa8/Projects/tableau-scraping/tab-env/lib/python3.8/site-packages/tableauscraper/TableauScraper.py", line 80, in loads
    soup.find("textarea", {"id": "tsConfigContainer"}).text
AttributeError: 'NoneType' object has no attribute 'text'

It appears that soup.find("textarea", {"id": "tsConfigContainer"}) returns None, meaning the page fetched from this URL contains no such element.

Is there a workaround?
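For anyone debugging the same traceback, here is a minimal sketch of a lookup that fails with a readable message instead of an AttributeError when the tsConfigContainer textarea is missing. It uses only the standard library's html.parser as a stand-in for BeautifulSoup, so it is not the library's own code; the names _TextareaGrabber and load_ts_config are hypothetical.

```python
import json
from html.parser import HTMLParser


class _TextareaGrabber(HTMLParser):
    """Collects the text inside <textarea id=target_id>, if the page has one."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False   # True while we are inside the target textarea
        self.found = False    # True once the target textarea has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("id") == self.target_id:
            self.inside = True
            self.found = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def load_ts_config(html):
    """Return the JSON config from the page, or raise a clear ValueError."""
    grabber = _TextareaGrabber("tsConfigContainer")
    grabber.feed(html)
    grabber.close()
    if not grabber.found:
        # Same situation as soup.find(...) returning None in the traceback.
        raise ValueError(
            "no <textarea id='tsConfigContainer'> found; "
            "check that the URL is the one the viz actually loads from"
        )
    return json.loads("".join(grabber.chunks))
```

Calling load_ts_config(r.text) on a page without the textarea then raises a ValueError naming the missing element, which makes it easier to tell a wrong URL apart from a parsing bug.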

xplreitr commented 2 years ago

I was running into a similar problem and this issue sent me in the right direction.

https://github.com/bertrandmartel/tableau-scraping/issues/30

It seems the scraper needs a different URL from the public-facing one. Open Chrome DevTools, go to the Network tab, and look for the URL that starts with https://public.tableau.com/views....

I tried looking up the one you were interested in but couldn't find that exact Tableau worksheet. The only one published by epidemiology.immunization.services.branch was this one: https://public.tableau.com/app/profile/epidemiology.immunization.services.branch/viz/COVID-19DemographicsTEST_16498711218660/DailyCounts

If you watch the Network tab while it loads, this URL shows up:

https://public.tableau.com/views/COVID-19DemographicsTEST_16498711218660/DailyCounts

I did a quick test and this URL seems to work. Someone more knowledgeable might be able to explain the difference between the two URLs, but it might be helpful to note in the documentation that the public-facing URL is not exactly the URL needed to make this work.
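Judging only from the single pair of URLs above, the /views/ URL looks like the profile URL with the /app/profile/<author>/viz prefix replaced by /views. Below is a rough sketch of that rewrite. The pattern is an assumption inferred from this one example, not documented behavior, and profile_to_views_url is a hypothetical helper name, so checking the Network tab remains the reliable way to confirm the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def profile_to_views_url(url):
    """Rewrite a public profile URL to the /views/ form seen in the Network tab.

    Assumed input shape (from the example in this thread):
        /app/profile/<author>/viz/<workbook>/<sheet>
    Any URL that does not match this shape is returned unchanged.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    looks_like_profile = (
        len(segments) == 6
        and segments[:2] == ["app", "profile"]
        and segments[3] == "viz"
    )
    if not looks_like_profile:
        return url
    workbook, sheet = segments[4], segments[5]
    # Drop any query string or fragment; keep scheme and host as given.
    return urlunsplit((parts.scheme, parts.netloc, f"/views/{workbook}/{sheet}", "", ""))


profile_url = (
    "https://public.tableau.com/app/profile/"
    "epidemiology.immunization.services.branch/viz/"
    "COVID-19DemographicsTEST_16498711218660/DailyCounts"
)
views_url = profile_to_views_url(profile_url)
```

The result can then be passed to ts.loads(...) in place of the profile URL.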

martinolmos commented 8 months ago

Hello, thank you for this amazing library.

I am facing a similar issue. I found the public.tableau.com/views URL, but it returns an empty DataFrame. Here is the URL: 'https://public.tableau.com/views/DB_FISCA_01/Fisca_DS_RankingPeliculas'

martinolmos commented 8 months ago

I went through the source code, and the problem is that data['secondaryInfo'] is empty.

Here is my code, which I took from here:

import requests
from bs4 import BeautifulSoup
import json
import re

url = "https://public.tableau.com/views/DB_FISCA_01/Fisca_DS_RankingPeliculas"

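# Load the embed page; the ":"-prefixed query parameters are Tableau embed options.
r = requests.get(
    url,
    params={
        ":display_static_image": "y",
        ":bootstrapWhenNotified": "true",
        ":language": "es-ES",
        # ":embed" was listed twice ("true", then "y"); only the last value took effect.
        ":embed": "y",
        ":showVizHome": "n",
        ":apiID": "host0",
    },
)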

soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'https://public.tableau.com{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})

# Raw string so that \d is not treated as an invalid string escape.
dataReg = re.search(r'\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))

After that, print(data) outputs {'secondaryInfo': {}}
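To make the empty result easier to spot, here is a small sketch of the parsing step with explicit checks. It assumes, based only on the regex above, that the response body looks like `<number>;{json}<number>;{json}`. The helper names parse_bootstrap and has_secondary_info and the sample string are made up for illustration and are not real Tableau output.

```python
import json
import re


def parse_bootstrap(text):
    """Split a '<n>;{json}<n>;{json}' body into its two JSON documents."""
    match = re.search(r"\d+;({.*})\d+;({.*})", text, re.MULTILINE)
    if match is None:
        raise ValueError("response does not match '<n>;{json}<n>;{json}'")
    info = json.loads(match.group(1))
    data = json.loads(match.group(2))
    return info, data


def has_secondary_info(data):
    """True only if 'secondaryInfo' exists and is non-empty."""
    return bool(data.get("secondaryInfo"))


# Synthetic stand-in for r.text, shaped like the case reported above.
sample = '9;{"a": 1}23;{"secondaryInfo": {}}'
info, data = parse_bootstrap(sample)
if not has_secondary_info(data):
    print("bootstrap response parsed, but secondaryInfo is empty")
```

With this check in place, a response that parses but carries no data is reported explicitly rather than surfacing later as an empty DataFrame.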