MobilityData / gbfs-validator

The canonical GBFS validator. Maintained by the GBFS community, facilitated by MobilityData.
https://gbfs-validator.mobilitydata.org/
Apache License 2.0

Scraping GBFS-validator results from Python #165

Open iaguerri opened 7 months ago

iaguerri commented 7 months ago

If you are new to the GBFS Validator, please introduce yourself (name and organization/link to GBFS). It’s helpful to know who we're chatting with!

I'm working on a MaaS application and need to validate the GBFS feeds that public operators provide to me.

What is the issue and why is it an issue?

I'm trying to request the result of a validation from Python (https://gbfs-validator.mobilitydata.org/validator?url=https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json). I have also tried from Postman.

The problem is that the response is 200 (OK), but the information cannot be extracted (even with scraping) because the body says "We're sorry but my-project doesn't work properly without Javascript enabled. Please enable to continue".

The code used:

import requests
from bs4 import BeautifulSoup

url_validator = "https://gbfs-validator.mobilitydata.org/validator"

# Test JSON files
json_main_full_brusels = "https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json"  # Valid JSON
json_main_nolastupdated_brusels = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoLastUpdated.json"  # Invalid JSON (no last updated)
json_main_vehiclyType_nolastupdated = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasVehiclyTypeCorrupted.json"  # Invalid JSON: VehicleTypes feed without last updated
json_main_nofeed_systeminformation = "https://github.com/Almanes/GtfsFiles/raw/main/pruebasBruselasNoSysteminformationfeed.json"  # Invalid JSON: no SystemInformation feed

params = {
    "url": json_main_nolastupdated_brusels
}

url_completa = requests.Request('GET', url_validator, params=params).prepare().url
print("Request URL:", url_completa)

# APPROACH 1: read the response body directly
respuesta = requests.get(url_validator, params=params)

if respuesta.status_code == 200:
    datos_respuesta = respuesta.text
    print("Validator response:", datos_respuesta)
else:
    print("Request failed. Status code:", respuesta.status_code)
    print("Response content:", respuesta.text)

# APPROACH 2: parse the returned HTML with BeautifulSoup
soup = BeautifulSoup(respuesta.content, 'html.parser')

# data-v-7c2075bd is an HTML attribute (not a CSS class), so match on the attribute
for div_element in soup.find_all('div', attrs={'data-v-7c2075bd': True}):
    # Extract the text content of the div element
    div_text = div_element.get_text(strip=True)

    # Print the extracted text
    print("Value of k is:", div_text)
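
A script can check whether a 200 response carries only the "Javascript enabled" fallback message before trying to extract anything from it. The following is a minimal sketch using only the standard library. `SAMPLE_BODY` is a hypothetical stand-in for the returned page, and `NoscriptCollector` and `noscript_messages` are illustrative names that are not part of the validator or of the code above.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the HTML returned by the validator page;
# the real markup may differ.
SAMPLE_BODY = """<!DOCTYPE html>
<html>
  <head><title>my-project</title></head>
  <body>
    <noscript>We're sorry but my-project doesn't work properly without Javascript enabled. Please enable to continue</noscript>
    <div id="app"></div>
  </body>
</html>"""


class NoscriptCollector(HTMLParser):
    """Collects the text found inside <noscript> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside a <noscript> element
        self.messages = []   # non-empty text chunks found in <noscript>

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._depth and text:
            self.messages.append(text)


def noscript_messages(html_text):
    """Return the fallback messages embedded in <noscript> tags."""
    parser = NoscriptCollector()
    parser.feed(html_text)
    parser.close()
    return parser.messages


if __name__ == "__main__":
    for message in noscript_messages(SAMPLE_BODY):
        print("Fallback message found:", message)
```

If `noscript_messages(respuesta.text)` returns a non-empty list and the page has no other visible content, the validation report is probably not in the body, so parsing it with BeautifulSoup will not help.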


Please describe some potential solutions you have considered (even if they aren’t related to GBFS).

I don't know why the HTML is not fully loaded afterwards, but enabling JavaScript might make it possible to get this info.

Thanks!!

davidgamez commented 7 months ago

Hi @iaguerri, the GBFS Validator is currently deployed on Netlify. Judging by the error message you are getting, Netlify is detecting and blocking a bot client. You can search online for ways to avoid user-agent detection. However, if you want the validation report for specific feeds, I suggest using the undocumented, unstable API endpoint instead. Unfortunately, we do not offer a stable API endpoint yet. The following issue explains how to access the API: https://github.com/MobilityData/gbfs-validator/issues/95. To follow the development of the stable API, see https://github.com/MobilityData/gbfs-validator/issues/129.
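
Calling an API endpoint from Python, as suggested above, might look like the sketch below. This thread gives neither the endpoint URL nor the request format, so both are assumptions: the endpoint is a required argument to be taken from issue #95, and the `{"url": ...}` JSON body sent via POST is a guess that should be checked against that issue. The function names are illustrative.

```python
import json
import urllib.request


def build_validation_request(feed_url, endpoint):
    """Build (but do not send) a POST request asking for a validation report.

    ASSUMPTION: the endpoint accepts a JSON body of the form {"url": ...}.
    The real endpoint URL and payload shape are described in issue #95,
    not in this thread.
    """
    payload = json.dumps({"url": feed_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_validation_report(feed_url, endpoint, timeout=30):
    """Send the request and decode the response body as JSON."""
    request = build_validation_request(feed_url, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder endpoint: replace it with the one given in issue #95.
    req = build_validation_request(
        "https://gbfs.api.ridedott.com/public/v2/brussels/gbfs.json",
        endpoint="https://example.invalid/validator-endpoint",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it lets you inspect the method, URL, and body without any network traffic. Since the endpoint is described as unstable, the response structure may change without notice.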