MaxHalford / bike-sharing-history

🚲 Git scraping for bike sharing APIs

Empty .parquet files or weather files in the station-status GCP bucket #2

Closed AntoineGiraud closed 7 months ago

AntoineGiraud commented 7 months ago

In the [bike-sharing-history](https://console.cloud.google.com/storage/browser/bike-sharing-history) bucket, there are 58 empty .parquet files, plus files under ./weather/, which make it impossible to load ALL the cities into DuckDB:

SET s3_endpoint = 'storage.googleapis.com';
SELECT * FROM read_parquet('s3://bike-sharing-history/**/*.parquet', filename = true);

e.g. bordeaux/vcub/2024/Jan.parquet, vannes/veloceo/2024/Jan.parquet

A query to find out for sure:

select *
from parquet_metadata('s3://bike-sharing-history/**/*.parquet')
where column_id = 0 and (path_in_schema != 'station' or num_values <= 1);
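
For anyone who would rather post-process the metadata client-side, the same filter can be sketched in Python. This is only an illustration: it assumes the rows returned by parquet_metadata have already been fetched into tuples, and the `ColumnMeta` type, the `suspicious_files` helper, and the sample rows are all hypothetical names and made-up data, not part of the repository.

```python
from typing import Iterable, NamedTuple


class ColumnMeta(NamedTuple):
    """One row of parquet_metadata output, reduced to the fields used here."""
    file_name: str
    column_id: int
    path_in_schema: str
    num_values: int


def suspicious_files(rows: Iterable[ColumnMeta]) -> list[str]:
    """Return files whose first column is not 'station' or holds at most one value."""
    flagged = set()
    for row in rows:
        # Only the first column of each file is inspected, as in the SQL query.
        if row.column_id != 0:
            continue
        if row.path_in_schema != "station" or row.num_values <= 1:
            flagged.add(row.file_name)
    return sorted(flagged)


# Made-up sample rows, for illustration only (not real bucket contents).
sample = [
    ColumnMeta("city_a/2024/Jan.parquet", 0, "station", 1200),
    ColumnMeta("city_a/2024/Jan.parquet", 1, "bikes", 1200),
    ColumnMeta("city_b/2024/Jan.parquet", 0, "station", 0),
    ColumnMeta("weather/2024/Jan.parquet", 0, "temperature", 500),
]
print(suspicious_files(sample))
```

The flagged paths could then be excluded from the list of files passed to read_parquet.
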

MaxHalford commented 7 months ago

All good now!