Percona-Lab / ontime-airline-performance


transtats.bts.gov sends 404 #2

Open Mottl opened 5 years ago

Mottl commented 5 years ago

It seems transtats.bts.gov has removed the PREZIPed reports; requests for them now return a 404 error.

vadimtk commented 5 years ago

It seems they moved the files to https://www.bts.gov/browse-statistical-products-and-data/bts-publications/airline-service-quality-performance-234-time

Mottl commented 5 years ago

Only since 2005 though

vadimtk commented 5 years ago

They say "Data are available since 1987.

NOTE: If you are interested in any aviation data not listed below or if you have any questions, please email answers@dot.gov"

hum1 commented 5 years ago

https://www.bts.gov/sites/bts.dot.gov/files/docs/legacy/additional-attachment-files/ONTIME.TD.201508.REL02.01OCT2015.zip

The download URL has changed.

dbackeus commented 9 months ago

Alternative download script:

#!/bin/bash

html_content=$(curl -s https://www.bts.gov/browse-statistical-products-and-data/bts-publications/airline-service-quality-performance-234-time)

# Use grep and sed to extract .zip links
zip_links=$(echo "$html_content" | grep -o 'href=["'"'"'][^"'"'"']*\.zip' | sed 's/href=["'"'"']//')

base_url="https://www.bts.gov"

for path in $zip_links; do
    # If the link is relative, prepend the base URL; leave absolute URLs untouched
    case "$path" in
        http://*|https://*) url="$path" ;;
        *) url="$base_url$path" ;;
    esac
    echo "Downloading $url"
    wget -U "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "$url"
done

for file in *.zip; do
    unzip "$file"
done

The format of these files appears wildly different from what the data load scripts expect, though.
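
To see how the format differs, a rough sketch like the one below could print a quick summary of the first line of each extracted file. The helper name `summarize_header` is hypothetical and is not part of this repository. The two delimiters it counts (`|` and `,`) are only guesses at what the files might use, not something confirmed in this thread.

```shell
#!/bin/bash

# Hypothetical helper: report how many '|' and ',' characters appear on the
# first line of a file, as a hint about its delimiter and column count.
summarize_header() {
    local file="$1"
    local header pipes commas

    # Read only the first line and strip any Windows carriage return
    header=$(head -n 1 "$file" | tr -d '\r')

    # Delete everything except the candidate delimiter, then count what is left
    pipes=$(( $(printf '%s' "$header" | tr -cd '|' | wc -c) ))
    commas=$(( $(printf '%s' "$header" | tr -cd ',' | wc -c) ))

    printf '%s pipes=%d commas=%d\n' "$file" "$pipes" "$commas"
}

# Summarize every file passed on the command line
for file in "$@"; do
    summarize_header "$file"
done
```

Run it with the extracted file names as arguments. The counts can then be compared against the number of columns the load scripts expect.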