GoogleCloudPlatform / data-science-on-gcp

Source code accompanying the book Data Science on the Google Cloud Platform (Valliappa Lakshmanan, O'Reilly, 2017)
Apache License 2.0

Chapter 2: Getting 500 - Internal Server Error when Running ingest_flights.py #119

Closed: dylanmpeck closed this issue 2 years ago

dylanmpeck commented 3 years ago

Screenshot of results from running ingest_flights.py:

[image: command2]

It looks like the link that ingest_flights.py uses to request the data may be broken: 'https://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=236&Has_Group=3&Is_Zipped=0'.

I'm receiving a 500 error when I try to access that link.
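When the ingest runs unattended, a bare `HTTPError` traceback is easy to miss or misread. One option is to make the download step fail fast with an explicit message. The following is a minimal sketch using only the Python standard library; `fetch_or_explain` is a hypothetical helper for illustration, not a function from ingest_flights.py.

```python
import urllib.error
import urllib.request


def fetch_or_explain(url: str, timeout: float = 60.0) -> bytes:
    """Download url and return the body; turn HTTP errors into a readable message.

    Hypothetical helper for illustration; not part of ingest_flights.py.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx responses, such as the 500 seen here.
        raise RuntimeError(
            f"Download failed with HTTP {err.code} from {url}; "
            "the BTS endpoint may have changed"
        ) from err
```

Given the behavior described above, calling it with the old `DownLoad_Table.asp?Table_ID=236...` URL should raise a `RuntimeError` that names the 500 status instead of an unhandled `HTTPError`.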

Is there an alternative link from that site that could be used?

lakshmanok commented 3 years ago

Skip this step, and get the data from the bucket. The README files have the details.

On Thu, Aug 5, 2021, 4:42 PM Dylan Peck wrote:

Screenshot of results from running ingest.py:

[image: command2] https://user-images.githubusercontent.com/40506467/128434629-763ce84c-04d1-40b6-9749-3fef88029b7f.png


dylanmpeck commented 3 years ago

Thanks for the quick response, Lak.

That definitely solves the issue when testing locally. However, the rest of the Qwiklab for this chapter includes an optional exercise: deploying the Python code as a Cloud Function and running it on an automated schedule. Following the Cloud Function setup instructions in the README, I get the same 500 error from ingest_flights.py when testing the Cloud Function.

[image: Screen Shot 2021-08-05 at 5 35 02 PM]

lakshmanok commented 3 years ago

It looks like the URL has changed a bit. The new download URL seems to be:

https://www.transtats.bts.gov/DownLoad_Table.asp?gnoyr_Vq=FGJ&Un5_T4172=G&V5_mv22rq=D

It's a POST request whose form body contains the requested variables, for example:

UserTableName: Reporting_Carrier_On_Time_Performance_1987_present
DBShortName: On_Time
RawDataTable: T_ONTIME_REPORTING
sqlstr: IFNFTEVDVCBZRUFSLFFVQVJURVIsTU9OVEgsREFZX09GX01PTlRILEZMX0RBVEUsT1BfVU5JUVVFX0NBUlJJRVIsT1BfQ0FSUklFUl9GTF9OVU0sT1JJR0lOX0FJUlBPUlRfSUQsT1JJR0lOX0FJUlBPUlRfU0VRX0lELE9SSUdJTl9DSVRZX01BUktFVF9JRCxPUklHSU4sT1JJR0lOX1NUQVRFX0FCUixPUklHSU5fU1RBVEVfTk0sREVTVF9BSVJQT1JUX0lELERFU1RfQUlSUE9SVF9TRVFfSUQsREVTVF9DSVRZX01BUktFVF9JRCxUQVhJX0lOIEZST00gIFRfT05USU1FX1JFUE9SVElORyBXSEVSRSBNb250aCA9MSBBTkQgWUVBUj0yMDIx
varlist: YEAR,QUARTER,MONTH,DAY_OF_MONTH,FL_DATE,OP_UNIQUE_CARRIER,OP_CARRIER_FL_NUM,ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,ORIGIN,ORIGIN_STATE_ABR,ORIGIN_STATE_NM,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID,TAXI_IN
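The `sqlstr` value above is a base64-encoded SQL query. Decoded, it reads ` SELECT YEAR,QUARTER,...,TAXI_IN FROM  T_ONTIME_REPORTING WHERE Month =1 AND YEAR=2021`, i.e. the `varlist` columns for January 2021. Below is a minimal sketch of how a downloader might assemble this POST for an arbitrary year and month with the standard library. It assumes the server accepts exactly the five fields shown (the real web form may send more), and `build_sqlstr`, `build_form`, and `build_request` are hypothetical names, not code from the repository.

```python
import base64
import urllib.parse
import urllib.request

# New download URL quoted in this thread.
DOWNLOAD_URL = ("https://www.transtats.bts.gov/DownLoad_Table.asp"
                "?gnoyr_Vq=FGJ&Un5_T4172=G&V5_mv22rq=D")

# Column list copied from the `varlist` example above.
VARLIST = ("YEAR,QUARTER,MONTH,DAY_OF_MONTH,FL_DATE,OP_UNIQUE_CARRIER,"
           "OP_CARRIER_FL_NUM,ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,"
           "ORIGIN_CITY_MARKET_ID,ORIGIN,ORIGIN_STATE_ABR,ORIGIN_STATE_NM,"
           "DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID,TAXI_IN")


def build_sqlstr(year: int, month: int, varlist: str = VARLIST) -> str:
    """Base64-encode the SELECT query; spacing mirrors the decoded example."""
    sql = (f" SELECT {varlist} FROM  T_ONTIME_REPORTING"
           f" WHERE Month ={month} AND YEAR={year}")
    return base64.b64encode(sql.encode("ascii")).decode("ascii")


def build_form(year: int, month: int) -> dict:
    """Form fields as listed in the thread (assumed to be sufficient)."""
    return {
        "UserTableName": "Reporting_Carrier_On_Time_Performance_1987_present",
        "DBShortName": "On_Time",
        "RawDataTable": "T_ONTIME_REPORTING",
        "sqlstr": build_sqlstr(year, month),
        "varlist": VARLIST,
    }


def build_request(year: int, month: int) -> urllib.request.Request:
    """Prepare (but do not send) a form-encoded POST to the new endpoint."""
    body = urllib.parse.urlencode(build_form(year, month)).encode("ascii")
    return urllib.request.Request(DOWNLOAD_URL, data=body, method="POST")
```

Sending it would be `urllib.request.urlopen(build_request(2021, 1))`; response handling is left out because the thread does not describe the response format in detail.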

We'll look into changing the downloader to reflect this change.

dylanmpeck commented 3 years ago

Thanks, Lak!

When do you think this change might make it into the repo?

lakshmanok commented 3 years ago

I'm updating the code to use the new link, but the resulting CSV file has a different structure, so the downstream processing has to change as well. You can follow the progress in the "edition2" branch. Since this is an optional exercise anyway, I'd suggest just using the data in the Cloud Storage bucket (as described in README.md) and moving on to the next chapter.
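Because the new download produces a CSV with a different layout, processing code written for the old file can break in confusing ways. One defensive option, sketched here as an assumption rather than as what the edition2 branch actually does, is to compare the CSV header against the columns the pipeline expects before processing anything. `header_diff` and the sample columns are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def header_diff(csv_text: str, expected: List[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names by comparing the CSV header to expected.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    # Strip whitespace and ignore blank header cells (e.g. from a trailing comma).
    header = [name.strip() for name in next(reader) if name.strip()]
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


# Example: a file that lacks DAY_OF_MONTH but adds FL_DATE.
missing, unexpected = header_diff(
    "YEAR,MONTH,FL_DATE\n2021,1,2021-01-01\n",
    ["YEAR", "MONTH", "DAY_OF_MONTH"],
)
# missing == ["DAY_OF_MONTH"], unexpected == ["FL_DATE"]
```

A non-empty `missing` list is a signal to stop and update the processing step rather than produce silently wrong output.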