Summary:
This PR is a proof of concept (POC) for identifying feeds whose dataset was never fetched, so that users get clearer information about when and why the fetch did not succeed.
Expected Behavior:
Currently, the batch processing job attempts to fetch all non-deprecated feeds. Known reasons for a non-deprecated feed not being successfully fetched include:
- Internal server errors (e.g., HTTP connection timeout because the URL is inaccessible).
- Invalid ZIP file: If the dataset available at the producer's URL is not a valid ZIP file (e.g., the URL redirects to an HTML page), processing is skipped.
The goal of this POC is to enhance the UI by providing more information when a feed has no datasets. To achieve this, the existing `dataset_trace` GCP Datastore entity has been updated with an `INVALID_ZIP` status type. The status types are now as follows:
- `FETCHED`: The dataset was successfully updated, stored in GCP, and added to the database.
- `NOT_FETCHED`: There was no update; the latest stored dataset is still up to date.
- `FAILURE`: An error occurred during processing (e.g., HTTP connection timeout due to an inaccessible producer URL).
- `INVALID_ZIP`: The URL is accessible, but the response is not a valid ZIP file.
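The four statuses above can be illustrated with a minimal sketch. It is not the batch job's actual code: the `Status` enum mirrors the status names listed above, while `classify_fetch`, its `content` and `changed` parameters, and the order of the checks are assumptions made for illustration.

```python
import io
import zipfile
from enum import Enum
from typing import Optional


class Status(Enum):
    FETCHED = "FETCHED"
    NOT_FETCHED = "NOT_FETCHED"
    FAILURE = "FAILURE"
    INVALID_ZIP = "INVALID_ZIP"


def classify_fetch(content: Optional[bytes], changed: bool = True) -> Status:
    """Map the outcome of one download attempt to a status (hypothetical helper).

    content: the response body, or None if the request itself failed
             (for example, a connection timeout).
    changed: assumed flag, True if the body differs from the latest
             stored dataset.
    """
    if content is None:
        # The producer URL could not be reached or the request errored.
        return Status.FAILURE
    if not zipfile.is_zipfile(io.BytesIO(content)):
        # The URL answered, but with something other than a ZIP archive
        # (for example, an HTML page after a redirect).
        return Status.INVALID_ZIP
    if not changed:
        # Valid ZIP, but identical to what is already stored.
        return Status.NOT_FETCHED
    return Status.FETCHED
```

Checking ZIP validity with the standard library's `zipfile.is_zipfile` keeps the sketch dependency-free; the real job may validate the archive differently.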
With this status stored in Datastore, along with a `timestamp` and a `stable_id` matching the feed's `stable_id`, it is possible to retrieve the latest fetch status for a given feed. Consequently, an element has been added to the API's GTFSFeed response to include information about the last fetch attempt. Here is an example of the object:
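Separately from the API object mentioned above, the lookup by `stable_id` and `timestamp` can be sketched in plain Python. This is a hypothetical illustration, not the Datastore query or the API schema used in this PR: `DatasetTrace` and `latest_trace` are invented names, and the records live in an in-memory list rather than in Datastore.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DatasetTrace:
    """Hypothetical stand-in for one dataset_trace record."""
    stable_id: str
    status: str
    timestamp: datetime


def latest_trace(
    traces: Iterable[DatasetTrace], stable_id: str
) -> Optional[DatasetTrace]:
    """Return the most recent trace for a feed, or None if it has none."""
    matching = [t for t in traces if t.stable_id == stable_id]
    # max() with default=None handles feeds that were never attempted.
    return max(matching, key=lambda t: t.timestamp, default=None)


# Example usage with made-up records.
traces = [
    DatasetTrace("feed-a", "FAILURE", datetime(2024, 1, 1, 8, 0)),
    DatasetTrace("feed-a", "INVALID_ZIP", datetime(2024, 1, 2, 8, 0)),
    DatasetTrace("feed-b", "FETCHED", datetime(2024, 1, 3, 8, 0)),
]
last_attempt = latest_trace(traces, "feed-a")
```

In this example `last_attempt` is the `INVALID_ZIP` record for `feed-a`, since it has the later timestamp.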
This information is now displayed on the UI's feed pages to show the latest status. Here are a few examples for different types of statuses (note that results seen through the preview link may vary if the development API is rebuilt or the batch processing jobs are run):

![image](https://github.com/MobilityData/mobility-feed-api/assets/60586858/f66d6588-4b90-4a9d-9b28-2b4c49c0a306)
Please make sure these boxes are checked before submitting your pull request - thanks!
- [x] Run the unit tests with `./scripts/api-tests.sh` to make sure you didn't break anything
- [ ] Add or update any needed documentation to the repo