MobilityData / mobility-feed-api

Apache License 2.0

poc: recognizing when a dataset was unfetchable #515

Open cka-y opened 3 days ago

cka-y commented 3 days ago

Summary: This PR is a proof of concept (POC) that records when a dataset could not be fetched, so that users can see when the last fetch attempt happened and why it failed.

Expected Behavior: Currently, the batch processing job attempts to fetch all non-deprecated feeds. Known reasons for a non-deprecated feed not being successfully fetched include:

  1. Internal server errors (e.g., HTTP connection timeout because the URL is inaccessible).
  2. Invalid ZIP file: If the dataset available at the producer's URL is not a valid ZIP file (e.g., the URL redirects to an HTML page), processing is skipped.
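The invalid-ZIP case above can be sketched with the Python standard library. This is a minimal illustration, not the PR's actual implementation: `is_valid_zip` and `make_zip` are hypothetical helper names, and the check runs on bytes that are assumed to have been downloaded already.

```python
import io
import zipfile


def is_valid_zip(payload: bytes) -> bool:
    """Return True if the downloaded bytes look like a ZIP archive.

    Hypothetical helper: zipfile.is_zipfile accepts a file-like object
    and checks for the ZIP end-of-central-directory record.
    """
    return zipfile.is_zipfile(io.BytesIO(payload))


def make_zip(files: dict) -> bytes:
    """Build a small in-memory ZIP archive (used only for the demo)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


if __name__ == "__main__":
    good = make_zip({"agency.txt": "agency_id\n1\n"})
    html = b"<html><body>Not found</body></html>"
    # A producer URL that redirects to an HTML page fails the check,
    # which is the situation the INVALID_ZIP status is meant to record.
    for label, payload in (("zip", good), ("html", html)):
        print(label, "->", "ok" if is_valid_zip(payload) else "INVALID_ZIP")
```

A check like this only inspects the archive structure. It does not confirm that the archive contains a usable GTFS dataset.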

The goal of this POC is to enhance the UI by providing more information when a feed has no datasets. To achieve this, the existing dataset_trace GCP Datastore entity has been updated with an INVALID_ZIP status type. The status types are now as follows:

Storing this status in Datastore, together with a timestamp and a stable_id that matches the feed's stable_id, makes it possible to retrieve the latest fetch status for a given feed. Accordingly, the API's GTFSFeed response now includes an element describing the last fetch attempt. Here is an example of the object:

"last_fetch_attempt": {
    "status": "PUBLISHED",
    "timestamp": "2024-06-19T17:40:50.188550"
}
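As a rough sketch of how such an object could be derived, the snippet below picks the most recent trace for a feed and formats it like the example above. It works on an in-memory list rather than on Datastore: the `DatasetTrace` class, the `last_fetch_attempt` function, and the `mdb-1` / `mdb-2` identifiers are assumptions made for illustration, not names taken from the PR.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DatasetTrace:
    """Hypothetical in-memory stand-in for a dataset_trace record."""
    stable_id: str
    status: str
    timestamp: datetime


def last_fetch_attempt(
    traces: Iterable[DatasetTrace], stable_id: str
) -> Optional[dict]:
    """Return the latest status for a feed, or None if it has no traces."""
    matching = [t for t in traces if t.stable_id == stable_id]
    if not matching:
        return None
    # The most recent timestamp wins, whatever its status is.
    latest = max(matching, key=lambda t: t.timestamp)
    return {
        "status": latest.status,
        "timestamp": latest.timestamp.isoformat(),
    }


if __name__ == "__main__":
    traces = [
        DatasetTrace("mdb-1", "INVALID_ZIP", datetime(2024, 6, 18, 9, 0, 0)),
        DatasetTrace("mdb-1", "PUBLISHED", datetime(2024, 6, 19, 17, 40, 50, 188550)),
        DatasetTrace("mdb-2", "INVALID_ZIP", datetime(2024, 6, 19, 8, 0, 0)),
    ]
    print(last_fetch_attempt(traces, "mdb-1"))
```

Returning `None` for a feed without traces lets the caller distinguish "never attempted" from "attempted and failed".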

This information is now displayed on the UI's feed pages to show the latest status. Here are a few examples for different types of statuses (note that the results may vary if the API development is rebuilt or the batch processing jobs are run using the preview link):

[Screenshot 2024-06-28 at 2:10:49 PM]


github-actions[bot] commented 2 days ago

Preview Firebase Hosting URL: https://mobility-feeds-dev--pr-515-2zlcv5ne.web.app