MobilityData / mobility-feed-api

Apache License 2.0

Improve performance of the gtfs latest datasets transformation #548

Open davidgamez opened 1 month ago

davidgamez commented 1 month ago

Description

The latest dataset transformation in the GTFS entities loops through all of a feed's datasets to find the one marked as the latest. This is a time bomb: as datasets are added to the feeds over time, the transformation code has more and more datasets to scan.

Proposed solution

We need to revisit the way the latest dataset is mapped to the GTFS entity so that a single property points to that dataset, with no need to loop over all datasets.
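To make the idea concrete, here is a minimal sketch of the difference between scanning for the latest dataset and following a single reference. It uses plain dataclasses rather than the project's actual ORM models, and every name in it (`Dataset`, `GtfsFeed`, `latest_dataset`, `add_dataset`) is hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    # Hypothetical stand-in for the real dataset entity.
    stable_id: str
    latest: bool = False


@dataclass
class GtfsFeed:
    # Hypothetical stand-in for the real GTFS feed entity.
    stable_id: str
    datasets: List[Dataset] = field(default_factory=list)
    # Proposed: a direct reference, kept up to date whenever a dataset is added.
    latest_dataset: Optional[Dataset] = None

    def add_dataset(self, dataset: Dataset) -> None:
        """Append a dataset and mark it as the latest one."""
        if self.latest_dataset is not None:
            self.latest_dataset.latest = False
        dataset.latest = True
        self.datasets.append(dataset)
        self.latest_dataset = dataset


def latest_by_scan(feed: GtfsFeed) -> Optional[Dataset]:
    """Current behaviour as described: O(n) scan over every dataset."""
    return next((d for d in feed.datasets if d.latest), None)


def latest_by_reference(feed: GtfsFeed) -> Optional[Dataset]:
    """Proposed behaviour: O(1) lookup through a single property."""
    return feed.latest_dataset
```

Both functions return the same dataset, but only the first one gets slower as the feed accumulates datasets. In the real schema the reference would presumably be a foreign key or a one-to-one relationship maintained when a new dataset is stored; that part is an assumption and not shown here.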

Alternative solution

If the proposed solution is not viable, we can modify the transformers to execute queries in their own code to extract the latest dataset. This breaks the pattern of the transformer being agnostic of the DB queries, but it improves performance. Another alternative is to pass a "context" parameter to the transformer containing the response of the latest dataset query.
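A rough sketch of the "context" variant, assuming the caller runs the latest dataset query once and hands the result to the transformer. The names `LatestDataset`, `TransformContext`, and `transform_feed` are hypothetical and do not come from the codebase; the point is only that the transformer stays free of DB access.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LatestDataset:
    # Hypothetical shape of one row returned by the latest dataset query.
    id: str
    feed_id: str
    hosted_url: str


@dataclass
class TransformContext:
    # Built by the caller from a single query, keyed by feed id.
    latest_dataset_by_feed: Dict[str, LatestDataset]


def transform_feed(feed_id: str, context: TransformContext) -> dict:
    """Build the API representation without querying the DB or scanning datasets."""
    latest: Optional[LatestDataset] = context.latest_dataset_by_feed.get(feed_id)
    return {
        "id": feed_id,
        "latest_dataset": None
        if latest is None
        else {"id": latest.id, "hosted_url": latest.hosted_url},
    }
```

With this shape, the lookup per feed is a dictionary access, and the query that fills the context can be batched for all feeds in a response instead of being issued per feed.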

emmambd commented 1 month ago

@davidgamez Time bomb = effective metaphor :)

How urgent is this? E.g., next sprint vs. next week vs. next quarter?

davidgamez commented 1 month ago

Maybe I was a little bit dramatic here :-) This can be done this quarter, after the release; it is not part of the release scope.