MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

feat: Column-based storage for GTFS entities #1747

Open bdferris-v2 opened 1 month ago

bdferris-v2 commented 1 month ago

Per discussion in #1358 and GTFS Validator - Memory Reduction, this PR implements support for column-based storage of GTFS entities. This technique reduces the validator's memory footprint because columns that a feed never uses consume no memory.
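As a rough illustration of the idea (a minimal sketch, not the code in this PR), the Java class below stores a GTFS-like stops table as one array per column instead of one object per row. The class name `StopColumnStore`, its methods, and the default values returned for unset fields are all hypothetical and chosen only for this example. Optional columns are allocated on first write, so a feed that never provides `stop_desc` never pays for that column.

```java
/**
 * Hypothetical sketch of column-based storage for a GTFS-like "stops" table.
 * Names and defaults are illustrative assumptions, not taken from the PR.
 */
final class StopColumnStore {
  private final int rowCount;
  private final String[] stopId; // required column: always allocated
  private double[] stopLat;      // optional column: null until first write
  private String[] stopDesc;     // optional column: null until first write

  StopColumnStore(int rowCount) {
    this.rowCount = rowCount;
    this.stopId = new String[rowCount];
  }

  void setStopId(int row, String value) {
    stopId[row] = value;
  }

  String stopId(int row) {
    return stopId[row];
  }

  void setStopLat(int row, double value) {
    if (stopLat == null) {
      stopLat = new double[rowCount]; // allocate the column on first use
    }
    stopLat[row] = value;
  }

  /** Returns 0.0 (an assumed default) when the value was never set. */
  double stopLat(int row) {
    return stopLat == null ? 0.0 : stopLat[row];
  }

  void setStopDesc(int row, String value) {
    if (stopDesc == null) {
      stopDesc = new String[rowCount]; // allocate the column on first use
    }
    stopDesc[row] = value;
  }

  /** Returns "" (an assumed default) when the value was never set. */
  String stopDesc(int row) {
    if (stopDesc == null || stopDesc[row] == null) {
      return "";
    }
    return stopDesc[row];
  }

  /** Number of columns that currently occupy memory. */
  int allocatedColumnCount() {
    int count = 1; // stop_id is always present
    if (stopLat != null) count++;
    if (stopDesc != null) count++;
    return count;
  }

  public static void main(String[] args) {
    StopColumnStore stops = new StopColumnStore(2);
    stops.setStopId(0, "S1");
    stops.setStopId(1, "S2");
    stops.setStopLat(0, 45.5);
    // stop_desc is never written, so its array is never allocated.
    System.out.println(stops.allocatedColumnCount()); // prints 2
  }
}
```

In a row-based layout every entity object reserves a field for every column, whether or not the feed supplies it. Here the cost of an unused column is a single null reference for the whole table, at the price of an extra null check on each read.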

This PR is not yet ready for review but is meant to show what the implementation might look like.

See the implementation report for details on memory savings and performance.


CLAassistant commented 1 month ago

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

github-actions[bot] commented 1 month ago

✅ Rule acceptance tests passed.

New Errors: 1 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 2 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 1 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Warnings: 0 out of 1520 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1520 sources (~0%) are corrupted.

Commit: 337aa15e14f5b5af4d1877fa037a5133cdec7930
Download the full acceptance test report here (report will disappear after 90 days).

jcpitre commented 1 month ago

Impressive work. In general, I am a bit concerned about the added complexity versus the memory savings.