CUTR-at-USF / transit-feed-quality-calculator

A tool that uses the gtfs-realtime-validator to calculate the quality of a large number of GTFS-realtime feeds

Add feed analysis in ResultsAnalyzer #2

Closed barbeau closed 6 years ago

barbeau commented 6 years ago

Currently, if you run the Main.main() method, the project will instantiate TransitFeedQualityCalculator and call TransitFeedQualityCalculator.calculate(), which will then:

  1. Download all GTFS-realtime feed files and their corresponding GTFS feeds (using the directory of feeds from TransitFeeds.com) into subfolders of a directory provided in Main.main() (currently hard-coded to "feeds", which is created on first execution). This happens in FeedDownloader.
  2. Validate each of the GTFS/GTFS-realtime feeds in the subfolders to produce xxxxxx.results.json files, one results file per protobuf file. This happens in BulkFeedValidator.
  3. Run ResultsAnalyzer.analyzeResults(), which currently reads each validation results JSON file in each subfolder and prints the details of every occurrence of an error.
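As a rough illustration of the input that step 3 works from, here is a minimal sketch that groups the `*.results.json` files by feed subfolder. It uses only the Java standard library; the class and method names (`ResultsFileLister`, `findResultsFiles`) are hypothetical and not part of the project, and the real ResultsAnalyzer may locate files differently.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the project.
class ResultsFileLister {

    /** Maps each feed subfolder name to the *.results.json files directly inside it. */
    static Map<String, List<Path>> findResultsFiles(Path feedsDir) throws IOException {
        Map<String, List<Path>> bySubfolder = new TreeMap<>();
        // Only direct child directories of feedsDir are treated as feed subfolders.
        try (DirectoryStream<Path> subfolders = Files.newDirectoryStream(feedsDir, Files::isDirectory)) {
            for (Path subfolder : subfolders) {
                List<Path> results = new ArrayList<>();
                // The glob is matched against each file name in the subfolder.
                try (DirectoryStream<Path> files = Files.newDirectoryStream(subfolder, "*.results.json")) {
                    for (Path file : files) {
                        results.add(file);
                    }
                }
                Collections.sort(results); // directory iteration order is unspecified
                bySubfolder.put(subfolder.getFileName().toString(), results);
            }
        }
        return bySubfolder;
    }

    public static void main(String[] args) throws IOException {
        // "feeds" is the directory name mentioned in this issue.
        findResultsFiles(Paths.get("feeds")).forEach((feed, files) ->
            System.out.println(feed + ": " + files.size() + " results file(s)"));
    }
}
```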

@Suryakandukoori We need to change ResultsAnalyzer.analyzeResults() so that it produces some type of analysis output based on the validation results, similar to the spreadsheets that we previously produced manually. Perhaps the easiest method is to output CSV results that we can open in spreadsheet software like Excel, then copy/paste into our existing workbook and leverage the graph-creation scripts we already have there?

If so, Jackson can output CSV - see https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv.
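To make the CSV idea concrete, here is a minimal sketch of one possible output shape: one row per feed and rule with a total occurrence count. It deliberately uses only the Java standard library instead of Jackson's CSV module, so it assumes nothing about that library's API. The `RuleHit` class, the column names, the sample rule IDs, and the `analysis.csv` file name are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not part of the project.
class CsvSummarySketch {

    /** Hypothetical flattened record: one validation rule reported for one feed. */
    static final class RuleHit {
        final String feed;
        final String ruleId;
        final int occurrences;

        RuleHit(String feed, String ruleId, int occurrences) {
            this.feed = feed;
            this.ruleId = ruleId;
            this.occurrences = occurrences;
        }
    }

    /** Sums occurrences per (feed, rule) pair; TreeMap keeps the rows in a stable order. */
    static Map<String, Map<String, Integer>> countByFeedAndRule(List<RuleHit> hits) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (RuleHit hit : hits) {
            counts.computeIfAbsent(hit.feed, k -> new TreeMap<>())
                  .merge(hit.ruleId, hit.occurrences, Integer::sum);
        }
        return counts;
    }

    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Renders the counts as CSV lines, header first. */
    static List<String> toCsvLines(Map<String, Map<String, Integer>> counts) {
        List<String> lines = new ArrayList<>();
        lines.add("feed,rule,occurrences");
        counts.forEach((feed, byRule) ->
            byRule.forEach((rule, n) -> lines.add(escape(feed) + "," + escape(rule) + "," + n)));
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data; real values would come from the *.results.json files.
        List<RuleHit> hits = Arrays.asList(
            new RuleHit("example-feed-a", "E001", 3),
            new RuleHit("example-feed-a", "E001", 2),
            new RuleHit("example-feed-b", "W002", 1));
        Path out = Paths.get("feeds", "analysis.csv"); // assumed output location
        Files.createDirectories(out.getParent());
        Files.write(out, toCsvLines(countByFeedAndRule(hits)), StandardCharsets.UTF_8);
    }
}
```

A long format like this (one row per feed and rule) tends to be easy to pivot in a spreadsheet, but a wide format with one column per rule may match the existing workbook better.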

Also, note that the validator currently chokes with an out-of-memory error on the "194-The Netherlands" feed - see https://github.com/CUTR-at-USF/transit-feed-quality-calculator/issues/1. As a workaround, I've been running the download, deleting that folder, and then continuing with validation and analysis.
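An alternative to deleting the folder by hand would be to skip known-problematic subfolders when choosing what to validate. The following is only a sketch of that idea using the Java standard library; `FeedFolderFilter` and `foldersToValidate` are hypothetical names, and nothing like this exists in the project as far as this issue describes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project.
class FeedFolderFilter {

    /** Returns the feed subfolders of feedsDir, minus any whose name is in skipNames. */
    static List<Path> foldersToValidate(Path feedsDir, Set<String> skipNames) throws IOException {
        // Files.list holds an open directory handle, so close it with try-with-resources.
        try (Stream<Path> children = Files.list(feedsDir)) {
            return children
                .filter(Files::isDirectory)
                .filter(p -> !skipNames.contains(p.getFileName().toString()))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The folder name is the one reported in this issue as causing the memory error.
        Set<String> skip = Collections.singleton("194-The Netherlands");
        for (Path folder : foldersToValidate(Paths.get("feeds"), skip)) {
            System.out.println("Would validate: " + folder);
        }
    }
}
```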

I would think that we probably want to output the analysis files to the main "feeds" directory. Thoughts/ideas welcome!

barbeau commented 6 years ago

Here's the Excel spreadsheet we're currently using - GTFS-FEED-ERRORS-v3.xlsx