CUTR-at-USF / transit-feed-quality-calculator

A tool that uses the gtfs-realtime-validator to calculate the quality of a large number of GTFS-realtime feeds

Errors are filtered from graphs when they aren't on the ignore list #16

Closed barbeau closed 6 years ago

barbeau commented 6 years ago

Summary:

We currently support filtering out specific errors or warnings from analysis. In TransitFeedQualityCalculator there is a comma-delimited string mErrorsToIgnore that contains a list of error IDs to ignore. These errors should not show up in the resulting graphs.xlsx file - for example, if I exclude E001, in the "Error Frequency" tab the graph of the "Most Frequent Errors and Warnings..." shouldn't include E001. So, if E001 is the most frequent error and E029 is the 2nd most frequent error, after filtering E001 error E029 should move to the top of the graph:

image

I was doing some testing for a new error in the validator and was trying to eliminate some of the other errors in the graph, and saw some strange behavior.

I started with filtering a single error:

private String mErrorsToIgnore = "E017";

...and got results I would expect - E017 was eliminated from the graph.

However, as I started adding more errors to filter out, the errors I was looking for (in my case E100, E101, and E102) were also eliminated from the graph, even though they weren't included in the ignore list.

private String mErrorsToIgnore = "E017,E011,E045,E023,E041,E001,E012,E003,E022,E004,E037,E002";

So it seems that some errors are being filtered out of the frequency graph even if they aren't in the ignore list.
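For reference, the intended behavior can be sketched roughly as follows. This is a minimal sketch and not the actual ResultsAnalyzer code: the class IgnoreListFilter and its methods are hypothetical names, and it assumes the per-rule counts are available as a map from rule ID to frequency. The key point is that IDs are matched exactly against the parsed ignore list, so adding more entries should never remove an ID that is not listed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of transit-feed-quality-calculator.
final class IgnoreListFilter {

    // Splits a comma-delimited string such as "E017,E011" into a set of exact rule IDs.
    static Set<String> parseIgnoreList(String errorsToIgnore) {
        Set<String> ignored = new HashSet<>();
        if (errorsToIgnore == null) {
            return ignored;
        }
        for (String id : errorsToIgnore.split(",")) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                ignored.add(trimmed);
            }
        }
        return ignored;
    }

    // Drops ignored IDs (exact match only) and orders the rest by descending count,
    // breaking ties by rule ID so the result is deterministic.
    static Map<String, Integer> filterAndRank(Map<String, Integer> counts, String errorsToIgnore) {
        Set<String> ignored = parseIgnoreList(errorsToIgnore);
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (!ignored.contains(entry.getKey())) {
                kept.add(entry);
            }
        }
        kept.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ranked = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : kept) {
            ranked.put(entry.getKey(), entry.getValue());
        }
        return ranked;
    }
}
```

With made-up counts of E001=50, E029=30, and E100=5 and an ignore string of "E001", filterAndRank returns E029 followed by E100, which matches the expectation that E029 moves to the top and E100 stays in the graph.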

We should add some unit tests that run the tool, read in the resulting graphs.xlsx file, and execute assertions on the cell contents to verify that the actual output matches the expected output.
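The assertion part of such a test might look something like the sketch below. It assumes the rule IDs in the "Error Frequency" tab have already been read into a list (for example with a spreadsheet library such as Apache POI, which is not shown here), and GraphOutputCheck and findProblems are hypothetical names, not existing code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical test helper for illustration; not part of the existing code base.
final class GraphOutputCheck {

    // Compares the rule IDs found in the sheet against the ignore list and the IDs
    // that are expected to appear. Returns one message per problem; an empty list
    // means the output matches the expectation.
    static List<String> findProblems(List<String> idsInSheet,
                                     String errorsToIgnore,
                                     List<String> expectedIds) {
        Set<String> ignored = new HashSet<>();
        for (String id : errorsToIgnore.split(",")) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                ignored.add(trimmed);
            }
        }
        Set<String> present = new HashSet<>(idsInSheet);
        List<String> problems = new ArrayList<>();
        // Ignored rules must not show up in the output.
        for (String id : idsInSheet) {
            if (ignored.contains(id)) {
                problems.add("ignored rule still present: " + id);
            }
        }
        // Rules that are not ignored and are known to occur must still be there.
        for (String id : expectedIds) {
            if (!present.contains(id)) {
                problems.add("expected rule missing: " + id);
            }
        }
        return problems;
    }
}
```

A unit test could then assert that findProblems returns an empty list for the ignore string in use and the expected IDs E100, E101, and E102.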

Steps to reproduce:

I was running a branch of the validator and quality calculator:

  1. Clone/checkout the validator branch https://github.com/CUTR-at-USF/gtfs-realtime-validator/tree/arrivals-and-or-departures
  2. In the gtfs-realtime-validator-lib folder, run mvn clean install to install a new Maven artifact 1.0.0-SNAPSHOT-arrive-and-depart
  3. Clone/checkout the quality calculator branch https://github.com/CUTR-at-USF/transit-feed-quality-calculator/tree/arrivals-and-or-departures
  4. Rebuild quality calculator in IntelliJ (or run mvn clean install in transit-feed-quality-calculator folder)
  5. Run edu.usf.cutr.transitfeedqualitycalculator.Main.main() (note that you probably want to delete the Netherlands folder before running validation stage on it to avoid waiting forever for validation to complete due to feed size)

Expected behavior:

Show me error frequency for errors E100, E101, and E102 in the graphs.xlsx file "Error Frequency" tab in the graph "Most Frequent Errors and Warnings...".

These errors aren't in the ignore list:

private String mErrorsToIgnore = "E017,E011,E045,E023,E041,E001,E012,E003,E022,E004,E037,E002";

...so they should appear in the graph.

Observed behavior:

E100, E101, and E102 don't appear in the graphs.xlsx file "Error Frequency" tab in the graph "Most Frequent Errors and Warnings...".

Platform:

Windows 7 Enterprise w/ jdk1.8.0_73

Suryakandukoori commented 6 years ago

@barbeau I was not able to reproduce errors E100, E101, and E102, even without using the code that ignores errors/warnings. I also could not find descriptions of these errors in edu.usf.cutr.gtfsrtvalidator.lib.validation.ValidationRules. Could you clarify what these errors are? I found a variable renaming issue in edu.usf.cutr.transitfeedqualitycalculator.ResultsAnalyzer and fixed it. I then tested ignoring errors/warnings by adding other errors to the ignore list, and it is working fine now.

barbeau commented 6 years ago

Ok, well that's good if you found a bug that might be affecting this!

Here is the ValidationRules for 100, 101, and 102: https://github.com/CUTR-at-USF/gtfs-realtime-validator/blob/arrivals-and-or-departures/gtfs-realtime-validator-lib/src/main/java/edu/usf/cutr/gtfsrtvalidator/lib/validation/ValidationRules.java

It's in the arrivals-and-or-departures branch, so you'll need to git fetch from the main GitHub repo and then git checkout arrivals-and-or-departures. The same goes for the quality calculator branch.

Suryakandukoori commented 6 years ago

I was able to replicate the scenario, and I even see instances of E102 after the minor fix in #18 (https://github.com/CUTR-at-USF/transit-feed-quality-calculator/pull/18). Oddly, 'mvn clean install' did not behave consistently. The first two times I ran a clean install after providing the 'gtfs-realtime-validator-lib 1.0.0-SNAPSHOT-arrive-and-depart' snapshot and checked the decompiled ValidationRules.class, I couldn't find E100 or the other new rules in that class. Later, I ran 'mvn clean' and 'mvn install' separately, which gave the calculator the correct validator code.
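As an alternative to decompiling the class, a quick reflection check can show whether the validator library on the classpath contains the new rules. This is only a sketch: RuleOnClasspathCheck is a hypothetical name, and it assumes the rules are exposed as public static fields named after their IDs (E100 and so on) in ValidationRules.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical diagnostic for illustration; not part of either project.
final class RuleOnClasspathCheck {

    // Returns true if the class can be loaded and has a public static field with
    // the given name; returns false if the class or the field is missing.
    static boolean hasStaticField(String className, String fieldName) {
        try {
            Field field = Class.forName(className).getField(fieldName);
            return Modifier.isStatic(field.getModifiers());
        } catch (ClassNotFoundException | NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: each rule is a public static field on this class.
        String rules = "edu.usf.cutr.gtfsrtvalidator.lib.validation.ValidationRules";
        for (String id : new String[] {"E100", "E101", "E102"}) {
            System.out.println(id + " on classpath: " + hasStaticField(rules, id));
        }
    }
}
```

If this reports false right after a build, the calculator is probably still resolving an older copy of the snapshot artifact.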

barbeau commented 6 years ago

Awesome, thanks for working on this! I'll test tomorrow morning.

barbeau commented 6 years ago

Turns out I had a bug in the branch for the gtfs-realtime-validator too that was causing E100 not to show up - fixed now in https://github.com/CUTR-at-USF/gtfs-realtime-validator/commit/797d912d36077bd30de25f6a0fe8b2bd81ded7eb.