Janelia-Trackathon-2023 / traccuracy

Utilities for computing common accuracy metrics on cell tracking challenge solutions with ground truth
https://traccuracy.readthedocs.io

Add CTC format checks in loader #119

Closed · bentaculum closed this 9 months ago

bentaculum commented 10 months ago

This adds checks that enforce proper Cell Tracking Challenge (CTC) data format at load time and print descriptive error messages. As discussed in the last meeting, the checks are on by default but can be turned off with a boolean flag in `load_ctc_data`.
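For illustration, toggling the checks might look like this (the data path and the exact signature of `load_ctc_data` are assumptions; only the `run_checks` flag is introduced by this PR):

```python
# Hedged usage sketch: load CTC-format tracking results with the new
# format validation. The directory path below is a placeholder.
from traccuracy.loaders import load_ctc_data

data = load_ctc_data(
    "Fluo-N2DL-HeLa/01_RES",  # hypothetical path to a CTC result folder
    run_checks=True,  # default; set to False to skip format validation
)
```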

I've been dealing with tracking results in faulty CTC format recently and ran into weird errors downstream of the loader in the metrics computation, which inspired this PR.

The proposed routine checks:

msschwartz21 commented 10 months ago

I haven't looked at the changes yet, but just wanted to let you know that the docs failure is my bad. I think I have it fixed on some open PR but I'm not sure which one. Is it ok if I push a commit to this branch fixing the docs issue?

bentaculum commented 10 months ago

No worries, and yes of course go ahead.

codecov-commenter commented 10 months ago

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (a61d295) 86.12% compared to head (c15c2f6) 85.54%.

| Files | Patch % | Lines |
| --- | --- | --- |
| src/traccuracy/loaders/_ctc.py | 75.00% | 10 Missing :warning: |


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
- Coverage   86.12%   85.54%   -0.59%
==========================================
  Files          19       19
  Lines         894      927      +33
==========================================
+ Hits          770      793      +23
- Misses        124      134      +10
```


bentaculum commented 10 months ago

Hi @cmalinmayor, sorry for the long delay, I was offline for a while. Thanks for the input! I had missed that a ValueError was already supposed to be thrown; I could not find it in the code, though. When I passed in broken CTC records, I got various uncaught errors from pandas instead.

I updated the docstrings as you suggested, and I additionally catch any error from `ctc_to_graph` when the more intricate checks are not run.
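Such a catch-and-rewrap might look roughly like this (variable names and the error message are illustrative, not the PR's exact code):

```python
# Illustrative sketch: surface raw pandas/graph-building errors as a
# single descriptive error when the intricate format checks were skipped.
try:
    track_graph = ctc_to_graph(df, masks)
except Exception as err:
    raise ValueError(
        "Failed to build the track graph; the input likely violates the "
        "CTC format. Re-run load_ctc_data with run_checks=True for a "
        "descriptive error message."
    ) from err
```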

bentaculum commented 10 months ago

> Thanks @bentaculum, this is great! I like Morgan's idea of running all the checks even if one fails, so that every type of failure is reported without having to re-run, but even as-is I can see this being very useful.

The errors are usually just a few edge cases that went wrong in some data format conversion, so I think simply aborting with a single descriptive error message when the format is invalid is sufficient.

bentaculum commented 10 months ago

I just added one more check for non-connected pixels with the same ID (i.e. more than one connected component for a label ID) in a single frame. This unfortunately does occur in some of the CTC datasets (here's a screenshot from Fluo-N2DL-HeLa/train/01), so I added it as a soft check that prints a warning.

[Screenshot, 2023-12-05: a Fluo-N2DL-HeLa/train/01 frame where one label ID splits into multiple connected components]
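A minimal sketch of such a soft check, assuming scikit-image (the function name and warning text are hypothetical, not the PR's code):

```python
# Hypothetical sketch: warn when a label ID covers more than one
# connected component within a single segmentation frame.
import warnings

import numpy as np
from skimage.measure import label

def warn_disconnected_labels(frame: np.ndarray, frame_idx: int) -> None:
    for label_id in np.unique(frame):
        if label_id == 0:
            continue  # 0 is background in CTC masks
        # label() numbers the connected components 1..n, so max() == n
        n_components = int(label(frame == label_id).max())
        if n_components > 1:
            warnings.warn(
                f"Frame {frame_idx}: label {label_id} has "
                f"{n_components} connected components."
            )
```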
cmalinmayor commented 10 months ago

One question re: benchmarking. Should we set the run_checks flag to True or False in the benchmarking? Or perhaps set it to False and benchmark the checking separately? @bentaculum @msschwartz21

After the most recent mask-based check there seems to be a roughly 30% increase in the loading benchmarks, I assume with the default `run_checks=True` compared to not checking, which is actually super useful to know.

msschwartz21 commented 10 months ago

> One question re: benchmarking. Should we set the run_checks flag to True or False in the benchmarking? Or perhaps set it to False and benchmark the checking separately?

Good catch. Let's pull it out as a separate function and set the flag to False when we load data elsewhere in benchmarking. This should include the fixtures for loading data.
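For illustration, the split might look like this (the asv-style benchmark names and placeholder path are assumptions, not the repository's actual benchmark code):

```python
# Hypothetical benchmark sketch: time plain loading with checks off,
# and measure the checks via a separate load with checks on.
from traccuracy.loaders import load_ctc_data

DATA_DIR = "path/to/ctc_dataset"  # placeholder

def time_load_ctc_data():
    # Fixtures elsewhere in the suite would also pass run_checks=False
    load_ctc_data(DATA_DIR, run_checks=False)

def time_load_ctc_data_with_checks():
    load_ctc_data(DATA_DIR, run_checks=True)
```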

bentaculum commented 9 months ago

> One question re: benchmarking. Should we set the run_checks flag to True or False in the benchmarking? Or perhaps set it to False and benchmark the checking separately?

> Good catch. Let's pull it out as a separate function and set the flag to False when we load data elsewhere in benchmarking. This should include the fixtures for loading data.

Nice idea, I just did as you suggested; the comparative benchmark against main looks good now: https://github.com/Janelia-Trackathon-2023/traccuracy/commit/c15c2f654a213fbc52277ef236a4fc765f1973cf#comments

I guess the comparison to the new benchmark will appear once this is merged into main.