Closed DavidCarlyn closed 3 months ago
The location for data is described in the competition.yaml file.
I think we're meant to have the ingestion program pull in the `input_data` (the validation and testing images), then the submitted `model.py` should return the predictions to be matched against the `reference_data`. Presumably, they could return a table with the image filename and prediction (hybrid or not). I currently have the `reference_data` CSVs (`butterfly_ref_<valid or test>_<A or mimic>.csv`) set up with an `ssp_indicator` column (`major` and `minor`) for species A. They all have a `filename` and `hybrid_stat_ref` column to match with the testing images.
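To make the matching concrete, here's a minimal sketch of joining a submitted predictions table against one of the reference CSVs on `filename`. The prediction column name `hybrid_stat_pred` is just a placeholder, not something we've settled on:

```python
import pandas as pd

def match_predictions(pred_csv: str, ref_csv: str) -> pd.DataFrame:
    """Join submitted predictions to the reference table on filename."""
    preds = pd.read_csv(pred_csv)  # expects: filename, hybrid_stat_pred
    refs = pd.read_csv(ref_csv)    # expects: filename, hybrid_stat_ref, ssp_indicator
    # Inner join keeps only images that appear in both tables.
    return preds.merge(refs, on="filename", how="inner")
```

An inner join silently drops images the submission didn't predict; whether that should instead be an error is the open question about input validation below.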
I'll read through what you have here on Monday. Glad you included a test case!
I still have to update the second scoring program, `score_maj_min.py`, to take the major and minor scoring into account, but there should be enough here to start testing.
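For reference, the major/minor breakout I have in mind for `score_maj_min.py` could look roughly like this. The `hybrid_stat_pred` column name is assumed; `hybrid_stat_ref` and `ssp_indicator` come from the reference CSVs described above:

```python
import pandas as pd

def score_major_minor(merged: pd.DataFrame) -> dict:
    """Overall accuracy plus accuracy broken out by ssp_indicator."""
    correct = merged["hybrid_stat_pred"] == merged["hybrid_stat_ref"]
    scores = {"overall_acc": correct.mean()}
    for level in ("major", "minor"):
        mask = merged["ssp_indicator"] == level
        if mask.any():  # skip a level that's absent from this test set
            scores[f"{level}_acc"] = correct[mask].mean()
    return scores
```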
I updated the sample code submission for the DINOv2 baseline. Some notes:
- Need to test & debug the updated baseline code (with ingestion, scoring, etc.)
- Still need to update scoring for the major-minor use case.
When we are evaluating in `scoring_program_A`, will all the entries be either major or minor subspecies and no others? I'm double-checking because, previously, we would include more than just the major and minor subspecies to calculate the threshold and then report only the accuracy of the major and minor rows. Since we are splitting this up into different test sets/tasks, I'm assuming we are now calculating the threshold per task. I believe we are in agreement on that; just double-checking.
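Roughly what I mean by "threshold per task", as a sketch. The continuous `score` column and the pick-the-accuracy-maximizing-threshold rule are assumptions on my part; the key point is that the threshold is fit on all of a task's rows, while accuracy is reported only on the major/minor rows:

```python
import numpy as np
import pandas as pd

def threshold_per_task(df: pd.DataFrame) -> float:
    """Choose the threshold on this task's scores that maximizes accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in np.unique(df["score"]):
        acc = ((df["score"] >= t).astype(int) == df["hybrid_stat_ref"]).mean()
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def maj_min_accuracy(df: pd.DataFrame) -> float:
    t = threshold_per_task(df)  # threshold computed per task, on all rows
    sub = df[df["ssp_indicator"].isin(["major", "minor"])]
    return ((sub["score"] >= t).astype(int) == sub["hybrid_stat_ref"]).mean()
```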
@work4cs, this is working as expected now (just without the container).
@work4cs and @DavidCarlyn I think we're good at this point? We'll change the scoring programs to not require the requirements file once we get the container functioning.
Addresses #5
I've migrated the major-minor functionality over but haven't yet adapted it to this particular format; I should be able to add that soon.
The .txt files I added to the reference_data folder were used to quickly test this setup. It will still need end-to-end testing with a complete approach (once we get the baseline up and running on this format, we can use that).
I made several assumptions about where the data (predictions, solutions, etc.) will be held. @egrace479, let me know if you see a problem with my assumptions.
Also worth noting: we don't have error checking for the input files (the predictions, for example). Should we?
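If we do want error checking, a lightweight validator run before scoring might be enough. Something like this sketch, where the required column names are placeholders:

```python
import pandas as pd

REQUIRED_COLS = {"filename", "hybrid_stat_pred"}  # placeholder column names

def validate_predictions(pred_csv: str, expected_filenames: set) -> list:
    """Return a list of human-readable problems (empty list == valid)."""
    problems = []
    preds = pd.read_csv(pred_csv)
    missing_cols = REQUIRED_COLS - set(preds.columns)
    if missing_cols:
        problems.append(f"missing columns: {sorted(missing_cols)}")
        return problems  # can't check rows without the required columns
    if preds["filename"].duplicated().any():
        problems.append("duplicate filenames in predictions")
    missing_rows = expected_filenames - set(preds["filename"])
    if missing_rows:
        problems.append(f"{len(missing_rows)} reference images have no prediction")
    return problems
```

The scoring program could then fail fast with the returned messages instead of crashing mid-merge on a malformed submission.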