Closed DavidCarlyn closed 3 months ago
The location for data is described in the competition.yaml file.
I think we're meant to have the ingestion program pull in the `input_data` (the validation and testing images), then the submitted `model.py` should return the predictions to be matched against the `reference_data`. Presumably, they could return a table with the image filename and prediction (hybrid or not). I currently have the `reference_data` CSVs (`butterfly_ref_<valid or test>_<A or mimic>.csv`) set up with an `ssp_indicator` column (`major` and `minor`) for species A. They all have a `filename` and `hybrid_stat_ref` column to match with the testing images.
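To make the matching concrete, here's a minimal sketch of joining a submitted predictions table against one of the reference CSVs on `filename`. The prediction column name `hybrid_stat_pred` is just a placeholder, not something we've settled on:

```python
import pandas as pd

def match_predictions(pred_csv: str, ref_csv: str) -> pd.DataFrame:
    """Join submitted predictions to the reference table on filename."""
    preds = pd.read_csv(pred_csv)  # expects: filename, hybrid_stat_pred
    refs = pd.read_csv(ref_csv)    # expects: filename, hybrid_stat_ref, ssp_indicator
    # Inner join keeps only images that appear in both tables.
    return preds.merge(refs, on="filename", how="inner")
```

An inner join silently drops images the submission didn't predict; whether that should instead be an error is the open question about input validation below.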
I'll read through what you have here on Monday. Glad you included a test case!
I still have to update the second scoring program, `score_maj_min.py`, to take the major and minor scoring into account, but there should be enough here to start testing.
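For reference, the major/minor breakout I have in mind for `score_maj_min.py` could look roughly like this. The `hybrid_stat_pred` column name is assumed; `hybrid_stat_ref` and `ssp_indicator` come from the reference CSVs described above:

```python
import pandas as pd

def score_major_minor(merged: pd.DataFrame) -> dict:
    """Overall accuracy plus accuracy broken out by ssp_indicator."""
    correct = merged["hybrid_stat_pred"] == merged["hybrid_stat_ref"]
    scores = {"overall_acc": correct.mean()}
    for level in ("major", "minor"):
        mask = merged["ssp_indicator"] == level
        if mask.any():  # skip a level that's absent from this test set
            scores[f"{level}_acc"] = correct[mask].mean()
    return scores
```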
I updated the sample code submission for the DINOv2 baseline. Some notes:
- Need to test & debug the updated baseline code (with ingestion, scoring, etc.)
- Still need to update scoring for the major-minor use case.
When we are evaluating in `scoring_program_A`, will all the entries be either major or minor subspecies and no others? I'm double-checking because, previously, we would include more than just the major and minor subspecies to calculate the threshold and then report only the accuracy of the major and minor rows. Since we are splitting this up into different test sets/tasks, I'm assuming we are now calculating the threshold per task. I believe we are in agreement on that; just double-checking.
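Roughly what I mean by "threshold per task", as a sketch. The continuous `score` column and the pick-the-accuracy-maximizing-threshold rule are assumptions on my part; the key point is that the threshold is fit on all of a task's rows, while accuracy is reported only on the major/minor rows:

```python
import numpy as np
import pandas as pd

def threshold_per_task(df: pd.DataFrame) -> float:
    """Choose the threshold on this task's scores that maximizes accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in np.unique(df["score"]):
        acc = ((df["score"] >= t).astype(int) == df["hybrid_stat_ref"]).mean()
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def maj_min_accuracy(df: pd.DataFrame) -> float:
    t = threshold_per_task(df)  # threshold computed per task, on all rows
    sub = df[df["ssp_indicator"].isin(["major", "minor"])]
    return ((sub["score"] >= t).astype(int) == sub["hybrid_stat_ref"]).mean()
```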
@work4cs, this is working as expected now (just without the container).
@work4cs and @DavidCarlyn I think we're good at this point? We'll change the scoring programs to not require the requirements file once we get the container functioning.
Addresses #5
I've migrated the major-minor functionality over but haven't yet adapted it to this particular format; I should be able to add that soon.
The .txt files I added to the reference_data folder were used to quickly test this setup. It will still need end-to-end testing with a complete approach (once we get the baseline up and running on this format, we can use that).
I made several assumptions about where the data (predictions, solutions, etc.) will be held. @egrace479, let me know if you see a problem with my assumptions.
Also worth noting: we don't have error checking for the input files (the predictions, for example). Should we?
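If we do want error checking, a lightweight validator run before scoring might be enough. Something like this sketch, where the required column names are placeholders:

```python
import pandas as pd

REQUIRED_COLS = {"filename", "hybrid_stat_pred"}  # placeholder column names

def validate_predictions(pred_csv: str, expected_filenames: set) -> list:
    """Return a list of human-readable problems (empty list == valid)."""
    problems = []
    preds = pd.read_csv(pred_csv)
    missing_cols = REQUIRED_COLS - set(preds.columns)
    if missing_cols:
        problems.append(f"missing columns: {sorted(missing_cols)}")
        return problems  # can't check rows without the required columns
    if preds["filename"].duplicated().any():
        problems.append("duplicate filenames in predictions")
    missing_rows = expected_filenames - set(preds["filename"])
    if missing_rows:
        problems.append(f"{len(missing_rows)} reference images have no prediction")
    return problems
```

The scoring program could then fail fast with the returned messages instead of crashing mid-merge on a malformed submission.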