UCSD-E4E / PyHa

A repo designed to convert audio-based "weak" labels to "strong" intraclip labels. Provides a pipeline to compare automated moment-to-moment labels to human labels. Methods range from DSP-based foreground-background separation and cross-correlation-based template matching to deep learning models for bird-presence sound event detection!

Create a new function that generates ROC Curves #4

Open JacobGlennAyers opened 3 years ago

JacobGlennAyers commented 3 years ago

We need to allow the user to sweep a threshold through the range [0, 1] in steps of 0.01. These thresholds will be fed to the global statistics function, which allows us to build ROC curves. In this situation we will need to add an extra column containing the threshold values to the output pandas dataframe. A rough sketch of what I have in mind is below (the function and variable names are placeholders, not PyHa's actual API):
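
```python
import numpy as np
import pandas as pd

results = []
for threshold in np.arange(0.0, 1.01, 0.01):
    # "threshold_const" is the isolation_parameters key mentioned later in this thread;
    # global_statistics() and manual_df are placeholders for the real pipeline pieces.
    isolation_parameters["threshold_const"] = threshold
    stats_df = global_statistics(manual_df, isolation_parameters)
    stats_df["threshold"] = threshold   # the requested extra column
    results.append(stats_df)

sweep_df = pd.concat(results, ignore_index=True)
```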

JacobGlennAyers commented 3 years ago

We have most of the parts available to complete this. We can use the scikitplot library that was used to generate ROC curves for test sets earlier in the fall, taking in the output from the dataset_IoU() function that appends the best-fit IoU score to each annotation. The main caveat is that we may need to augment the aforementioned dataset_IoU() output with Manual IDs of 0 that represent bird-absent annotations and get their respective IoU scores. This would definitely be needed to make the most exhaustive ROC curves related to recall. However, we should be able to acquire precision ROC curves without this step. Once those bird-absent rows exist, something like the sketch below should work (column names are assumptions, and it uses sklearn directly rather than scikitplot):
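
```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Assumed column names: "MANUAL ID" (0 = bird-absent) and "IOU" (best-fit IoU score).
y_true = (iou_df["MANUAL ID"] != 0).astype(int).to_numpy()
y_score = iou_df["IOU"].to_numpy()

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))
```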

JacobGlennAyers commented 3 years ago

It would be nice to get it to work for either the outputs of the general overlap statistics or the IoU statistics.

gados2000 commented 3 years ago

A few questions:

  1. What would you like the function's parameters to be?
  2. Would you like the output to be the visualization of the graph? Any preferences in terms of formatting?
  3. Any functions/documentation I should be looking at that might be helpful or relevant?

JacobGlennAyers commented 3 years ago

  1. I want to be able to pass in a dataframe that contains multiple iterations of the dataset IoU scores. As of right now, for the "threshold" column, I am using the "threshold_const" key from the isolation_parameters. It would also be nice to have a save_fig option, similar to those seen in the existing functions of visualizations.py.

  2. Yes, this function would end up in visualizations.py. That would mean I just call the function by itself with the right parameters, and out pops a graph based on those parameters. Though I would also like an alternative version that simply outputs the AUC value.

  3. Yes, there are some existing scikit-learn functions for ROC curves, so you may be able to leverage those. Here is a helpful link that I believe contains all the information you need: https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/

gados2000 commented 3 years ago

Ok, could you point me towards a file that'd be suitable as a test input? Also, maybe we could have the function do both (display the graph and return the AUC value)?

JacobGlennAyers commented 3 years ago

I sent you the thread with some example files. I can give you the code I used to generate those csv's if you want. I just looped over a bunch of thresholds with the global overlap statistics, using the chunk isolator with a 5s chunk size on the BirdCLEF 2020 validation data, which also had 5s annotations. It's a great opportunity for you to test your knowledge of the Python package.

The only reason I would want two separate functions is so that we can have a clean function that just outputs the AUC in the statistics.py file. Then we can have a clean matplotlib-style output for the ROC curves in a function in the visualizations.py file. For instance, bird_label_scores() used to both output a dataframe and visualize it, based on a flag passed in by the user. I ended up splitting that function into two separate functions, bird_label_scores() ==> (bird_label_scores() & plot_bird_label_scores()), all for the sake of better organization. Roughly the kind of split I'm picturing is sketched below (signatures are hypothetical, not the final API):
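
```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

def roc_auc(y_true, y_score):
    """Candidate for statistics.py: return only the AUC value."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return auc(fpr, tpr)

def plot_roc_curve(y_true, y_score, save_fig=False, fig_path="roc_curve.png"):
    """Candidate for visualizations.py: plot the ROC curve, with a save_fig option."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--")   # chance diagonal
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    if save_fig:
        plt.savefig(fig_path)
    plt.show()
```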

JacobGlennAyers commented 2 years ago

This can come after we do the confidence scores on the annotations being created. Furthermore, we can generate AUROC metrics using SciPy's numerical integration function: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.integrate.simps.html
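
A minimal sketch of that integration, assuming we already have per-threshold `fpr`/`tpr` arrays as in the earlier sketch (the linked docs call the function `simps`; recent SciPy versions name it `simpson`):

```python
import numpy as np
from scipy.integrate import simpson   # simps in older SciPy releases

order = np.argsort(fpr)               # integrate over increasing false positive rate
auroc = simpson(tpr[order], x=fpr[order])
print("AUROC:", auroc)
```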

JacobGlennAyers commented 2 years ago

Maybe consider adding in confidence intervals: https://www.hindawi.com/journals/jps/2015/934362/
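
A generic bootstrap over annotations is one simple way to get such intervals (this is only a sketch, not the method from the linked paper, and it reuses the `y_true`/`y_score` arrays from the earlier sketch):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))   # resample annotations with replacement
    if len(np.unique(y_true[idx])) < 2:               # skip resamples missing a class
        continue
    aucs.append(roc_auc_score(y_true[idx], y_score[idx]))

lower, upper = np.percentile(aucs, [2.5, 97.5])       # 95% percentile interval
print(f"AUC 95% CI: [{lower:.3f}, {upper:.3f}]")
```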