flo-schu / peek

Peek is short for photography enhanced environmental knowledge. It contains algorithms for detecting organisms in photographs.
GNU General Public License v3.0

create optimizer for fitting the detector to annotated tags #4

Closed flo-schu closed 1 year ago

flo-schu commented 1 year ago

Optimization Problem

What can the target function look like?

Regression

My dataset contains x, y coordinates and a label for each tag. The label is one of "Daphnia"+, "Culex", "unidentified" and "?".

I need a classifier that also returns x, y coordinates and a label. In the simplest case, this classifier ignores the distinction between labels and names all objects "Daphnia".

This prediction set can then be compared with the test set. Metrics can be:


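import numpy as np

# ground_truth_xy, ground_truth_labels and point.x / point.y are placeholder names
train = np.array(ground_truth_xy, dtype=float)  # annotated tags, shape (n_tags, 2)

# make sure all labels can be matched, i.e. all relevant Daphnia+ labels --> Daphnia
train_labels = np.array(ground_truth_labels)

true_positive_detects = false_positive_detects = 0
true_positive_classifications = true_negative_classifications = 0
false_positive_classifications = false_negative_classifications = 0

for point in prediction_points:
    # point has x, y coordinates; sum |dx| + |dy| to every annotated tag
    offset = np.abs(train - np.array([point.x, point.y])).sum(axis=1)

    # index of the annotated tag with the minimum offset
    candidate = np.argmin(offset)

    # test if offset falls within margin of detection, should be very close
    if offset[candidate] >= 2:
        false_positive_detects += 1
        continue  # no matching tag, so there is no label to compare against
    true_positive_detects += 1

    if point.label == train_labels[candidate]:
        if point.label == "Daphnia":
            true_positive_classifications += 1
        else:
            true_negative_classifications += 1
    else:
        if point.label == "Daphnia":
            false_positive_classifications += 1
        else:
            false_negative_classifications += 1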

This function iterates over each point in the prediction and tries to find a corresponding annotated tag. Success is measured as detection accuracy. If a match is found, the function also checks whether the label is correct.
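The detection counts from the loop can be condensed into summary scores. The sketch below is one possible way to do that, not something settled in this issue: `detection_metrics` and `n_annotated_tags` are hypothetical names, and it assumes each annotated tag is matched by at most one predicted point, so that unmatched tags count as missed detections.

```python
def detection_metrics(true_positive_detects, false_positive_detects, n_annotated_tags):
    """Summarise detection counts as precision, recall and F1.

    Assumes every annotated tag is matched by at most one predicted point,
    so the tags left unmatched are the missed detections.
    """
    n_predicted = true_positive_detects + false_positive_detects
    false_negative_detects = n_annotated_tags - true_positive_detects

    # Guard against empty prediction or annotation sets
    precision = true_positive_detects / n_predicted if n_predicted else 0.0
    recall = true_positive_detects / n_annotated_tags if n_annotated_tags else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_negative_detects": false_negative_detects,
    }


# Made-up counts, only to show the call
metrics = detection_metrics(
    true_positive_detects=8, false_positive_detects=2, n_annotated_tags=10
)
```

Precision drops when the detector reports objects that were never tagged. Recall drops when tagged organisms are missed, which the loop above does not count on its own.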

alternative: ML approach

I could use a logistic regression classification scheme, where I give several predictors to the regression such as:

and then for training and testing I can probably use a standard ML approach.

The benefit of logistic regression is that I get a probability of detection. In a second step, I could manually label the ones with a low probability.

Also, for this approach I already have some scripts in peek.

If I'm not mistaken, I can just take the tag database (or combine the databases from the tagging) as the source of both predictors and results.
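A minimal sketch of the logistic regression idea with scikit-learn, under several assumptions. The data is synthetic and stands in for the tag database. The two predictors ("area" and "mean intensity") are hypothetical placeholders, not the predictors actually intended here. "Low probability" is interpreted as a prediction close to 0.5, with an arbitrary band of 0.2.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the tag database: 1 = Daphnia, 0 = anything else
n = 200
is_daphnia = rng.integers(0, 2, size=n)
features = np.column_stack([
    rng.normal(loc=50 + 30 * is_daphnia, scale=10),     # hypothetical "area"
    rng.normal(loc=0.4 + 0.2 * is_daphnia, scale=0.1),  # hypothetical "mean intensity"
])

X_train, X_test, y_train, y_test = train_test_split(
    features, is_daphnia, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Probability that each held-out tag is Daphnia (second column = class 1)
proba = clf.predict_proba(X_test)[:, 1]

# Flag uncertain tags for manual labelling; the 0.2 band is an arbitrary choice
needs_review = np.abs(proba - 0.5) < 0.2

accuracy = clf.score(X_test, y_test)
```

Tags where `needs_review` is true would go back to a person for manual labelling, while the rest keep the predicted label.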

Resources:

https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html#sphx-glr-auto-examples-calibration-plot-compare-calibration-py

https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py

https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html#sphx-glr-auto-examples-model-selection-plot-underfitting-overfitting-py

classifier options

Steps:

flo-schu commented 1 year ago

Additional features to improve the classifier