DerKevinRiehl / TransposonUltimate

TransposonUltimate - a holistic set of tools for transposon identification
GNU General Public License v3.0

Mode 2: Prediction evaluation #5

Open Egor-Lebedev opened 2 years ago

Egor-Lebedev commented 2 years ago

Hello! To use the Prediction evaluation module, you must pass trueLabelFile to the labels input. Can you please tell me how to get these labels?

DerKevinRiehl commented 2 years ago

Hello Egor-Lebedev, thank you very much for your interest in TransposonUltimate!

In your question you are referring to mode 2 "prediction evaluation" of the tool transposon classifier RFSB, which is described on this page: https://github.com/DerKevinRiehl/transposon_classifier_rfsb.

Mode 2 is intended for users who want to evaluate the performance of their classification. For this you need a trueLabelFile (the true classification) and a predictLabelFile (produced by any classifier). Mode 2 then compares the predicted labels with the true labels and reports common classification evaluation measures (such as F1 score, accuracy, MCC, etc.).
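To illustrate what such a comparison computes, here is a minimal sketch in plain Python. It is not the RFSB implementation and does not read the tool's file format: the function name `evaluate`, the in-memory lists, and the class names are all assumptions made up for this example.

```python
from collections import Counter
from math import sqrt


def evaluate(true_labels, pred_labels):
    """Return accuracy, macro-averaged F1 and multiclass MCC for two aligned label lists."""
    if not true_labels or len(true_labels) != len(pred_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(true_labels)
    true_counts = Counter(true_labels)   # how often each class truly occurs
    pred_counts = Counter(pred_labels)   # how often each class was predicted
    hits = Counter(t for t, p in zip(true_labels, pred_labels) if t == p)
    correct = sum(hits.values())
    classes = sorted(set(true_labels) | set(pred_labels))

    # Per-class F1, then an unweighted (macro) average over all classes.
    f1_scores = []
    for cls in classes:
        tp = hits[cls]
        precision = tp / pred_counts[cls] if pred_counts[cls] else 0.0
        recall = tp / true_counts[cls] if true_counts[cls] else 0.0
        total = precision + recall
        f1_scores.append(2 * precision * recall / total if total else 0.0)

    # Multiclass generalisation of the Matthews correlation coefficient.
    numerator = correct * n - sum(pred_counts[c] * true_counts[c] for c in classes)
    denominator = sqrt(
        (n * n - sum(v * v for v in pred_counts.values()))
        * (n * n - sum(v * v for v in true_counts.values()))
    )

    return {
        "accuracy": correct / n,
        "macro_f1": sum(f1_scores) / len(f1_scores),
        "mcc": numerator / denominator if denominator else 0.0,
    }


# Hypothetical labels, one entry per sequence, in the same order in both lists.
true_labels = ["LTR", "LTR", "TIR", "TIR", "Helitron", "LINE"]
pred_labels = ["LTR", "TIR", "TIR", "TIR", "Helitron", "LTR"]

scores = evaluate(true_labels, pred_labels)
print(scores)
```

The two lists must describe the same sequences in the same order; any mismatch in ordering would silently distort every measure.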

So in short: you need to know the true labels and provide them in the trueLabelFile. An example of the expected format can be found in the demo files here: https://github.com/DerKevinRiehl/transposon_classifier_rfsb/tree/main/demoFiles

I hope this answer helps; please let me know if you have further questions. You can also explain your situation, and what exactly you want to do, in more detail.

Best regards, Kevin

Egor-Lebedev commented 2 years ago

Hi, thanks for your answer.

To explain in more detail: I want to evaluate the classification of honey bee transposons using mode 2 "prediction evaluation". I already have a predictLabelFile. My question is whether it is possible to obtain a trueLabelFile. If not for bees, then perhaps for Drosophila?

Best, Egor Lebedev

DerKevinRiehl commented 2 years ago

Dear Egor, that is exactly the point of evaluation: you need labels that you consider to be the ground truth.

You could, for example, let our tool classify your transposons, then assume our classification is 100% correct, and use it to check how good your predictions are.

But again, this means working on the assumption that some set of labels is true.
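As a rough sketch of that idea, the snippet below compares two sets of labels keyed by sequence ID, treating one as the assumed reference. The function name `agreement`, the dictionaries, the sequence IDs, and the class names are hypothetical and only serve the illustration; the result measures agreement between two classifiers, not correctness.

```python
def agreement(reference, predicted):
    """Compare labels for the sequence IDs present in both dictionaries.

    Returns the fraction of shared sequences with identical labels and a
    list of (sequence_id, reference_label, predicted_label) disagreements.
    """
    shared = sorted(set(reference) & set(predicted))
    if not shared:
        raise ValueError("no sequence IDs in common")

    disagreements = [
        (seq_id, reference[seq_id], predicted[seq_id])
        for seq_id in shared
        if reference[seq_id] != predicted[seq_id]
    ]
    rate = 1 - len(disagreements) / len(shared)
    return rate, disagreements


# Hypothetical labels: `reference` plays the role of the assumed ground truth.
reference = {"seq1": "LTR", "seq2": "TIR", "seq3": "LINE"}
predicted = {"seq1": "LTR", "seq2": "LTR", "seq4": "TIR"}

rate, disagreements = agreement(reference, predicted)
print(rate, disagreements)
```

Sequences that appear in only one of the two sets (here `seq3` and `seq4`) are ignored, so it is worth checking how many IDs are shared before interpreting the rate.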

Best, Kevin