ground truth file for eval.py?

anuradhawick / LRBinner

LRBinner is a long-read binning tool published in WABI 2021 proceedings and AMB.

https://doi.org/10.4230/LIPIcs.WABI.2021.11

GNU General Public License v2.0

ground truth file for eval.py? #11

Closed tomsauv closed 1 year ago

tomsauv commented 1 year ago

Hi, I binned my reads. Could you provide some information on the format needed for the 'ground truth' file to provide to the eval.py script?

'--truth', '-t', help="Path to text file with grounds truth"

Thank you very much

anuradhawick commented 1 year ago

Hi, you need to provide a text file with one ground-truth label per line, in the same order as the input reads. For example:

yeast
yeast
e.coli
e.coli
human
mouse
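
Since the labels must line up with the input reads one-to-one, a quick sanity check before running eval.py can catch a mismatched file. The following is a minimal sketch, not part of LRBinner: the function names are hypothetical, and it assumes the reads are in FASTA format (one `>` header per read) and that the truth file holds exactly one label per line.

```python
def read_ids_from_fasta(fasta_path):
    """Return read IDs (first word of each '>' header) in file order."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                ids.append(header.split()[0] if header else "")
    return ids


def load_truth_labels(truth_path):
    """Return one label per line, preserving order."""
    with open(truth_path) as handle:
        return [line.strip() for line in handle.read().splitlines()]


def check_truth_matches_reads(fasta_path, truth_path):
    """Raise ValueError if the number of labels differs from the number of reads."""
    n_reads = len(read_ids_from_fasta(fasta_path))
    labels = load_truth_labels(truth_path)
    if n_reads != len(labels):
        raise ValueError(
            f"{n_reads} reads in {fasta_path} but {len(labels)} labels in {truth_path}"
        )
    return labels
```

The check only compares counts; it cannot tell whether the labels are in the right order, so the truth file still has to be generated in read order.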

Aluminio-visto commented 1 year ago

Hello, I think I don't understand the meaning of "ground truth". By that do you mean a file telling the taxonomy for every read in the fasta file? How may I get that? Using Kraken2, perhaps?

anuradhawick commented 1 year ago

Hi,

The ground truth parameter is only for evaluating the tool. It should not be used unless that’s the motive. It was implemented for parameter tuning and evaluation purposes.

I hope this clears things up.

Cheers, Anuradha

Aluminio-visto commented 1 year ago

Hi, thank you very much for the quick reply. Yes, that cleared things up, but now I have another question: how do you get a result like the one shown in the README (see below)? I can't find any file resembling that result table in my lrb folder. Which tool should I use?

Example in README:

"""
                                                 Bin-0(4)  Bin-1(1)  Bin-2(3)  Bin-3(0)  Bin-4(5)  Bin-5(2)  Bin-6(6)  Bin-7_(7)
CP002618.1_Lactobacillus_paracasei_strain_BD-II  744       973       92        70169     215       0         0         0
NC_011658.1_Bacillus_cereus_AH187                110       11        277       10        30129     0         1         439
CP002807.1_Chlamydia_psittaci_08DC60_chromosome  0         0         9         0         1         0         0         11014
NC_012883.1_Thermococcus_sibiricus_MM_739        74        0         32441     1         28        2         0         8
"""

...and so on
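
A table of this shape can be read as a cross-tabulation: one row per ground-truth label, one column per bin, and each cell counting the reads with that label that landed in that bin. The sketch below illustrates the idea only. It is not LRBinner's eval.py, and the function name and the two parallel input lists (one label and one bin ID per read) are assumptions made for the example.

```python
from collections import Counter


def cross_tabulate(truth_labels, bin_ids):
    """Count reads per (truth label, bin) pair.

    Returns (row_labels, column_bins, matrix), where matrix[i][j] is the
    number of reads with row_labels[i] assigned to column_bins[j].
    """
    if len(truth_labels) != len(bin_ids):
        raise ValueError("truth_labels and bin_ids must have the same length")
    counts = Counter(zip(truth_labels, bin_ids))
    rows = sorted(set(truth_labels))
    cols = sorted(set(bin_ids))
    matrix = [[counts[(row, col)] for col in cols] for row in rows]
    return rows, cols, matrix


if __name__ == "__main__":
    # Toy data: six reads, their true labels, and hypothetical bin assignments.
    truth = ["yeast", "yeast", "e.coli", "e.coli", "human", "mouse"]
    bins = [0, 0, 1, 1, 2, 2]
    rows, cols, matrix = cross_tabulate(truth, bins)
    print("\t".join(["label"] + [f"Bin-{c}" for c in cols]))
    for label, counts_row in zip(rows, matrix):
        print("\t".join([label] + [str(n) for n in counts_row]))
```

In a good binning result, each row has most of its reads concentrated in a single column, as in the README example.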

anuradhawick commented 1 year ago

That’s the result for the test dataset in the README. The zip file should include the ground truth you need to run the eval.py script.

Aluminio-visto commented 1 year ago

Ok, I see, but then, my point is: how did you get the ids.txt in the test dataset? Which tool should we use?

anuradhawick commented 1 year ago

You can use any tool to classify reads, for example Minimap2 against reference genomes, or a tool like Kraken2.
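
As one possible way to turn Minimap2 output into a label file, the sketch below parses a PAF file (column 1 is the read name, column 6 the reference name, column 10 the number of matching residues), keeps the best-matching reference per read, and writes one label per line in read order. The function names, the `unclassified` placeholder for unmapped reads, and the choice of "most matching residues" as the tie-breaker are all assumptions for illustration. How eval.py treats a placeholder label is not stated in this thread.

```python
def best_hits_from_paf(paf_path):
    """Map each read ID to the reference with the most matching residues."""
    best = {}
    with open(paf_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:  # PAF has at least 12 mandatory columns
                continue
            read_id, target = fields[0], fields[5]
            matches = int(fields[9])
            if read_id not in best or matches > best[read_id][1]:
                best[read_id] = (target, matches)
    return {read_id: target for read_id, (target, _) in best.items()}


def labels_in_read_order(read_ids, hits, missing="unclassified"):
    """Return one label per read, in the order the reads appear in the input."""
    return [hits.get(read_id, missing) for read_id in read_ids]


def write_truth_file(labels, out_path):
    """Write one label per line."""
    with open(out_path, "w") as handle:
        for label in labels:
            handle.write(label + "\n")
```

Here `read_ids` would be the read names taken from the input file in their original order, so that line i of the output corresponds to read i.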

Aluminio-visto commented 1 year ago

OK, thank you very much!