envmetagen / metabinkit

Set of programs to perform taxonomic binning.
GNU General Public License v3.0

min_pident seems wrong (negative numbers and 0s) #20

Closed: bastianegeter closed this issue 3 years ago

bastianegeter commented 4 years ago

See the example in README.md. Here is the output again:

$ head -4 out0.bins.tsv 
qseqid  pident  min_pident  K   P   C   O   F   G   S
6fcff7c8-2031-4e3a-a8f0-72dc2da71c79_runid=407cb32920f83b2252d840c6a949244d8c2a3bb9_ss_sample_id=Mussels-ITD11-A-UNIO-RUN7  97.015  -2.985  Eukaryota   Mollusca    Bivalvia    Unionida    Unionidae   Sinanodonta Sinanodonta woodiana
d36ef3ba-f3d5-4952-b683-301f1a959cfa_runid=407cb32920f83b2252d840c6a949244d8c2a3bb9_ss_sample_id=Mussels-ITD11-A-UNIO-RUN7  100 0   Eukaryota   Mollusca    Bivalvia    Unionida    Unionidae   Sinanodonta Sinanodonta woodiana
9ef96e73-a5b6-4c4f-bc59-2b8238281d77_runid=407cb32920f83b2252d840c6a949244d8c2a3bb9_ss_sample_id=Mussels-ITD24-A-UNIO-RUN7  97.059  -2.941  Eukaryota   Mollusca    Bivalvia    Unionida    Unionidae   Sinanodonta Sinanodonta woodiana

bastianegeter commented 4 years ago

min_pident still reports zeros and NAs (see the example in README.md on the devel branch). I see the comment "##min_pident (NA if not binned)", but I think it makes more sense to report min_pident in every case. Zeros also don't make sense to me.

Also, I'm not sure how it currently works, but I think min_pident should be the minimum pident of the entries remaining after filtering (i.e. those considered for binning). If only one entry remains, min_pident should equal pident.

Shall I try to amend...

nunofonseca commented 3 years ago

The 0 in min_pident was a bug; it is now fixed in the devel branch.

min_pident is the minimum pident of the entries considered for binning, after the respective pident filter has been applied. NA appears when a query cannot be binned because none of its entries pass the pident threshold.
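To make the rule concrete, here is a minimal Python sketch of it for a single query. It is not metabinkit's implementation: the helper name min_pident_for_query and the pident_threshold parameter are hypothetical, Python's None stands in for NA, and it assumes a hit passes the filter when its pident is greater than or equal to the threshold.

```python
from typing import Iterable, Optional


def min_pident_for_query(pidents: Iterable[float], pident_threshold: float) -> Optional[float]:
    """Hypothetical helper sketching the min_pident rule for one query.

    Returns the minimum pident among hits that pass the pident filter,
    or None (standing in for NA) when no hit passes.
    """
    # Assumption: a hit passes the filter when pident >= threshold.
    kept = [p for p in pidents if p >= pident_threshold]
    if not kept:
        # No hit passes the threshold, so the query cannot be binned: NA.
        return None
    # With a single remaining hit, this is simply that hit's pident.
    return min(kept)


if __name__ == "__main__":
    hits = [97.015, 96.2, 88.0]
    print(min_pident_for_query(hits, 95.0))       # 96.2
    print(min_pident_for_query([97.015], 95.0))   # 97.015
    print(min_pident_for_query(hits, 99.0))       # None
```

Under this rule, min_pident is always one of the retained pident values. It can therefore never be negative or zero unless an input pident is, and it equals pident when only one entry remains.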