DominikBuchner / BOLDigger-commandline

BOLDigger as a commandline tool
MIT License

Finding best fitting hit fails for JAMP and BOLDigger method #6

Closed naurasd closed 1 year ago

naurasd commented 1 year ago

Hi,

I am having issues getting the top hit with the JAMP and BOLDigger methods. I am using the most recent version.

Error message (BOLDigger method, JAMP method fails with the same error):

$ boldigger-cline digger_hit BOLDResults_COI_cluster_reps_curated_no_contam.xlsx
12:08:01: Opening resultfile.
12:08:23: Filtering data for JAMP hits.
Traceback (most recent call last):
  File "c:\users\nauras\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\nauras\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Nauras\Programs\Python\Python39\Scripts\boldigger-cline.exe\__main__.py", line 7, in <module>
  File "c:\users\nauras\programs\python\python39\lib\site-packages\boldigger_cline\__main__.py", line 70, in main
    digger_sort.main(args.xlsx_path)
  File "c:\users\nauras\programs\python\python39\lib\site-packages\boldigger_cline\digger_sort.py", line 15, in main
    jamp_hits = [jamp_hit(otu) for otu in otu_dfs]
  File "c:\users\nauras\programs\python\python39\lib\site-packages\boldigger_cline\digger_sort.py", line 15, in <listcomp>
    jamp_hits = [jamp_hit(otu) for otu in otu_dfs]
  File "c:\users\nauras\programs\python\python39\lib\site-packages\boldigger\jamp_hit.py", line 55, in jamp_hit
    threshold, level = get_threshold(df)
  File "c:\users\nauras\programs\python\python39\lib\site-packages\boldigger\jamp_hit.py", line 13, in get_threshold
    elif threshold >= 98:
TypeError: '>=' not supported between instances of 'str' and 'int'

There seems to be an error in how similarity values are being assessed? The first hit method works fine. My results file is attached: BOLDResults_COI_cluster_reps_curated_no_contam.xlsx
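For readers who hit the same TypeError: the traceback shows a similarity value reaching the comparison as a str rather than a number. The sketch below is not BOLDigger's code. It only reproduces the failing comparison in plain Python and shows one possible way to coerce such values before comparing. The helper names `to_similarity` and `meets_cutoff`, the cutoff of 98 (taken from the line quoted in the traceback) and the sample values are assumptions for illustration.

```python
def to_similarity(value):
    """Return value as a float, or None if it cannot be read as a number."""
    try:
        return float(str(value).strip())
    except ValueError:
        return None


def meets_cutoff(value, cutoff=98):
    """Compare a possibly text-typed similarity against a numeric cutoff.

    Hypothetical helper: the cutoff mirrors the `>= 98` seen in the traceback.
    """
    similarity = to_similarity(value)
    return similarity is not None and similarity >= cutoff


# Comparing the raw string against an int raises the TypeError from the report.
try:
    "99.5" >= 98
except TypeError as err:
    print(err)

print(meets_cutoff("99.5"))  # text cell holding a number
print(meets_cutoff(97.2))    # already numeric
print(meets_cutoff("n/a"))   # non-numeric placeholder (made-up value)
```

Whether the values arrive as text depends on how the results file was written and read, so this may not be the actual cause in the case above.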

Cheers

Nauras

naurasd commented 1 year ago

I ran the exact same command on an HPC now and it worked. Not sure what is going on here. General Python issue?

DominikBuchner commented 1 year ago

Also fixed with the newest version.

naurasd commented 1 year ago

yes, cool

OndroV commented 1 year ago

Hi,

I hit a very rare error today which, imho, doesn't need a separate thread, so I'm just reporting it here in case it's useful for anyone in the future. Similar to Nauras' case, my process crashed at the "Filtering data for JAMP hits" step of digger_hit, but with "IndexError: list index out of range".

The reason appeared to be empty lines in the middle of the BOLDResults xlsx file - a single batch got saved incorrectly without crashing the identification process. Looks like some glitch, because it happened only in one out of >10 000 batches that I ran over the weekend in multiple terminal windows simultaneously. I could continue with finding best hits after I simply put that batch of sequences into a separate fasta, ran identification for it and replaced the problematic lines in BOLDResults by the new result.
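A rough way to locate such empty rows before running digger_hit is sketched below. It is not part of BOLDigger: `find_blank_rows` is a hypothetical helper and the sample rows are made up. It assumes each row is available as a sequence of cell values, for example as yielded by openpyxl's `worksheet.iter_rows(values_only=True)`.

```python
def find_blank_rows(rows, first_row_number=1):
    """Return the 1-based numbers of rows whose cells are all empty."""
    blank = []
    for number, row in enumerate(rows, start=first_row_number):
        # A cell counts as empty if it is None or only whitespace.
        if all(cell is None or str(cell).strip() == "" for cell in row):
            blank.append(number)
    return blank


# Made-up rows standing in for a results sheet; the column layout is an assumption.
rows = [
    ("OTU_1", "Arthropoda", 99.1),
    (None, None, None),
    ("OTU_2", "Arthropoda", 97.4),
    ("", " ", None),
]
print(find_blank_rows(rows))  # [2, 4]
```

This only lists candidate rows. Which sequences to re-run and how to patch the file still has to be decided by hand, as described above.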

Cheers!
