jfy133 closed this issue 1 year ago.
OK, @alexhbnr has kindly identified the problem for me:
In line 457, the script DAS_Tool.R parses the file MEGAHIT-DASTool-test_minigut_sample2.seqlength and splits the contig names on the first space. Therefore, the original contig name k141_271 flag=1 multi=3.0000 len=1018 is shortened to k141_271.
However, this splitting is only applied when parsing the *.seqlength file, not when parsing test_minigut_sample2.tsv.
In line 482, the script then compares the contig names in the *.seqlength table with those in test_minigut_sample2.tsv and finds no overlap, because the former are shortened and the latter are not.
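To make the mismatch concrete, here is a minimal Python sketch. It assumes, as described above, that DAS_Tool.R keeps only the part of each *.seqlength contig name before the first space. `first_token` is a hypothetical helper written for illustration; it is not DAS Tool code.

```python
# Hypothetical illustration: *.seqlength names are cut at the first space,
# while the names in the contig2bin TSV are kept in full.

def first_token(name: str) -> str:
    """Keep only the text before the first space, mimicking the described parsing."""
    return name.split(" ", 1)[0]

full_name = "k141_271 flag=1 multi=3.0000 len=1018"

seqlength_names = {first_token(full_name)}  # what the *.seqlength parsing yields
contig2bin_names = {full_name}              # what the TSV still contains

overlap = seqlength_names & contig2bin_names
print(seqlength_names)  # {'k141_271'}
print(len(overlap))     # 0 -> no contig is found in both tables
```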
There are two possible fixes: either we shorten the contig names in the TSV file we provide, so that the splitting no longer causes a mismatch, or we open a PR to DAS Tool so that the splitting is performed on both tables.
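The first option (shortening the names in the TSV we provide) could look roughly like the sketch below. It assumes a two-column, tab-separated contig2bin table (contig name, then bin ID) and truncates each contig name at its first space; `shorten_contig2bin` is a hypothetical helper, not part of DAS Tool or nf-core/mag.

```python
import csv
import io


def shorten_contig2bin(src, dst):
    """Copy a tab-separated contig2bin table from src to dst, truncating
    each contig name at its first space (hypothetical helper)."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        contig, *rest = row
        writer.writerow([contig.split(" ", 1)[0], *rest])


# In-memory example; real use would pass open file handles instead.
example = io.StringIO("k141_271 flag=1 multi=3.0000 len=1018\tbin_1\n")
out = io.StringIO()
shorten_contig2bin(example, out)
print(out.getvalue(), end="")  # k141_271<TAB>bin_1
```

This only works around the problem on the input side; the other option fixes the comparison inside DAS Tool itself.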
I'm not sure yet where the contig name shortening happens in our pipeline (I've certainly not done this actively), but I will have a look and report back if it's my fault.
I've pushed a fix for this issue to the master branch. You can try running your example above again using the attached modified contig2bin file (which matches the single-copy genes found in your FASTA file). Attachment: test_minigut_sample2.tsv.zip
Hi @cmks, thanks for the fast fix! Unfortunately, I accidentally deleted the results of the pipeline run (nf-core/mag) where I hit the error. I will run it again to reproduce the error and test the fix, but I'm at a workshop for the next couple of days, so it might take a while to confirm (sorry about that!).
OK, fortunately one of the workshop sessions wasn't relevant for me, so I was able to test this. I can confirm it works, thank you very much!
Ah, one more question, @cmks: do you have a rough ETA for a patch release containing the fix?
For my particular purpose, I need the DAS_Tool bioconda recipe to be updated so I can use it in the pipeline I'm working on.
I'll also need an updated Conda recipe; I just ran into this myself!
Done: https://github.com/cmks/DAS_Tool/releases/tag/1.1.6. The bioconda recipe should update automatically after some time.
Hello,
I have hit an issue similar to one of those in https://github.com/cmks/DAS_Tool/issues/78, where I get an error:
However, when I search the assembly FASTA for each of the 36 contigs listed in the contig2bin file (via grep, so it should be an exact match; the contig2bin file is direct output of CONCOCT), I do find them...
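One way to narrow this down is to compare the IDs exactly rather than with grep, since a plain grep matches substrings (searching for `k141_27` would also hit `k141_271`). The sketch below assumes a tab-separated contig2bin table with the contig name in the first column, and that FASTA header IDs are compared after truncation at the first space, as described earlier for DAS_Tool.R. `fasta_ids` and `missing_contigs` are hypothetical helpers, and the sample records are made up for illustration.

```python
def fasta_ids(fasta_lines):
    """Return FASTA header IDs, keeping only the text before the first space."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            ids.add(line[1:].rstrip("\r\n").split(" ", 1)[0])
    return ids


def missing_contigs(tsv_lines, fasta_lines):
    """List contig2bin contig names that have no exact match among the FASTA IDs."""
    ids = fasta_ids(fasta_lines)
    missing = []
    for line in tsv_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        contig = line.split("\t", 1)[0]
        if contig not in ids:
            missing.append(contig)
    return missing


# Made-up records; with real data, pass open file handles instead of lists.
fasta = [">k141_271 flag=1 multi=3.0000 len=1018\n", "ACGT\n", ">k141_5 flag=0\n", "GGCC\n"]
tsv = ["k141_271\tbin_1\n", "k141_5 flag=0\tbin_2\n", "k141_27\tbin_3\n"]

for name in missing_contigs(tsv, fasta):
    print(repr(name))  # repr() exposes stray spaces or other hidden characters
```

Printing with `repr()` makes differences such as trailing whitespace or leftover header text visible, which a grep hit would hide.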
Could anyone help me identify where the mismatch is happening?
Attachment: dastool_contig2binerror.zip