Closed SvetlanaUP closed 2 years ago
This is because some dependencies cannot run on macOS. We will update the docs and the tool to reflect this.
If this is the ORF finder, can we just switch to another one? Why not Prodigal, even?
Looking at this again, I am not sure that FragGeneScan is being called correctly. At https://github.com/BigDataBiology/SemiBin/blob/9d5d5b79fb3caa4509a7aca4969fd78a5244fa7f/SemiBin/utils.py#L266, should it not be -w 1 instead of -w 0?
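For context on the flag in question: as far as I understand FragGeneScan's usage message, -w 1 marks the input as complete genomic sequences and -w 0 marks it as short sequence reads, so this should be checked against the installed version. Below is a minimal Python sketch of building the call with that choice made explicit. The function name, the argument names, and the example paths are illustrative assumptions, not SemiBin's actual code.

```python
def fraggenescan_args(contigs_fasta, out_prefix, whole_genome=True, threads=1):
    """Build an argument list for the FragGeneScan binary.

    Assumption: -w 1 means "complete genomic sequences" and -w 0 means
    "short sequence reads", per the FragGeneScan usage message.
    """
    return [
        "FragGeneScan",
        "-s", str(contigs_fasta),            # input sequences
        "-o", str(out_prefix),               # output prefix
        "-w", "1" if whole_genome else "0",  # the flag under discussion
        "-t", "complete",                    # training model name (assumed)
        "-p", str(threads),                  # number of threads
    ]


# Hypothetical paths, for illustration only.
print(" ".join(fraggenescan_args("contigs.fa", "contigs.faa", whole_genome=True, threads=4)))
```

The list can be passed to subprocess.check_call once the flag choice has been confirmed.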
I saw the parameter -w 0 used in other tools (MaxBin2, SolidBin), so I used it here. But I agree that we can try switching to Prodigal in a later version.
Sincerely, Shaojun
My expectation is that it does not make much of a difference in terms of results, but we should check a few samples. In that case, we can make the choice based only on criteria like "easy to install and interface with".
I will try it in a later version.
@psj1997 Can you open a new issue just for substituting the ORF finder? This one mixes a few different things, so it may be best to start clean.
Hi Shaojun, I just ran this, and got an error...
SemiBin single_easy_bin -i fa.gz -b .bam -o output
2022-01-31 18:44:13,152 - Generate training data.
2022-01-31 18:44:32,483 - Calculating coverage for every sample.
2022-01-31 18:45:47,911 - Processed:CCMD75147712ST.mapped.sorted.bam
2022-01-31 18:45:48,080 - Start generating kmer features from fasta file.
2022-01-31 18:46:38,196 - Running mmseqs and generate cannot-link file.
2022-01-31 18:46:39,604 - Downloading GTDB to /Users/svetlana/.cache/SemiBin/mmseqs2-GTDB. It will take a while..
#IT WORKED FOR MORE THAN 3 hours
2022-02-01 10:22:41,636 - Download finished. Checking MD5...
Error: MD5 check failed, removing '/Users/svetlana/.cache/SemiBin/mmseqs2-GTDB/GTDB_v95.tar.gz'.
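Since the download took hours before failing the checksum, it may help to verify a manually downloaded archive before pointing the tool at it. The following is a minimal Python sketch of an MD5 check using only the standard library. The function names are illustrative, and the expected checksum has to come from wherever the archive is published; none is given in this thread.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 against an expected hex string (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```

A call such as verify_download("GTDB_v95.tar.gz", expected) returns False for a truncated or corrupted file, which would point to an interrupted download rather than a problem in the tool itself.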
So I ran this instead (to save some time!). It was fast, but I got another error.
SemiBin single_easy_bin -i fa -b .bam -o output --environment human_gut
2022-02-01 11:20:03,427 - Generate training data.
2022-02-01 11:20:03,749 - Calculating coverage for every sample.
2022-02-01 11:21:19,435 - Processed:CCMD75147712ST.mapped.sorted.bam
2022-02-01 11:21:19,605 - Start generating kmer features from fasta file.
2022-02-01 11:22:08,353 - Start binning.
2022-02-01 11:22:09,940 - Calculating depth matrix.
2022-02-01 11:22:10,108 - Edges:143927
2022-02-01 11:22:14,539 - Reclustering.
Error: Failed to open sequence file /var/folders/zp/pmq94j9j04j7sp3z8r6ms5z40000gn/T/tmplu1eaiw_/contigs.faa.faa for reading