anuradhawick / MetaBCC-LR

Reference-free Binning of Metagenomics Long Reads using Coverage and Composition
https://doi.org/10.1093/bioinformatics/btaa441
MIT License

dsk not found #5

Closed Electrocyte closed 3 years ago

Electrocyte commented 4 years ago

Hi, I'm looking forward to using your tool, but I have a problem getting it running. I first hit the incomplete-directories problem (output/misc) that someone else reported, but that has been resolved.

Traceback error:
2020-11-04 12:39:31,923 - INFO - Filtering reads
Filtering reads longer than 1000bp: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 949/949 [00:00<00:00, 6174.47it/s]
2020-11-04 12:39:32,227 - INFO - Filtering reads complete
2020-11-04 12:39:32,227 - INFO - Running DSK k-mer counting
sh: 1: dsk: not found
2020-11-04 12:39:32,243 - ERROR - Error in step: Running DSK
2020-11-04 12:39:32,244 - ERROR - Failed due to an error. Please check the log. Good Bye!

Log files:

cat MetaBCC-LR/metabcc-lr.log
2020-11-04 12:36:21,521 - INFO - Filtering reads
2020-11-04 12:36:21,667 - DEBUG - Total of 949 reads to filter
2020-11-04 12:37:52,021 - INFO - Filtering reads
2020-11-04 12:37:52,175 - DEBUG - Total of 949 reads to filter
2020-11-04 12:39:31,923 - INFO - Filtering reads
2020-11-04 12:39:32,070 - DEBUG - Total of 949 reads to filter
2020-11-04 12:39:32,227 - INFO - Filtering reads complete
2020-11-04 12:39:32,227 - INFO - Running DSK k-mer counting
2020-11-04 12:39:32,227 - DEBUG - Running DSK
2020-11-04 12:39:32,243 - ERROR - Error in step: Running DSK
2020-11-04 12:39:32,244 - ERROR - Failed due to an error. Please check the log. Good Bye!

Do you think it is an issue with the number of reads < 1000?

After trying with various fastq files containing > 1000 reads, I get a similar error:

cat MetaBCC-LR/metabcc-lr.log
2020-11-04 12:54:16,495 - INFO - Filtering reads
2020-11-04 12:54:17,584 - DEBUG - Total of 2055 reads to filter
2020-11-04 12:54:17,922 - INFO - Filtering reads complete
2020-11-04 12:54:17,923 - INFO - Running DSK k-mer counting
2020-11-04 12:54:17,923 - DEBUG - Running DSK
2020-11-04 12:54:17,938 - ERROR - Error in step: Running DSK
2020-11-04 12:54:17,939 - ERROR - Failed due to an error. Please check the log. Good Bye!
2020-11-04 12:54:51,317 - INFO - Filtering reads
2020-11-04 12:54:53,425 - DEBUG - Total of 13727 reads to filter
2020-11-04 12:54:55,658 - INFO - Filtering reads complete
2020-11-04 12:54:55,660 - INFO - Running DSK k-mer counting
2020-11-04 12:54:55,660 - DEBUG - Running DSK
2020-11-04 12:54:55,676 - ERROR - Error in step: Running DSK
2020-11-04 12:54:55,676 - ERROR - Failed due to an error. Please check the log. Good Bye!
2020-11-04 12:55:47,121 - INFO - Filtering reads
2020-11-04 12:55:57,887 - DEBUG - Total of 71432 reads to filter
2020-11-04 12:56:09,098 - INFO - Filtering reads complete
2020-11-04 12:56:09,098 - INFO - Running DSK k-mer counting
2020-11-04 12:56:09,099 - DEBUG - Running DSK
2020-11-04 12:56:09,116 - ERROR - Error in step: Running DSK
2020-11-04 12:56:09,117 - ERROR - Failed due to an error. Please check the log. Good Bye!

What does "dsk: not found" mean? A Google search did not return anything obvious.

anuradhawick commented 4 years ago

Hi,

The error occurs because you do not have the DSK tool installed. DSK is a k-mer counting tool that our tool depends on. You can install it from https://github.com/GATB/dsk. Sorry for the confusion; we have listed it as a third-party dependency in the README on the repo's landing page.

Best regards Anuradha
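
For readers who hit the same `sh: 1: dsk: not found` message: the shell prints it when the `dsk` executable is not on `PATH`. The snippet below is a minimal sketch of a pre-flight check, not part of MetaBCC-LR itself. The function name `find_missing_tools` is hypothetical, and the assumption is only that the tool is invoked by the bare name `dsk`, as the log suggests.

```python
import shutil


def find_missing_tools(tools=("dsk",)):
    """Return the names of required executables that are not found on PATH.

    `tools` defaults to ("dsk",) because that is the only external
    dependency named in this thread; extend it as needed.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing external tools: " + ", ".join(missing))
    else:
        print("All external tools found on PATH")
```

Running this before the pipeline gives a clearer message than the generic "Error in step: Running DSK" line in the log.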

Electrocyte commented 4 years ago

Thank you for pointing that out - I missed it during the installation.

Have just tested and the tool works now, thanks!

Another thing I just noticed is that the script does not accept absolute paths.

Input:
MetaBCC-LR -r /home/james/SequencingData/TCPA/analysis/S7B4/demux/barcode01.fastq -s 3 -o /home/james/SequencingData/TCPA/analysis/S7B4/demux/MetaBCC-LR

Output:
FileNotFoundError: [Errno 2] No such file or directory: './/home/james/SequencingData/TCPA/analysis/S7B4/demux/MetaBCC-LR/profiles/3mers'

Switching to relative paths worked as a fix; however, the paths have to be relative to your current working directory:

MetaBCC-LR -r SequencingData/TCPA/analysis/S7B4/demux/barcode01.fastq -s 3 -o SequencingData/TCPA/analysis/S7B4/demux/MetaBCC-LR

See the example below, where paths given with ~ fail:

james@fourier:~/SequencingData/TCPA/analysis/S7B4$ MetaBCC-LR -r ~/SequencingData/TCPA/analysis/S7B4/filtered/barcode01.fastq -s 3 -o ~/SequencingData/TCPA/analysis/S7B4/filtered/MetaBCC-LR
2020-11-11 14:05:33,507 - INFO - Filtering reads
Filtering reads longer than 1000bp: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29634/29634 [00:07<00:00, 3871.50it/s]
2020-11-11 14:05:48,418 - INFO - Filtering reads complete
2020-11-11 14:05:48,418 - INFO - Running DSK k-mer counting
2020-11-11 14:06:08,835 - INFO - Running DSK k-mer counting complete
2020-11-11 14:06:08,836 - INFO - Running DSK k-mer complete
2020-11-11 14:06:08,836 - INFO - Counting Trimers
INPUT FILE /home/james/SequencingData/TCPA/analysis/S7B4/filtered/MetaBCC-LR/misc/filtered_reads.fasta
OUTPUT FILE /home/james/SequencingData/TCPA/analysis/S7B4/filtered/MetaBCC-LR/profiles/3mers
THREADS 8
2020-11-11 14:06:13,559 - INFO - Counting Trimers complete
2020-11-11 14:06:13,559 - INFO - Counting 15-mer profiles
K-Mer file /home/james/SequencingData/TCPA/analysis/S7B4/filtered/MetaBCC-LR/misc/DSK/15mersCounts
LOADING KMERS TO RAM
FINISHED LOADING KMERS TO RAM 420200
INPUT FILE /home/james/SequencingData/TCPA/analysis/S7B4/filtered/MetaBCC-LR/misc/filtered_reads.fasta
OUTPUT FILE /home/james/SequencingData/TCPA/analysis/S7B4/filtered/MetaBCC-LR/profiles/15mers
THREADS 8
BIN WIDTH 10
COMPLETED : Output at - /home/james/SequencingData/TCPA/analysis/S7B4/filtered/MetaBCC-LR/profiles/15mers
2020-11-11 14:06:23,449 - INFO - Counting 15-mer profiles complete
2020-11-11 14:06:23,450 - INFO - Sampling Reads
Traceback (most recent call last):
  File "/home/james/.pyenv/versions/3.8.3/bin/MetaBCC-LR", line 201, in <module>
    main()
  File "/home/james/.pyenv/versions/3.8.3/bin/MetaBCC-LR", line 173, in main
    sample_data.sample(output, sample_count, ground_truth)
  File "/home/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/mbcclr_utils/sample_data.py", line 10, in sample
    p3_data = np.loadtxt(f"./{output}/profiles/3mers", dtype=float)
  File "/home/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/numpy/lib/npyio.py", line 961, in loadtxt
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
  File "/home/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/numpy/lib/_datasource.py", line 195, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
  File "/home/james/.pyenv/versions/3.8.3/lib/python3.8/site-packages/numpy/lib/_datasource.py", line 532, in open
    return _file_openers[ext](found, mode=mode,
FileNotFoundError: [Errno 2] No such file or directory: './/home/james/SequencingData/TCPA/analysis/S7B4/filtered/MetaBCC-LR/profiles/3mers'

Cheers!

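
The traceback shows the path being built as `f"./{output}/profiles/3mers"`, so an absolute `output` turns into `.//home/...`, which is resolved relative to the current directory and does not exist. The following is a minimal sketch of one way to build the path that handles both relative and absolute inputs. It is not the maintainer's fix; the helper name `profile_path` and the example directory `/tmp/out` are hypothetical.

```python
from pathlib import Path


def profile_path(output, name="3mers"):
    """Build <output>/profiles/<name> without assuming `output` is relative.

    Joining with pathlib keeps an absolute `output` absolute and a
    relative one relative; expanduser() also handles a literal "~"
    that the shell did not expand.
    """
    return Path(output).expanduser() / "profiles" / name


# The pattern from the traceback: prefixing "./" breaks absolute paths.
broken = f"./{'/tmp/out'}/profiles/3mers"

print(broken)                    # starts with ".//", i.e. relative to the cwd
print(profile_path("/tmp/out"))  # stays absolute
print(profile_path("out"))       # stays relative to the cwd
```

The result can be passed straight to `np.loadtxt`, which accepts `pathlib.Path` objects as well as strings.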
Electrocyte commented 4 years ago

Follow up question about the graphical output from MetaBCC-LR.

I get the following:

Quick explanation: the top panels show all reads from a single sample; the bottom panels show the same reads, filtered to keep only those with Q>10.

The sample is a spike of P. aeruginosa in human cells; the DNA examined is 16S bacterial amplicons. If I understand correctly, the high-quality reads reflect a single bin, which is a proxy for a single species, while including the low-quality reads results in a second population of binned reads. Also, what do the various plot titles signify?

[image: image.png]

Looking forward to hearing from you,

Cheers,

James

On Wed, 4 Nov 2020 at 14:07, Anuradha Wickramarachchi < notifications@github.com> wrote:

Hi,

The error comes as you do not have the DSK tool. This is a k-mer counting tool used by our tool. You can install it from here https://github.com/GATB/dsk. Sorry for the confusion, we had included this as a third-party dependency in the landing page readme of the repo.

Best regards Anuradha
