kevfly16 opened this issue 2 years ago
Have you tried using a dmnd database instead of BLAST? BLAST databases rely on mmap, which often does not work well with certain storage types.
I just tried using a dmnd database and it seems to work now. I did some research on FUSE and mmap, and it does seem that there may be some issues. I saw here that using a private flag for mmap can help make FUSE compatible. Is that possible to implement within this project?
Separately, when I use the diamond view command on the output from the run, I'm getting an Input/output error. Does this similarly have something to do with using FUSE and mmap?
diamond view \
  -a /mnt/s3fs/.../out.daa \
  --top 5 \
  --out /mnt/s3fs/.../out.ur100 \
  --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore ppos qframe score salltitles

diamond v2.0.13.151 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 16
Loading subject IDs... [16.661s]
Scoring parameters: (Matrix=blosum62 Lambda=0.267 K=0.041 Penalties=11/1)
DB sequences = 34281384
DB sequences used = 4257343
DB letters = 15925806162
Percentage range of top alignment score to report hits: 5
Generating output... Input/output error
Error writing file /mnt/s3fs/.../out.ur100
terminate called after throwing an instance of 'File_write_exception'
  what(): Error writing file /mnt/s3fs/.../out.ur100
Aborted
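Since the failure is an Input/output error (EIO) raised while writing out.ur100, one way to check whether the s3fs mount itself is at fault is to reproduce a large sequential write there without diamond. A minimal Python sketch, assuming a Unix system; `probe_write` and its default sizes are hypothetical choices, not part of diamond:

```python
import errno
import os
import tempfile
from typing import Optional


def probe_write(directory: str,
                total_bytes: int = 64 * 1024 * 1024,
                chunk_bytes: int = 1024 * 1024) -> Optional[str]:
    """Write total_bytes to a scratch file in `directory`, then flush and fsync.

    Returns None on success, or "ERRNAME: message" for the first OSError.
    The scratch file is always removed afterwards.
    """
    chunk = b"\0" * chunk_bytes
    path = None
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix="write_probe_")
        with os.fdopen(fd, "wb") as fh:
            written = 0
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                fh.write(chunk[:n])
                written += n
            fh.flush()
            os.fsync(fh.fileno())  # force the data out to the file system
        return None
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return f"{name}: {exc.strerror}"
    finally:
        if path is not None:
            try:
                os.remove(path)
            except OSError:
                pass
```

Calling probe_write with the real output directory in place of the elided /mnt/s3fs/... path returns None if the write, flush and fsync all succeed, or a string like "EIO: Input/output error" if the mount rejects them.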
I just tried using a dmnd database and it seems to work now. I did some research on FUSE and mmap, and it does seem that there may be some issues. I saw here that using a private flag for mmap can help make FUSE compatible. Is that possible to implement within this project?
I would assume the mapping is already private. I'm using the NCBI library to access BLAST databases, so it's not easily modified.
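For reference, a private read-only mapping of the kind discussed above looks roughly like the following minimal Python sketch (Unix only). It does not reflect how the NCBI library maps BLAST databases; `open_readonly_view` is a hypothetical helper that tries mmap with MAP_PRIVATE and PROT_READ and falls back to an ordinary read if the file system rejects the mapping:

```python
import mmap
import os


def open_readonly_view(path: str):
    """Return a read-only view of the file at `path`.

    Tries a private (copy-on-write) mapping first; if mmap fails, falls back
    to reading the whole file into memory.
    """
    with open(path, "rb") as fh:
        size = os.fstat(fh.fileno()).st_size
        if size == 0:
            return b""  # mmap cannot map an empty file
        try:
            # MAP_PRIVATE + PROT_READ: the mapping is never written back
            # to the file. The mapping stays valid after the file is closed.
            return mmap.mmap(fh.fileno(), size,
                             flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)
        except (OSError, ValueError):
            fh.seek(0)
            return fh.read()
```

The fallback loads the entire file into memory, which is only practical for small files; a database of this size would need chunked reads instead.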
Separately, when I use the diamond view command on the output from the run, I'm getting an Input/output error. Does this similarly have something to do with using FUSE and mmap?
No, diamond view does not use mmap, so this has to be a separate issue. I have no idea at the moment what may be causing this.
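Since the cause is unknown, a possible workaround is to write the output to local disk and copy the finished file to the mount in one step. A minimal Python sketch; `run_with_local_output` and its "{out}" placeholder convention are hypothetical, and whether the final copy also fails on s3fs has not been tested:

```python
import os
import shutil
import subprocess
import tempfile
from typing import List, Optional

OUT_PLACEHOLDER = "{out}"  # hypothetical marker for the output path


def run_with_local_output(cmd: List[str], final_path: str,
                          scratch_dir: Optional[str] = None) -> None:
    """Run `cmd` with OUT_PLACEHOLDER replaced by a local temporary file,
    then copy the finished file to `final_path` (e.g. on the FUSE mount)."""
    fd, local_path = tempfile.mkstemp(dir=scratch_dir, suffix=".tmp")
    os.close(fd)
    try:
        args = [local_path if a == OUT_PLACEHOLDER else a for a in cmd]
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a partial output file is never copied to the mount.
        subprocess.run(args, check=True)
        # One sequential copy instead of the tool's incremental writes.
        shutil.copyfile(local_path, final_path)
    finally:
        os.remove(local_path)


# Example (paths elided as in the log; --outfmt field list shortened):
# run_with_local_output(
#     ["diamond", "view", "-a", "/mnt/s3fs/.../out.daa", "--top", "5",
#      "--out", OUT_PLACEHOLDER, "--outfmt", "6", "qseqid", "sseqid"],
#     "/mnt/s3fs/.../out.ur100",
# )
```

This needs enough local disk space for the full output file.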
Hi,
I'm having an issue using diamond on a relatively large input file. It usually gets to the "Closing the output file" step, hangs there for a couple of hours, and then exits with a bus error. When I decrease the block size below 16 (which splits my reference database into 2 blocks), it gets to the "Computing alignments" step and exits with an input/output error. I've tried --block-size 20, --block-size 16, --block-size 8, --bin 16 and --bin 64, all with the same results. The dmesg output shows no kernel messages indicating that the process was killed. Smaller files I've tried (~1,000 sequences) work and finish in 10-15 min, with most of the time spent loading reference sequences and building reference histograms. Appreciate any help. See more info below.
EC2 Instance Specs:
I/O: