amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

Snap aligner FASTQ record larger than buffer #113

Closed: klaas-men closed this issue 3 years ago

klaas-men commented 6 years ago

Hi developers of snap,

Introduction

I am a bioinformatician who works at Applied-Maths. I really like snap for mapping reads to bacterial genomes! I congratulate you guys on this most outstanding mapper.

Problem

I run my mapping jobs on a server. When I submit a job to the server, it gzips the FASTQ files for faster transfer. However, when I ran snap on the server, it gave an error on these fastq.gz files: "FASTQ record larger than buffer size at /../test_2.fastq.gz:4885027"

I suppose this error is similar to https://www.biostars.org/p/278787/

Solution

When I tried to replicate the mapping on my own local Linux system, however, I did not get any errors. To pinpoint the problem, I compared the FASTQ files on my own system with those on the server using md5sum and diff. It turned out that my code had removed the newline at the end of the FASTQ file before compressing it with gzip and sending it to the server - my bad. When I re-added this newline at the end of the FASTQ file before compressing, the problem was gone.

I will change my own code, but I also think it would be good if snap accepted fastq.gz files that lack a newline at the end of the FASTQ.
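The workaround described above (making sure the FASTQ ends with a newline before it is gzipped) could be sketched roughly as follows. This is only an illustration using the Python standard library, not the code that was actually used for the transfer; the function name `gzip_with_trailing_newline` and its parameters are hypothetical.

```python
import gzip


def gzip_with_trailing_newline(src_path, dst_path):
    """Compress src_path to dst_path, appending a final newline if one is missing.

    Hypothetical helper for illustration; not part of snap.
    """
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        last_byte = b""
        # Stream in 1 MiB chunks so a large FASTQ need not fit in memory.
        for chunk in iter(lambda: src.read(1 << 20), b""):
            dst.write(chunk)
            last_byte = chunk[-1:]
        # Add the newline only when the file is non-empty and lacks one.
        if last_byte and last_byte != b"\n":
            dst.write(b"\n")
```

A file that already ends with a newline is compressed unchanged, so the helper can be applied to every FASTQ before upload.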

Keep up the good work! Klaas Mensaert

bolosky commented 6 years ago

Thanks for reporting this. I’ll take a look at it as I make updates to SNAP.

--Bill

From: klaas-men notifications@github.com Sent: Monday, April 16, 2018 6:18 AM To: amplab/snap snap@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [amplab/snap] Snap aligner FASTQ record larger than buffer (#113)


kevinxchan commented 5 years ago

Any updates on this? I encounter the same error when I try to align nanopore reads. I'm also using gzipped FASTQ inputs. I found some older issues online suggesting decompressing the files first, but the SNAP website says it supports gzipped FASTQ, so I don't think this is the problem. For reference, I am using SNAP 1.0beta.18 for Linux (64-bit), and I get the following error:

Loading index from directory... 0s.  17414383 bases, seed size 22
Aligning.
FASTQ record larger than buffer size at /home/kchan/thesis/raw_data/SRR7690687.fastq.gz:8388608
SNAP exited with exit code 1 from line 255 of file SNAPLib/FASTQ.cpp

bolosky commented 5 years ago

This is an issue with handling very long reads. If a single read is longer than the IO buffer, then this will happen.

You can fix this in one of two ways: either define LONG_READS at the beginning of Read.h in SNAPLib (that is, remove the two slashes at the start of the line //#define LONG_READS), or increase the max read size by updating MAX_READ_LENGTH just below it.

Either of these changes will increase the amount of memory SNAP uses, because it allocates buffer space for the maximum read length (that’s also why it’s a compile-time rather than a command-line option). But if you’re doing nanopore reads, you’ll need to, since they’re typically quite long.
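Before changing MAX_READ_LENGTH, it may help to know how long the longest read in the input actually is. The following is a minimal sketch using the Python standard library, assuming a standard four-line-per-record gzipped FASTQ; the function name `longest_read_length` is hypothetical, and the sketch makes no assumption about how SNAP sizes its buffers internally.

```python
import gzip


def longest_read_length(fastq_gz_path):
    """Return the length of the longest sequence line in a gzipped FASTQ file.

    Hypothetical helper for illustration; assumes four lines per record.
    """
    longest = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a four-line record, the sequence is the second line (index 1).
            if line_number % 4 == 1:
                longest = max(longest, len(line.rstrip("\r\n")))
    return longest
```

Calling `longest_read_length` on the file named in the error message gives a lower bound for whatever maximum read length is compiled in.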

--Bill

From: Kevin Chan notifications@github.com Sent: Friday, December 21, 2018 5:04 PM To: amplab/snap snap@noreply.github.com Cc: Bill Bolosky bolosky@microsoft.com; Comment comment@noreply.github.com Subject: Re: [amplab/snap] Snap aligner FASTQ record larger than buffer (#113)
