amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

Sorting error #136

Closed: tylerjkennedy closed this issue 2 years ago

tylerjkennedy commented 2 years ago

Sorry to bother you again with another issue, but I'm having trouble with the sorting option (-so).

If I align the paired-end (PE) reads there is no issue, but if I add the -so flag, like this:

snap-aligner paired index-directory pair1.fastq.gz pair2.fastq.gz -so -o alignment.bam

I get:

Welcome to SNAP version 1.0dev.104.
Loading index from directory... 0s. 100290401 bases, seed size 27
Aligning.
sorting...Read name: . Size of BAM record 36 larger than allocated 4
SNAP exited with exit code 1 from line 550 of file SNAPLib/Bam.cpp

I tried the sort memory flag, adding "-sm 40" after -so, but got the same error. Do you know how I can fix this and get my alignments sorted and indexed?

Thank you,

Tyler

arun-sub commented 2 years ago

Could you share a small subset of reads from your FASTQ files so we can reproduce the error?

If you are unable to share them, you can try a few things: (1) Just to confirm, does aligning the reads without the -so option (only "-o alignment.bam") work for you? (2) Try sorting the alignments in SAM format (-so -o alignment.sam).

tylerjkennedy commented 2 years ago
  1. Yes, aligning without the -so option works fine.
  2. Sorting to SAM format gave this error:
     Welcome to SNAP version 1.0dev.104.
     Loading index from directory... 0s. 100290401 bases, seed size 27
     Aligning.
     sorting...Segmentation fault: 11

I'll try and create a subset fastq for you to try now.

tylerjkennedy commented 2 years ago

I created a subset of the first 200 reads from each of the 2 PE read files (it won't let me attach them here; should I email them to you?) and ran it with the -so option. That worked fine, and the output was a sorted BAM file.
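To confirm independently of SNAP that a BAM like this really is coordinate-sorted, one option is to walk its records and check that (reference ID, position) never decreases. The sketch below is not part of SNAP; the function name `bam_is_coordinate_sorted` is hypothetical. It assumes the BAM layout described in the SAM/BAM specification: a BGZF stream (multi-member gzip, which Python's `gzip` module can read), a header, then records that each begin with an int32 `block_size` followed by `refID` and 0-based `pos`.

```python
import gzip
import struct

UNPLACED = 2 ** 31  # larger than any int32 refID, so unplaced reads sort last


def bam_is_coordinate_sorted(path):
    """Return True if the BAM file at `path` is in coordinate order.

    Hypothetical helper; assumes the record layout from the SAM/BAM spec.
    """
    with gzip.open(path, "rb") as f:  # BGZF is readable as multi-member gzip
        if f.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", f.read(4))
        f.read(l_text)  # skip the plain-text SAM header
        (n_ref,) = struct.unpack("<i", f.read(4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", f.read(4))
            f.read(l_name + 4)  # reference name plus its int32 length
        prev = None
        while True:
            size_bytes = f.read(4)
            if len(size_bytes) < 4:
                break  # no more alignment records
            (block_size,) = struct.unpack("<i", size_bytes)
            record = f.read(block_size)
            if len(record) < block_size:
                raise ValueError("truncated BAM record")
            ref_id, pos = struct.unpack("<ii", record[:8])
            # Reads with refID == -1 must come after all placed reads.
            key = (ref_id if ref_id >= 0 else UNPLACED, pos)
            if prev is not None and key < prev:
                return False
            prev = key
    return True
```

For example, `bam_is_coordinate_sorted("alignment.bam")` should return True for a correctly sorted file. It only inspects the first 8 bytes of each record, so it is a quick order check rather than a full BAM validator.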

arun-sub commented 2 years ago

Could you send a subset of reads that fails to produce a sorted BAM? You can email them to arunsub@umich.edu or upload them here: https://www.dropbox.com/request/MFYrgaqTy8VW1KsGGpMm

tylerjkennedy commented 2 years ago

I tried subsetting the first 1,000,000 reads, and they still produce a sorted BAM. The only differences between the subset and the parent read files are size (each of the 2 PE files is ~5 GB) and that the subsets aren't gzip-compressed. Would you like me to upload the 10 GB of read data to the Dropbox folder?
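Taking the first N read pairs, as done above, can be scripted so that both mates stay in sync. The sketch below assumes standard four-line FASTQ records (header, sequence, "+", quality) with no line wrapping, so N reads are exactly the first 4N lines. `head_fastq_pair` is a hypothetical helper, not a SNAP command. Paths ending in ".gz" are read or written with gzip, and other paths are treated as plain text.

```python
import gzip
import itertools


def _open(path, mode):
    """Open gzip-compressed files transparently, based on the .gz suffix."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def head_fastq_pair(in1, in2, out1, out2, n_pairs):
    """Copy the first n_pairs records from two paired FASTQ files.

    Hypothetical helper; assumes unwrapped four-line FASTQ records.
    """
    with _open(in1, "rt") as r1, _open(in2, "rt") as r2, \
         _open(out1, "wt") as w1, _open(out2, "wt") as w2:
        # Read both files in lockstep so mate i in file 1 stays
        # paired with mate i in file 2.
        for line1, line2 in itertools.islice(zip(r1, r2), 4 * n_pairs):
            w1.write(line1)
            w2.write(line2)
```

Changing `n_pairs` makes it easy to test progressively larger subsets, for example `head_fastq_pair("pair1.fastq.gz", "pair2.fastq.gz", "sub1.fastq", "sub2.fastq", 1_000_000)`. Giving the outputs a ".gz" suffix instead would also keep compression as a variable.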

arun-sub commented 2 years ago

If you can upload the complete read set to dropbox that would be great. Let me know if you face any issues.

tylerjkennedy commented 2 years ago

I just uploaded the files. I also tried another, smaller set of PE data (~3 GB per file) and received this error:

Welcome to SNAP version 1.0dev.104.
Loading index from directory... 0s. 100290401 bases, seed size 27
Aligning.
sorting...SAMReader: POS field too long.
SNAP exited with exit code 1 from line 799 of file SNAPLib/SAM.cpp
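The "POS field too long" message means a POS column was rejected while reading SAM during sorting. What SNAP checks internally is not shown here. The sketch below instead checks POS against the SAM specification, which requires column 4 to match `[0-9]+` with a value in [0, 2^31 - 1]. It could be run on an unsorted SAM produced without -so to look for malformed lines. `bad_pos_lines` is a hypothetical helper, and treating any spec-valid POS as acceptable to SNAP is an assumption.

```python
import re

MAX_POS = 2 ** 31 - 1  # SAM spec range for POS is [0, 2^31 - 1]
POS_RE = re.compile(r"[0-9]+")  # SAM spec regex for the POS column


def bad_pos_lines(sam_lines):
    """Yield (line_number, pos_field) for alignment lines with an invalid POS.

    Hypothetical helper; pos_field is None when the line has fewer than
    the 11 mandatory SAM columns.
    """
    for lineno, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            continue  # header lines have no POS column
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            yield lineno, None
            continue
        pos = fields[3]
        if not POS_RE.fullmatch(pos) or int(pos) > MAX_POS:
            yield lineno, pos
```

Usage: `with open("alignment.sam") as f: print(list(bad_pos_lines(f)))`. An empty list means every alignment line has a spec-valid POS.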

arun-sub commented 2 years ago

Thanks! The run for the original data finished successfully on a Linux machine. I'm not sure yet, but it does look like an OS X-specific issue.

You can upload the smaller data as well and I will take a look after the first issue.

tylerjkennedy commented 2 years ago

That error with the second set of files is for SAM output. With BAM output I get the same "Size of BAM record 36 larger than allocated 4" error as for the first set of files. I'll upload the second set in case you'd like to look at that error anyway.

arun-sub commented 2 years ago

Hi Tyler,

Sorry for the delay in getting back. I pushed a fix for the sorting issue to the os-x-sort-fix branch. Could you try it out when you get a chance? We will merge it into master once you validate it.

git clone -b os-x-sort-fix https://github.com/amplab/snap
cd snap
make

--Arun

tylerjkennedy commented 2 years ago

Hi Arun,

Thank you for doing this. I'm a little caught up in other projects at the moment, but I will try to test out this fix in the next week or two.

Best,

Tyler

tylerjkennedy commented 2 years ago

Hi,

I just ran this with my data and everything went smoothly.

Thank you again for all of your help!

Tyler