amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

Compression of intermediate bam file #163

Closed matthdsm closed 1 year ago

matthdsm commented 1 year ago

Hi,

Is there an option to compress the intermediate BAM file? I'm aligning 80 GB of FASTQ files and the intermediate BAM has grown to about 500 GB, which is putting quite a load on our file system.

Current command is

snap-aligner paired ./snapaligner sample_R1.fastp.fastq.gz sample_R2.fastp.fastq.gz -o sample.bam -t 18 -so -b- -sm 10 -I -hc- 

Thanks M

bolosky commented 1 year ago

By "intermediate" BAM file do you mean the sort temporary file that gets created during alignment but before the final output? The one that's got a name like sample.bam.tmp?

No, there's no option to compress it. Compression is really slow, and this is the first time that I, at least, have heard of anyone having trouble finding enough space for it, so I'd never considered adding it.

If you mean the final output, that is always compressed. If you're getting a 500GB BAM from 80GB of gzipped FASTQ then something's wrong and we should follow up on it.

--Bill

From: Matthias De Smet
Sent: Thursday, December 8, 2022 4:00 AM
To: amplab/snap
Subject: [amplab/snap] Compression of intermediate bam file (Issue #163)

matthdsm commented 1 year ago

Hi!

The intermediate is the bam.tmp file; the final BAM is only about 50 GB, so that's acceptable. Anyway, I just wanted to know if it was possible. I prefer speed over size anyway.

Thanks for the reply! M
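
For reference, the approximate sizes quoted in this thread can be put side by side. This is only a quick arithmetic sketch in Python. The three figures are the rounded numbers reported above, and the variable names are made up for illustration.

```python
# Approximate sizes reported in this thread, in GB (rounded by the poster).
fastq_gz_gb = 80    # gzipped FASTQ input
sort_tmp_gb = 500   # uncompressed sort temporary file (sample.bam.tmp)
final_bam_gb = 50   # final, compressed BAM output

# How large the temporary file is relative to the input and to the final output.
tmp_vs_input = sort_tmp_gb / fastq_gz_gb
tmp_vs_final = sort_tmp_gb / final_bam_gb
# The final BAM relative to the gzipped input.
final_vs_input = final_bam_gb / fastq_gz_gb

print(f"tmp / input  = {tmp_vs_input:.2f}x")
print(f"tmp / final  = {tmp_vs_final:.2f}x")
print(f"final / input = {final_vs_input:.3f}x")
```

So the temporary file needs roughly ten times the space of the final BAM, while the final output itself is smaller than the gzipped input.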

ghuls commented 8 months ago

I think SAMtools compresses temporary BAM files, but only at compression level 1 (to get some compression without too much CPU overhead).
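
To get a rough feel for that trade-off, here is a minimal Python sketch that compresses synthetic, read-like text with `zlib` at levels 0, 1 and 6 and reports size and time. BAM's BGZF blocks use DEFLATE, which is what `zlib` implements, but the data generator below (`make_read_like_data`) is a hypothetical stand-in and not real alignment data, so actual ratios and timings will differ.

```python
import random
import time
import zlib


def make_read_like_data(n_reads=2000, read_len=100, seed=0):
    """Build synthetic tab-separated records (name, bases, qualities).

    This is an illustrative stand-in, not real BAM or SAM content.
    """
    rng = random.Random(seed)
    lines = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(rng.choice("FFFF:,") for _ in range(read_len))
        lines.append(f"read{i}\t{seq}\t{qual}\n")
    return "".join(lines).encode("ascii")


def compress_stats(data, level):
    """Return (compressed size in bytes, elapsed seconds) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


data = make_read_like_data()
# Level 0 stores the data without compressing, 1 is the fastest real
# compression, and 6 is zlib's usual default.
results = {level: compress_stats(data, level) for level in (0, 1, 6)}

for level, (size, elapsed) in results.items():
    ratio = size / len(data)
    print(f"level {level}: {size} bytes ({ratio:.2f} of input), {elapsed * 1000:.1f} ms")
```

On real data the gap between levels depends heavily on the content, so this is only meant to show how one might measure it, not to predict what an aligner or sorter would see.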