amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

controlling memory usage for sort/index/markdup #133

Closed. teerjk closed this issue 3 years ago.

teerjk commented 3 years ago

I'm trying to control memory usage during the sort/index/markdup step (-so). I have set -sm, but memory usage is generally the same no matter what value I give: -sm values from 0.5 to 4, as well as leaving -sm unset, all give similar usage. Only setting -sm to 20 seemed to provoke a change (it ran out of memory on a 165GB node). Memory usage also seemed similar with 4, 8, and 16 threads.

Memory usage generally approaches all available memory on the node, as reported by "RES" in top and by the cluster software (PBS/Torque). On a 64GB node, usage was ~63GB; on a 165GB node, usage was ~156GB. Usage is fairly stable during alignment (40-60GB), but then climbs to everything available once "sorting" is printed on stdout. I looked through the code and didn't see anything immediately obvious, but I'm not expert enough in C++ to really dig into the lower-level stuff. Any thoughts or hints? Thanks!
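
For reference, my invocation looks roughly like the sketch below; the index directory and FASTQ names are placeholders, and the exact arguments may differ slightly from what I'm actually running:

```sh
# Illustrative sketch only; index and FASTQ paths are placeholders.
#   -so : sorted, indexed, duplicate-marked BAM output
#   -sm : sort memory budget in GB (I tried 0.5-4, unset, and 20)
#   -t  : threads (I tried 4, 8, and 16)
snap-aligner paired /path/to/snap-index \
    sample_R1.fastq.gz sample_R2.fastq.gz \
    -o sample.bam -so -sm 4 -t 16
```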

bolosky commented 3 years ago

It shouldn't do that. Sort memory usage is on top of what alignment uses, but it shouldn't be much more than the amount you give it with -sm.

What version are you running? IIRC, some very old versions had problems with extra memory usage.

You can free up some memory by having SNAP release the memory for the index before starting sort by using the -di switch, which might help if you're close.

One other thing that might be happening is that you've got such a large number of duplicates of a particular read that they're taking up all the memory; they have to all be resident at one time to process them properly. If that's what's happening there's not that much you can do aside from using -di, unless you're willing to turn off duplicate marking altogether by using -S d. That said, taking the output and putting it through Picard to do duplicate marking may not help, because aside from being painfully slow, it can also use huge amounts of memory if you have millions of duplicates of a single read.
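
For example (illustrative only; the index path and read files here are placeholders), the two variants would look something like this:

```sh
# Variant 1: release the index memory before the sort starts.
snap-aligner paired /path/to/snap-index r1.fastq.gz r2.fastq.gz \
    -o out.bam -so -sm 4 -di

# Variant 2: turn off duplicate marking altogether.
snap-aligner paired /path/to/snap-index r1.fastq.gz r2.fastq.gz \
    -o out.bam -so -sm 4 -S d
```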

--Bill

teerjk commented 3 years ago

That is the behavior I was expecting: total sort/markdup usage set by -sm, on top of what alignment uses. I am using version 84516be6e82d693629f8de1b307ac1ecef10c839, which I believe is the latest master; happy to try a different version. I did try -di, but it didn't make a difference. I have been testing on one particular file, so I will try some others. One thing I found interesting is that everything worked on both a 165GB node and a 64GB node, with memory usage approaching the total in each case. One possible issue is that the Linux kernel on our CentOS cluster is very old, which may affect memory allocation behavior.

At the end of the day, I'm trying to predict/control usage in order to be a good citizen on our HPC and, when eventually running on the cloud, to request appropriate resources. Great work, by the way! I haven't thoroughly evaluated the alignment quality, but there's no denying the performance increase.

teerjk commented 3 years ago

Sorry, I should have said that I'm testing on a human whole-exome sample with a final BAM size of ~20GB. I'm seeing about 20% duplication overall, which isn't terribly high for these samples. I've run the same sample through a bwa + Picard pipeline, and Picard MarkDuplicates used 7.8GB (I requested 8GB).
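
For comparison, the Picard step was along these lines; the heap setting is my assumption about how the 8GB was requested, and the file names are placeholders:

```sh
# Assumed invocation; -Xmx8g and the file names are placeholders, not the exact command.
java -Xmx8g -jar picard.jar MarkDuplicates \
    I=sample.sorted.bam \
    O=sample.markdup.bam \
    M=sample.dup_metrics.txt
```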

bolosky commented 3 years ago

Yeah, then something's wrong. I was thinking of a case where you had 100M duplicates of the same read, but you probably don't have 100M reads total.

Is it possible to share your bam file for us to test? (Also, I'm on vacation, and Arun wrote the dup-marking code, so he should probably be the one to look at it.)

--Bill

teerjk commented 3 years ago

While I cannot share the initial example, I've identified a public data file pair that behaves similarly: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR359/SRR359295/SRR359295_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR359/SRR359295/SRR359295_2.fastq.gz

These files are of similar size and duplicate rate, and I see the same increasing memory usage. I watched the memory usage closely and noticed that once the run reaches the sorting step, much of the usage is actually cached. However, top and htop still report that memory as part of the snap-aligner process's "RES": ~148GB on a 165GB node using 16 threads and -sm 2. My concern is that other processes cannot access this memory on a multi-user system. Of course, I can solve the problem by grabbing entire nodes, but that brings its own issues. Thanks.
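
To reproduce, the run was roughly the following (the index path is a placeholder; -t 16 and -sm 2 are the settings mentioned above):

```sh
# Download the public pair (similar size and duplicate rate to my private sample).
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR359/SRR359295/SRR359295_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR359/SRR359295/SRR359295_2.fastq.gz

# Illustrative alignment command; the index path is a placeholder.
snap-aligner paired /path/to/snap-index \
    SRR359295_1.fastq.gz SRR359295_2.fastq.gz \
    -o SRR359295.bam -so -sm 2 -t 16
```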

arun-sub commented 3 years ago

Thanks for sharing the test data. I will look at the issue and try to reproduce what you are seeing.

--Arun

teerjk commented 3 years ago

I've been watching some runs closely and have noticed that during the sort, "RES" memory as reported by top climbs to almost the physical maximum on the node. However, if I look at usage with "free", most of the used memory is actually cache: the "-/+ buffers/cache:" line shows only ~20-30GB used, with the rest free (cached). If I start a different big-memory job (samtools sort), snap's memory usage (RES) goes down as samtools's goes up. Interestingly, once snap is finished, I can observe samtools's memory usage alone with free: it is all "used", with very little "cached". So I think snap's usage is just behaving differently from what I'm used to seeing.
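
In case it's useful, these are the commands I'm using to separate reclaimable cache from memory actually held by processes (nothing SNAP-specific here):

```sh
# Overall memory; on this old CentOS kernel the "-/+ buffers/cache:" line
# shows usage with the reclaimable page cache excluded.
free -g

# The same information straight from the kernel; "Cached" pages are reclaimable.
grep -E 'MemTotal|MemFree|Buffers|^Cached' /proc/meminfo

# Per-process resident size as top reports it.
top -b -n 1 | grep snap-aligner
```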

teerjk commented 3 years ago

One more separate but related issue. I'm testing the serial mode where you give multiple samples on the command line and snap processes them one at a time. I am testing on a set of 192 smaller pairs (10M reads each), and once it gets to pair 60 or so, the alignment slows WAY down. It starts at around 20-30 sec per pair, but the last few pairs take longer and longer: 50-100 sec, 300, 2,460, 7,447, and then 15,330 sec. It has been working on the last file for more than 12 hours with little progress. Interestingly, memory usage is again high, but this time 'free' reports almost all the memory as "used"; there is little cached this time around. Snap's CPU usage is much lower (60-80%, when it usually uses 1600% on a 16-core machine), and there are a number of kernel I/O processes also using CPU, suggesting heavy system I/O. Perhaps a memory issue with multiple serial samples?
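
If it helps, the multi-sample invocation I'm describing is along these lines (a sketch only; I may not have the comma syntax exactly right, and the paths are placeholders):

```sh
# Illustrative only: several paired runs in one snap-aligner process,
# separated by commas, each with its own output file.
snap-aligner paired /path/to/snap-index s001_R1.fq.gz s001_R2.fq.gz -o s001.bam , \
             paired /path/to/snap-index s002_R1.fq.gz s002_R2.fq.gz -o s002.bam , \
             paired /path/to/snap-index s003_R1.fq.gz s003_R2.fq.gz -o s003.bam
```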

bolosky commented 3 years ago

That sounds like it's running out of memory and going into paging, which points to a memory leak. It's possible that something is leaking when you do multiple samples in the same run; we haven't tested that as extensively as single runs. I'm back from vacation now, so I'll see if I can figure out what's going on.

bolosky commented 3 years ago

Hi,

Sorry to take so long to get back to you on this. It turned out to be more work than I expected.

We made two changes to SNAP. The first is that we changed the code on Linux to use aio to read in the aligned-but-unsorted reads rather than memory-mapping them. The large memory use during sort that you were seeing happened because the mapped file was taking up too much memory, and the system virtual memory manager didn't do a good job of deciding what wasn't really needed. It had always worked this way on Windows (where most of the development is done), which is why we hadn't noticed it before.

Also, I went through and looked for memory leaks that happen when you run multiple alignments (using commas or daemon mode). It turned out that we did a really terrible job of cleaning up memory, since at the time that code was written we expected only one alignment per process execution of SNAP. (Comma and daemon mode got added later.) I fixed all of the BigAlloc leaks that my tests showed, though it's possible that different code paths might expose others. We're also certainly leaking small things like C++ objects and strings, but on the scale of SNAP's memory use I doubt that will ever matter for anyone.

I haven't released the fixed version yet; I'd like you to run it and see whether it solves your problems before I do. To get the new code, you need to build what's in the dev branch: run "git checkout dev", then "make clean" and "make", and you should get a binary with version number 1.0dev.104. (If you're not someone who builds from source, let me know and I'll arrange to get you a binary to run.)
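
In other words, something like this (the version check at the end is just a suggestion; running the binary with no arguments prints the usage text, which includes the version):

```sh
# In an existing clone of the repository:
git checkout dev
make clean
make

# The usage banner should report version 1.0dev.104.
./snap-aligner
```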

Thanks for sending in this bug report.

--Bill

teerjk commented 3 years ago

I've been able to build and test 1.0dev.104, and I can confirm that it solves the memory-control issue when running a single sample. Memory usage now behaves the way I'm used to seeing on Linux. I'm testing multiple samples now and will report back. Thank you for your efforts on this!

teerjk commented 3 years ago

I can also confirm that the memory issue when running in serial sample mode is fixed: all 192 of my lower-read-count test samples aligned in less than 1 minute. Thanks again!

bolosky commented 3 years ago

Fixed in 1.0.4.