PacificBiosciences / FALCON_unzip

Making diploid assembly a common practice for genomic studies
BSD 3-Clause Clear License

falcon_kit.mains.rr_ctg_track causing a lot of paging #63

Closed · msc51 closed this issue 7 years ago

msc51 commented 7 years ago

Hi Developers,

I am at the stage of `falcon_kit.mains.rr_ctg_track`, and I can see that the file `3-unzip/reads/pwatcher.dir/stderr` is appended to every 9 hours or so with messages like this:

    [27709]finished run_tr_stage1('/path/to/my/project/0-rawreads/m_00140/raw_reads.140.las', 2500, 40, dict(605867 elem))
    [27711]maxrss: 50192328

I have been running it for 3 days, and there are only 16 such "finished run_tr_stage1" lines so far.

I went onto the node and found that the falcon_unzip processes are using a lot of memory but not much CPU, and a lot of swap is in use. There are four `python -m falcon_kit.mains.rr_ctg_track` processes running in parallel. The node is clearly busy paging.
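For reference, this is roughly how I confirmed the node was paging rather than computing (standard Linux tools, nothing FALCON-specific; the grep pattern just matches the process name above):

```sh
# Overall memory and swap usage on the node.
free -h

# Sample swap activity every 5 seconds; sustained non-zero
# si/so columns mean the node is paging, not computing.
vmstat 5

# Resident memory (RSS, in KB) and CPU use of each worker.
ps -eo pid,rss,pcpu,args | grep '[r]r_ctg_track'
```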

My question is: is there a way for me to control the number of processes running in parallel on the same node (i.e., reduce the number of processes actively paging)? Also, are these processes meant to all run on the same node? Can I make `falcon_kit.mains.rr_ctg_track` run on multiple nodes to get things done faster?

Thanks!

pb-cdunn commented 7 years ago

> is there a way for me to control no. of processes running in parallel on the same node

Grep the Python code for `concurr`. You should be able to reduce one of those settings. In fact, the source code is fairly straightforward, so you can see where the concurrency settings are used.
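For example, something like the following (a sketch only: it assumes falcon_kit is importable from your environment, and the option names it turns up vary between FALCON/FALCON_unzip versions, so treat the hits as a starting point):

```sh
# Locate the installed falcon_kit package, then search it for
# concurrency-related settings.
PKG_DIR=$(python -c 'import falcon_kit, os; print(os.path.dirname(falcon_kit.__file__))')
grep -rni 'concurr' "$PKG_DIR"
```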

Please post here if you get something working.