foerstner-lab / READemption

A pipeline for the computational evaluation of RNA-Seq data
https://reademption.readthedocs.io

Unusually high memory usage with gene_quanti #56

Open kmuench opened 1 year ago

kmuench commented 1 year ago

Hello,

I noticed that gene_quanti seems to be using a strangely high amount of memory.

The command I'm running is: reademption gene_quanti -p 4 --features 'gene,cds,region,exon' --project_path READemption_project

I'm using this container: tillsauerwein/reademption:2.0.2
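For context, a minimal sketch of how such a containerized run could be invoked with Docker (the mount point, the working directory, and the assumption that the image accepts the reademption command directly are placeholders of mine, not taken from the image's documentation):

docker run --rm \
    -v "$(pwd)/READemption_project:/data/READemption_project" \
    -w /data \
    tillsauerwein/reademption:2.0.2 \
    reademption gene_quanti -p 4 --features 'gene,cds,region,exon' --project_path READemption_project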

The command runs perfectly fine on the tutorial data (fastq file size 1.5M-1.6M, bam file size), with a standard memory footprint. However, I'm now using a second set of test data with fastq files around 1.8-2.6GB. The reference genomes and gff files are much smaller. Yet, this step keeps crashing due to out of memory errors - so far 200 GB hasn't been enough for it. The alignment ran fine; I'm not sure why this step would be such a memory hog. I'm currently running htseq-count and featureCounts via command line to see how these perform in comparison.
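For that comparison, a minimal sketch of the kind of invocations meant here (file names, feature type, and attribute key are placeholders, not taken from this project):

htseq-count -f bam -r pos -s no -t gene -i ID sample.bam annotation.gff > htseq_counts.txt
featureCounts -T 4 -t gene -g ID -a annotation.gff -o featurecounts_counts.txt sample.bam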

Is this expected behavior? Could this be the result of a bug?

konrad commented 1 year ago

Thanks for reporting this, @kmuench. @Tillsa can have a closer look at it in a week. You could try to run --no_count_split_by_alignment_no and/or --no_count_splitting_by_gene_no to reduce the required memory in the meantime.
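Applied to the command above, that suggestion would look roughly like this (a sketch only; both options are treated as plain on/off switches, which is an assumption):

reademption gene_quanti -p 4 --features 'gene,cds,region,exon' --no_count_split_by_alignment_no --no_count_splitting_by_gene_no --project_path READemption_project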

@Tillsa - this could maybe also be further explained in the documentation.

kmuench commented 1 year ago

Hello! If it helps, I reran gene_quanti with --unique_only, and this time it completed in about 430 minutes using 37.5 GB of memory; CPU usage was about half that of the READemption align step.

Tillsa commented 1 year ago

We aim to reduce the memory consumption in the future. In the meantime, there is no solution other than increasing the available memory. I had a data set with 15 libraries of about 30 million reads each and needed around 400 GB of memory. If you need a machine with more power, you could try out the de.NBI cloud.

termithorbor commented 7 months ago

Is it normal that the command runs for several days on 12 paired-end samples, with fastq files of around 12 GB each for the forward and reverse reads, on a server with 70 CPUs?

Tillsa commented 7 months ago

Yes, that is possible. I implemented printing of timestamps for the intermediate steps. If new timestamps and intermediate steps are still being added from time to time, everything is running as intended. You can also post the output of the current command and I will have a look at it.

termithorbor commented 7 months ago

So far only empty gene_quanti folders have been created, and no further data has been produced. No timestamps/intermediate steps are shown in the terminal:

[screenshot of the terminal output]

It has already been running for ~5 days.

Tillsa commented 7 months ago

Which READemption version are you using? Can you check your memory usage? Do you still have memory left?

termithorbor commented 7 months ago

I am running version 2.0.4:

watch -n 5 free -m

Every 5,0s: free -m                    dil-sequenz: Wed Apr 10 13:55:36 2024

               total        used        free      shared  buff/cache   available
Mem:          806288       16460       67087          13      722740      784735
Swap:           8191           9        8182
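If it helps to track this over time, a small sketch for logging memory snapshots periodically while gene_quanti runs (the interval and log file name are arbitrary choices):

while true; do
    date >> gene_quanti_memory.log
    free -m >> gene_quanti_memory.log
    sleep 300
done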

Tillsa commented 7 months ago

All right, there is still some memory left. I would give it a couple more days.

termithorbor commented 7 months ago

Okay. However, it has already been running for 8 days. Is this still normal behaviour?

termithorbor commented 7 months ago

Still running... I fear it is stuck somewhere.

I have now restarted it with the --no_count_split_by_alignment_no option. Or would it be better to also use --no_count_splitting_by_gene_no?

termithorbor commented 7 months ago

Hi,

Which --processes value makes sense for gene_quanti? I have given 140 to the job, and I can see that for my 24 samples only 24 processes are really used. So the number of processes equals the number of samples?

USER       PID %CPU %MEM      VSZ     RSS TTY   STAT START     TIME COMMAND
ngs-lite  4307  0.0  0.0     8404    5504 pts/0 Ss   Apr15     0:00 -bash
ngs-lite 77984  0.6  0.4 13220052 3903884 pts/0 Sl+  Apr16    77:20 /home/ngs-lit
ngs-lite 81948 99.9  0.4 13288296 4049504 pts/0 R+   Apr16 11388:42 /home/ngs-l
ngs-lite 81949 98.1  0.4 13059796 3821752 pts/0 S+   Apr16 11181:40 /home/ngs-l
ngs-lite 81950 21.3  0.4 13058260 3820700 pts/0 S+   Apr16  2434:40 /home/ngs-li
ngs-lite 81951 75.5  0.4 13059796 3821956 pts/0 S+   Apr16  8602:57 /home/ngs-li
ngs-lite 81952 73.8  0.4 13059284 3821840 pts/0 S+   Apr16  8413:32 /home/ngs-li
ngs-lite 81953 54.8  0.4 13059796 3821624 pts/0 S+   Apr16  6251:26 /home/ngs-li
ngs-lite 81954 79.9  0.4 13060308 3822192 pts/0 S+   Apr16  9109:03 /home/ngs-li
ngs-lite 81955 99.9  0.4 13235220 3996428 pts/0 R+   Apr16 11388:50 /home/ngs-l
ngs-lite 81956 46.5  0.4 13059284 3820996 pts/0 S+   Apr16  5306:46 /home/ngs-li
ngs-lite 81957 38.1  0.4 13066332 3828992 pts/0 S+   Apr16  4346:46 /home/ngs-li
ngs-lite 81958 35.9  0.4 13058260 3820852 pts/0 S+   Apr16  4100:10 /home/ngs-li
ngs-lite 81959 28.0  0.4 13058260 3820772 pts/0 S+   Apr16  3191:24 /home/ngs-li
ngs-lite 81960 99.9  0.4 13295236 4052228 pts/0 R+   Apr16 11388:56 /home/ngs-l
ngs-lite 81961 56.4  0.4 13059796 3821772 pts/0 S+   Apr16  6428:41 /home/ngs-li
ngs-lite 81962 16.0  0.4 13058260 3820716 pts/0 S+   Apr16  1832:14 /home/ngs-li
ngs-lite 81963 99.9  0.4 13241776 3997440 pts/0 R+   Apr16 11388:29 /home/ngs-l
ngs-lite 81964 68.5  0.4 13059284 3821856 pts/0 S+   Apr16  7801:45 /home/ngs-li
ngs-lite 81965 60.9  0.4 13059796 3821636 pts/0 S+   Apr16  6943:02 /home/ngs-li
ngs-lite 81966 99.9  0.4 13232144 3989008 pts/0 R+   Apr16 11388:28 /home/ngs-l
ngs-lite 81967 99.9  0.4 13239376 4000760 pts/0 R+   Apr16 11388:26 /home/ngs-l
ngs-lite 81968 51.0  0.4 13059284 3821012 pts/0 S+   Apr16  5808:41 /home/ngs-li
ngs-lite 81969 60.2  0.4 13066332 3829012 pts/0 S+   Apr16  6865:57 /home/ngs-li
ngs-lite 81970 23.6  0.4 13109820 3872104 pts/0 S+   Apr16  2695:36 /home/ngs-li
ngs-lite 81971 44.6  0.4 13058260 3820780 pts/0 S+   Apr16  5085:16 /home/ngs-li
ngs-lite 81972  0.0  0.4 13088980 3848268 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81973  0.0  0.4 13088980 3848268 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81974  0.0  0.4 13088980 3848268 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81975  0.0  0.4 13088980 3848268 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81976  0.0  0.4 13088980 3848268 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81977  0.0  0.4 13088980 3848268 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81978  0.0  0.4 13088980 3848268 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81979  0.0  0.4 13088980 3848276 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81980  0.0  0.4 13088980 3848276 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81981  0.0  0.4 13088980 3848276 pts/0 S+   Apr16     0:00 /home/ngs-lit
ngs-lite 81982  0.0  0.4 13088980 3848276 pts/0 S+   Apr16     0:00 /home/ngs-lit

...and a few more lines until line 142 is reached, without any activity.

Thanks in advance.

termithorbor commented 7 months ago

Is it somehow possible to give more processes to READemption? At the moment only 20% of our total CPU capacity is used.

Thanks in advance :)

Tillsa commented 7 months ago

Exactly, the maximum number of parallel processes is the number of samples. Is the gene quantification step still running?
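In other words, with 24 libraries anything above -p 24 brings no additional parallelism for gene_quanti. A sketch using the numbers from this thread (all other options as in the earlier command):

reademption gene_quanti -p 24 --features 'gene,cds,region,exon' --project_path READemption_project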

termithorbor commented 7 months ago

Hi, yes, the step has been running since I started it on 16.04.2024 with the --no_count_split_by_alignment_no --no_count_splitting_by_gene_no options. It is a paired-end analysis with two different bacterial species and 12 samples. Since there are two different species, 24 processes are used, right? Do you have any suggestions on how to speed up the analysis? Thanks in advance.