chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Hifiasm only using 1 thread #129

Open RNieuwenhuis opened 3 years ago

RNieuwenhuis commented 3 years ago

Hi,

I have used hifiasm successfully several times in the past. However, lately I am running into some strange behaviour. The species I am trying to assemble has half the genome size of the Californian Redwood genome, but is also polyploid. You managed to assemble that one on an 80-CPU machine in ~4500 CPU hours with a memory footprint between 500 and 700 GB, IIRC.

Now, in my case, I did a first trial using 0.14-r312 with barely enough coverage per haplotype (6x). This resulted in a not-so-clear peak in the k-mer plot and hifiasm consuming over 1.5 TB of RAM. The job did run at a nice, steady parallel load, though. I killed it because the memory footprint and runtime were too high. I assumed this was due to hifiasm trying to correct repeats with the odd thresholds derived from the botched k-mer count histogram, caused by too low coverage.

Currently, after adding more data to ~14x per haplotype, I am running 2 hifiasm processes simultaneously on different machines.

Run A (0.14-r312): hifiasm -t 64 -k 63 -o my_prefix *.fastq.gz (log: Run_A.log)

Run B (0.15.1-r334): hifiasm -t 128 -o my_prefix *.fastq.gz (log: Run_B.log)

Somehow both runs have been using only 1 thread for 4 days already. The memory footprint is very low (< 300 GB). I have not observed this before; usually hifiasm runs at a very steady load around the number of threads given.

Do you have any idea what is going on? The k-mer plots don't look off (although they aren't showing clearly outstanding peaks either) and hifiasm seems to detect the peaks correctly. Do you maybe have the k-mer plot for the redwood assembly somewhere so I can compare?

chhylp123 commented 3 years ago

0.14-r312 and 0.15.1-r334 share exactly the same error-correction code, so there is no need to run both of them. From the k-mer plots it seems there are ~6x HiFi reads per haplotype, which is probably not enough for hifiasm. Is it possible for you to at least double it?

chhylp123 commented 3 years ago

I guess 14x HiFi reads per haplotype may take less memory. The huge RAM requirement for now is probably caused by misidentification of the k-mer peak.

RNieuwenhuis commented 3 years ago

> 0.14-r312 and 0.15.1-r334 share exactly the same error-correction code, so there is no need to run both of them. From the k-mer plots it seems there are ~6x HiFi reads per haplotype, which is probably not enough for hifiasm. Is it possible for you to at least double it?

We ran them simultaneously with a different k to check the effect on the memory footprint. But good to know nonetheless.

> I guess 14x HiFi reads per haplotype may take less memory. The huge RAM requirement for now is probably caused by misidentification of the k-mer peak.

I cannot explain the peak at 6X in the k-mer graph. Based on the genome size (ploidy * 1C value), and our total amount of sequencing data, the peak should be at ~14X per haplotype. Perhaps my calculation is off.
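
For reference, the back-of-the-envelope calculation is simply total HiFi bases divided by (ploidy × 1C size). A minimal sketch with made-up placeholder numbers, not the actual values from this run:

    # Expected per-haplotype coverage = total HiFi bases / (ploidy * 1C size).
    # All numbers below are illustrative placeholders.
    total_bases=230000000000   # total HiFi bases sequenced
    ploidy=6                   # number of haplotype copies
    c_value=2700000000         # 1C (monoploid) genome size in bp
    echo "scale=1; $total_bases / ($ploidy * $c_value)" | bc
    # -> ~14; if the k-mer peak sits at ~6x instead, either the base total or the 1C estimate is off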

We might have a miscommunication on the memory footprint.

> Somehow both runs have been using only 1 thread for 4 days already. The memory footprint is very low (< 300 GB). I have not observed this before; usually hifiasm runs at a very steady load around the number of threads given.

Do you have any idea what causes this? Furthermore, we are not interested in haplotigs in this case, but would be very happy with a collapsed assembly. Do you have any suggestions for parameters to achieve this? Your recommendations would be highly appreciated.

chhylp123 commented 3 years ago

The main issue is why the k-mer plot looks like that. A good HiFi sample should have a plot like this one: https://github.com/chhylp123/hifiasm/issues/49#issue-729106823. Generally, even if we can get some results with such a weird k-mer plot, the assembly will not be good. I would still recommend first figuring out why the HiFi reads are weird.

hhayleyj commented 2 years ago

I am bringing this back up because I have a similar question that was not answered here.

I am running the following command: hifiasm -o genome -t48 --h1 genome_R1.fastq.gz --h2 genome_R2.fastq.gz genome.fastq.gz

For some reason, it is only using one thread? I suspect it will take a very, very long time to run if it does not multithread.

lh3 commented 2 years ago

@hhayleyj Some stages only use one thread.

philippbayer commented 1 year ago

I'm having a similar issue with hifiasm v0.18.5 and v0.19.0 with a high-coverage read set.

It runs out of walltime after 24 hours, never using more than one CPU or more than 30GB of memory.

[M::ha_pt_gen::63393.803*1.00] ==> indexed 785841681 positions, counted 36763082 distinct minimizer k-mers
slurmstepd: error: *** STEP 1021467.0 ON nid001505 CANCELLED AT 2023-03-16T14:09:44 DUE TO TIME LIMIT ***

seff output for the last run I tried:

State: TIMEOUT (exit code 0)
Nodes: 1
Cores per node: 128
CPU Utilized: 1-00:00:18
CPU Efficiency: 0.78% of 128-00:02:08 core-walltime
Job Wall-clock time: 1-00:00:01
Memory Utilized: 33.07 GB
Memory Efficiency: 16.53% of 200.00 GB

It's 104 GB of HiFi reads, after removing reads below 2 kb in length, for a fish genome approximately 1 Gb in size (so yeah, about 100x!). The histogram looks OK.

I guess downsampling to about 30x while retaining only the longest reads would be a workaround?
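
As a rough, untested sketch of that workaround (assuming seqkit is available; file names and the 30x target are placeholders, and seqkit sort holds all reads in memory, so treat this as an illustration of the idea only):

    # Keep only the longest reads until ~30x of a ~1 Gb genome (target = 30 * 1e9 bases).
    seqkit sort --by-length --reverse reads.fastq.gz \
      | awk -v target=30000000000 '
          NR % 4 == 2 { kept += length($0) }       # sequence line: count bases kept so far
          { print }                                # always emit the current line
          NR % 4 == 0 && kept >= target { exit }   # stop after a complete 4-line record
        ' \
      | gzip > reads.30x_longest.fastq.gz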

Here's a log with -l 1 --purge-max 114 (the same thing happens without setting these parameters):

err.log

I've tried the conda-installed hifiasm and my self-compiled hifiasm v0.18.5 and v0.19.0, built with gcc v12.1.0.

chhylp123 commented 1 year ago

Thanks. But based on the log file you are showing, the coverage of the input data is around 30x, and hifiasm has already done 2 rounds of correction, so it should already be utilizing multiple CPUs: [M::ha_assemble::61214.666*1.00@33.069GB] ==> corrected reads for round 2. Probably for this run, hifiasm just needs a few more moments.

philippbayer commented 1 year ago

Thanks for getting back to me!

This line is maybe a bit confusing and I was imprecise:

> Job Wall-clock time: 1-00:00:01

What it's saying is that the job had already run for 24 hours and got killed by the HPC maximum walltime limit. If it's only 30x coverage of about a 1 Gb genome, it shouldn't take this long, I believe?

There might be an issue with how hifiasm is launched under SLURM; I'm investigating. It is clearly not using all the CPUs allocated to it by SLURM, since it uses only one CPU the whole time.

philippbayer commented 1 year ago

I can confirm that it's a SLURM issue for me! srun hifiasm takes 24 hours and uses only 1 CPU; without srun it takes about 90 minutes and uses half of the requested CPUs (on average). I'm talking to the HPC maintainers to see whether that's a general thing; it does not seem to happen with other software.

chhylp123 commented 1 year ago

I see, thanks a lot~

nickgladman commented 2 months ago

@philippbayer did you ever find out if the srun issue was a general matter on SLURM? Thanks.

philippbayer commented 2 months ago

@nickgladman yes, it was just me using srun in SLURM the wrong way. These days I generally do something like srun -N $SLURM_JOB_NUM_NODES -n $SLURM_NTASKS -c $OMP_NUM_THREADS -m block:block:block hifiasm so it doesn't happen again!
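
For reference, a minimal sbatch sketch along those lines (resource numbers and file names are placeholders; adjust to your cluster and data):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=48      # placeholder; keep in sync with hifiasm -t
    #SBATCH --mem=200G
    #SBATCH --time=24:00:00

    # Pass the CPU count to srun explicitly so the job step is not restricted to a single CPU.
    srun -N "$SLURM_JOB_NUM_NODES" -n "$SLURM_NTASKS" -c "$SLURM_CPUS_PER_TASK" -m block:block:block \
      hifiasm -o genome -t "$SLURM_CPUS_PER_TASK" genome.fastq.gz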