apcamargo / pycoverm

Simple Python interface to CoverM's fast coverage estimation functions
GNU General Public License v3.0

Not all threads used #3

Closed jakobnissen closed 2 years ago

jakobnissen commented 3 years ago

When using CoverM from command line on 10 BAM files, roughly 5 CPUs are used. But when using pycoverm, only about 2 CPUs are used. In both cases, 8 CPUs were given to the task.

apcamargo commented 3 years ago

This is something I need to investigate further. Because all the parallelization is performed in the Rust code, I assumed that Python wouldn't interfere with it.

Do you have a reproducible example? How did you measure the CPU usage?
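For reference, one way to estimate effective parallelism without external tools is to compare CPU time against wall-clock time around the call. A stdlib-only sketch (`busy` is just a placeholder workload, not anything from pycoverm):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    # placeholder workload: pure-Python loops hold the GIL,
    # so these threads cannot actually run in parallel
    return sum(range(n))

wall0, cpu0 = time.perf_counter(), time.process_time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(busy, [2_000_000] * 4))
wall = time.perf_counter() - wall0
cpu = time.process_time() - cpu0

# process_time() sums CPU time across all threads of the process, so
# cpu / wall approximates the number of cores actually used; for this
# GIL-bound workload it should stay near 1.0
print(f"effective cores: {cpu / wall:.1f}")
```

The same wall-vs-CPU comparison around a `pycoverm.get_coverages_from_bam` call would show whether the Rust threads (which release the GIL) are actually running concurrently.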

jakobnissen commented 3 years ago

Hm, this is strange. I used multi-gigabyte BAM files from the CAMI2 Airways dataset, and here, it uses around 225% CPU. To make a reproducible example, I downloaded some smaller public BAM files, where it utilized 750% CPU. Again, when using CoverM directly on the CAMI2 files, it uses 500-600% CPU.

One difference could be the number of references in the BAM headers - around 185,000 in the CAMI2 files, around 20 in the public ones. Could it be that most of the time is actually spent in pycoverm, not in CoverM?

This is a wild guess, but perhaps pycoverm performs more checks on each read than CoverM? E.g. I can see pycoverm passes filter_params.min_percent_identity_pair = 0 to the internal Rust function. Does this mean that the statistic (and several other similar statistics) is needlessly computed for each read, then compared to zero?
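In pseudocode terms, the guess is that the filter could short-circuit when the threshold is zero, so the per-read statistic would never need to be computed. A hypothetical sketch (the function and `compute_identity` are illustrative, not pycoverm's or CoverM's actual API):

```python
def passes_identity_filter(compute_identity, min_identity=0.0):
    # short-circuit: with a zero threshold every read passes, so the
    # (potentially expensive) per-read identity statistic is never computed
    if min_identity <= 0.0:
        return True
    return compute_identity() >= min_identity
```

If the Rust code instead always computes the statistic and then compares it to zero, that work is wasted on every read.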

I'm not sure how to troubleshoot this. Do you have an idea about how I could dig into what is happening?

apcamargo commented 3 years ago

I don't think that min_percent_identity_pair is the problem, because CoverM uses it in all modes (I can't check that right now, but I'm almost sure). One additional thing I am using, though, is the trimmed mean function, since it can serve as the standard mean estimate when the trimming parameters are set to 0. But I don't think that this is the problem either (I would have to test to be sure).

You mentioned that the number of headers is vastly different. So maybe these loops are the ones slowing down the process. If that's the case, they can be easily parallelized with Rayon. I'll try to check that later this week.

jakobnissen commented 3 years ago

I thought about those loops too, but if they were the problem, I would expect CPU usage to begin at around 500-600% and then drop to 100%; in actuality, it's uniformly around 225%.

apcamargo commented 3 years ago

Hmm. Could you test whether the trimmed mean method in CoverM is slower than the standard mean on the CAMI2 BAM files? You could set the trimming parameters to 0 to make sure that the results will be the same as the standard mean.
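For reference, with both trimming parameters set to 0 a trimmed mean keeps every value and so reduces to the plain mean. A simplified Python sketch of the idea (not CoverM's actual Rust implementation):

```python
def trimmed_mean(values, trim_lower=0.0, trim_upper=0.0):
    # sort, then drop the lowest trim_lower and highest trim_upper
    # fractions of the values; average whatever remains
    xs = sorted(values)
    lo = int(len(xs) * trim_lower)
    hi = len(xs) - int(len(xs) * trim_upper)
    kept = xs[lo:hi]
    return sum(kept) / len(kept)
```

With `trim_lower=0.0, trim_upper=0.0`, `lo` is 0 and `hi` is the full length, so the result is the ordinary arithmetic mean; the only extra cost is the sort.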

jakobnissen commented 3 years ago

I have been comparing coverm contig -m trimmed_mean --trim-min 10 --trim-max 90 -t 8 --bam-files * (from shell) with _pycoverm.get_coverages_from_bam(paths, threads=8, min_identity=0.0, trim_upper=0.1, trim_lower=0.1) (from Python) with the same files.

apcamargo commented 3 years ago

I'll have to put some thought into this then. Unfortunately, I don't think there's an easy way to profile a Rust function from within Python.

Just out of curiosity: did you perform the benchmark with the version I released today? The Linux wheels were compiled in a different manylinux container and I used a flag to improve performance.

jakobnissen commented 3 years ago

Just tried with the new version - same result. Interestingly, it does actually spawn 8 additional threads. I'm on macOS, but I don't expect that to make a difference, right?

apcamargo commented 3 years ago

Nothing changed for building the macOS wheels. I did some testing here and it seems to be working fine.

As soon as I get some time I'll investigate this issue further. Maybe it has something to do with the FFI, even though I can't see how. I'll at least try to parallelize that loop with Rayon.

apcamargo commented 3 years ago

Well, I started investigating this issue again and I couldn't figure out what is going on. I wrote some test code where all the parallelization is done on the Rust side, and it looks like Python doesn't interfere with it. Do you have any idea what might be going on, @wwood?

@jakobnissen I'll try to reproduce the issue here. Will any of the CAMI2 Airways BAM files serve?

jakobnissen commented 3 years ago

Hm, not sure. I could only download the short-read CAMI2 toy human microbiome dataset (Airways) due to bandwidth constraints.

jakobnissen commented 2 years ago

I can't reproduce this issue anymore. No idea what caused it. Closing as resolved - we can always reopen if it appears again.