DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0

Performance issue #21

Closed boris-milicevic closed 4 years ago

boris-milicevic commented 5 years ago

Hello, I am part of the Bioinformatics team at Seven Bridges Genomics, and we are interested in porting the HLA-LA algorithm to our platform. We would appreciate your assistance with some of the issues we have encountered.

So far, we have set up a working environment via Docker, and I successfully ran the algorithm on the provided test .cram file. However, we are concerned about the execution time. The algorithm was run on an AWS r4.4xlarge instance with 16 CPUs and 122 GB RAM, with the maxThreads parameter set to 16, and execution took 2 hours. Looking into the execution details, we found that the instance used only 1 CPU.

Could you help us understand why multithreading is not taking effect? Is there a module or package we are missing that would let us get full performance?

We have attached the log file from our AWS instance. Thank you in advance.

job.err.log

AlexanderDilthey commented 5 years ago

Hi @boris-milicevic, the integrated bwa and samtools steps should make proper use of multithreading (pull the most recent version from GitHub if they don't), but in our experiments the multithreading behavior of the internal linear alignment projection step wasn't great, which is why we deactivated multithreading for this component. 2 hours sounds about right - I think it may be more efficient to request an instance with less memory and fewer CPUs.
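One way to check how many cores a run actually keeps busy, and so how small an instance would do, is to compare the CPU time consumed by the process tree with the elapsed wall-clock time. The following is only a minimal sketch, not part of HLA-LA: the helper name `effective_cores` is hypothetical, it relies on Python's Unix-only `resource` module, and it only counts child processes that have been waited for.

```python
import resource
import subprocess
import sys
import time


def effective_cores(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds, cpu_seconds / wall_seconds).

    Hypothetical helper for illustration; not part of HLA-LA.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # blocks until the command exits
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User + system CPU time accumulated by the waited-for child process tree.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu, cpu / wall


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to measure.
    wall, cpu, cores = effective_cores(sys.argv[1:])
    print(f"wall: {wall:.1f} s, cpu: {cpu:.1f} s, average busy cores: {cores:.2f}")
```

Saved under a name of your choice and invoked with the full pipeline command as its arguments, a ratio close to 1 would indicate that the run is effectively single-threaded for most of its duration, while a value well above 1 would suggest the multithreaded steps account for a meaningful share of the runtime.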