FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data
BSD 3-Clause "New" or "Revised" License

Optitype memory error in v1.3.5 due to pandas' reindex #125

Open jnktsj opened 2 years ago

jnktsj commented 2 years ago

Hi,

Thank you for developing this great tool! I had been using OptiType v1.3.2 and recently switched to the latest version, v1.3.5. All the samples that used to finish successfully with v1.3.2 now fail with an out-of-memory error in v1.3.5 (tested with both the Docker image and the source code). I increased the memory from 16 GB to 32 GB, but no luck so far. The input FASTQs contain only the HLA-mapping reads from the pre-processing step, so they are only about 5-8 MB each, depending on the sample.

The memory issue most likely comes from the pandas reindex call introduced in #102 (hlatype = result.iloc[0].reindex(["A1", "A2", "B1", "B2", "C1", "C2"]).drop_duplicates().dropna()). It would be great if this memory leak could be investigated further.
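One way to test this hypothesis is to measure peak memory around the suspect step and compare it with a version that avoids reindex. The following is a minimal sketch, not OptiType's actual code: the helper names unique_alleles and peak_memory, the example row, and its allele values are all made up for illustration. It uses only the standard library and returns a plain list of values rather than a labelled pandas Series.

```python
import math
import tracemalloc

# The six allele columns named in the reindex call from the issue.
LOCI = ["A1", "A2", "B1", "B2", "C1", "C2"]


def unique_alleles(row):
    """Pick the six allele fields from a dict-like row, skip missing
    values (None or NaN), and drop duplicates, keeping the first one."""
    seen = []
    for key in LOCI:
        value = row.get(key)
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if not missing and value not in seen:
            seen.append(value)
    return seen


def peak_memory(func, *args):
    """Run func(*args) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Hypothetical result row; allele names and the extra field are invented.
row = {
    "A1": "A*01:01", "A2": "A*01:01",
    "B1": "B*07:02", "B2": "B*08:01",
    "C1": "C*07:01", "C2": None,
    "nof_reads": 1234,
}

alleles, peak = peak_memory(unique_alleles, row)
print(alleles, peak)
```

Wrapping the original pandas expression in the same peak_memory helper would show whether that single line accounts for the memory growth. Note that tracemalloc only sees allocations made through Python's memory allocators, so a low reading there would not fully rule the line out.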

b-niu commented 2 years ago

Hi @jnktsj, in my case the same problem occurs even on a server with 128 GB of RAM.

jnktsj commented 2 years ago

Hi @b-niu, my workaround for now is to avoid the latest OptiType. Here is the snippet I use to set up OptiType v1.3.2:

pip install \
  numpy==1.15.4 pandas==0.22.0 matplotlib==2.1.2 \
  pyomo==5.3 pysam==0.13 future==0.16.0 \
  numexpr==2.6.4 tables==3.4.2 pyutilib==5.8

# razers3 3.5.8
wget 'http://packages.seqan.de/razers3/razers3-3.5.8-Linux-x86_64.tar.xz' && \
    tar -xf razers3-3.5.8-Linux-x86_64.tar.xz && rm razers3-3.5.8-Linux-x86_64.tar.xz && \
    mv razers3-3.5.8-Linux-x86_64/bin/razers3 /usr/local/bin/razers3 && rm -r razers3-3.5.8-Linux-x86_64

# Optitype v1.3.2 (don't use v1.3.5 due to memory leak)
wget 'https://github.com/FRED-2/OptiType/archive/refs/tags/v1.3.2.tar.gz' && \
    tar xzf v1.3.2.tar.gz && rm v1.3.2.tar.gz && mv OptiType-1.3.2 /usr/local/bin/

b-niu commented 2 years ago

Thanks a lot @jnktsj ! Nice solution 😄