PaddlePaddle / PaddleHelix

Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning (the “螺旋桨” bio-computing toolkit)

HelixFold3: Can't solve the UnpicklingError issue #346

Closed YoshitakaMo closed 2 months ago

YoshitakaMo commented 2 months ago

I'd like to test HelixFold3, but I ran into the following error during inference:

  input_embedder:
    atom_encoder:
      atom_transformer:
        diffusion_transformer:
          a_channel_name: atom_channel
          n_block: 3
          n_head: 4
          s_channel_name: atom_channel
          z_channel_name: atom_pair_channel
        n_key: 128
        n_query: 32
      in_token_channel_name: token_channel
      out_token_channel_name: token_channel
      use_dense_mode: true
    relative_position_encoding:
      relative_chain_max: 2
      relative_token_max: 32
  num_recycle: 3
  resample_msa_in_recycling: true

W0908 01:42:33.273294 2256060 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.4, Runtime API Version: 12.0
W0908 01:42:33.324625 2256060 gpu_resources.cc:149] device: 0, cuDNN Version: 8.7.
Load pretrain model from /mnt/database/helixfold3/HelixFold3-params-240814/HelixFold3-240814.pdparams
============ Data Loading ============
Traceback (most recent call last):
  File "/home/apps/PaddleHelix/apps/protein_folding/helixfold3/inference.py", line 637, in <module>
    main(args)
  File "/home/apps/PaddleHelix/apps/protein_folding/helixfold3/inference.py", line 496, in main
    feature_dict = feature_processing_aa.process_input_json(
  File "/home/apps/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 398, in process_input_json
    ccd_preprocessed_dict = load_ccd_dict(ccd_preprocessed_path)
  File "/home/apps/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 43, in load_ccd_dict
    ccd_preprocessed_dict = pickle.load(fp)
_pickle.UnpicklingError: unpickling stack underflow

If anyone could solve this issue, I would be very grateful.


My environment:

Installation procedure:

INSTALLDIR=/home/apps/

mkdir -p ${INSTALLDIR}
cd ${INSTALLDIR}
git clone https://github.com/PaddlePaddle/PaddleHelix.git

HELIXFOLD3DIR=${INSTALLDIR}/PaddleHelix/apps/protein_folding/helixfold3
cd ${HELIXFOLD3DIR}
wget -q -P . https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p ${HELIXFOLD3DIR}/conda
rm Miniconda3-latest-Linux-x86_64.sh

. "${HELIXFOLD3DIR}/conda/etc/profile.d/conda.sh"
export PATH="${HELIXFOLD3DIR}/conda/condabin:${PATH}"
conda create -n helixfold -c conda-forge python=3.9 -y
conda install -y -c bioconda hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n helixfold
conda install -y -c conda-forge openbabel -n helixfold

conda activate helixfold
python3.9 -m pip install paddlepaddle-gpu==2.5.2.post120 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

python3.9 -m pip install -r requirements.txt

Script:

python3.9 ${HELIXFOLD3DIR}/inference.py \
    --maxit_binary "/home/linuxbrew/.linuxbrew/opt/maxit/bin/maxit" \
    --jackhmmer_binary_path "$ENV_BIN/jackhmmer" \
    --hhblits_binary_path "$ENV_BIN/hhblits" \
    --hhsearch_binary_path "$ENV_BIN/hhsearch" \
    --kalign_binary_path "$ENV_BIN/kalign" \
    --hmmsearch_binary_path "$ENV_BIN/hmmsearch" \
    --hmmbuild_binary_path "$ENV_BIN/hmmbuild" \
    --nhmmer_binary_path "$ENV_BIN/nhmmer" \
    --preset='full_dbs' \
    --bfd_database_path "$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \
    --uniclust30_database_path "$DATA_DIR/UniRef30_2023_02/UniRef30_2023_02" \
    --uniprot_database_path "$DATA_DIR/uniprot/uniprot.fasta" \
    --pdb_seqres_database_path "$DATA_DIR/pdb_seqres/pdb_seqres.txt" \
    --uniref90_database_path "$DATA_DIR/uniref90/uniref90.fasta" \
    --mgnify_database_path "$DATA_DIR/mgnify/mgy_clusters.fa" \
    --ccd_preprocessed_path "$DATA_DIR/helixfold3/ccd_preprocessed_etkdg.pkl.gz" \
    --rfam_database_path "$DATA_DIR/helixfold3/Rfam-14.9_rep_seq.fasta" \
    --template_mmcif_dir "$PDB_DIR/pdb_mmcif/mmcif_files" \
    --obsolete_pdbs_path "$PDB_DIR/pdb_mmcif/obsolete.dat" \
    --max_template_date=2020-05-14 \
    --input_json data/demo_6zcy.json \
    --output_dir demo_output/demo_6zcy \
    --model_name allatom_demo \
    --init_model "$DATA_DIR/helixfold3/HelixFold3-params-240814/HelixFold3-240814.pdparams" \
    --infer_times 3 \
    --precision "fp32"

YoshitakaMo commented 2 months ago

Re-downloading ccd_preprocessed_etkdg.pkl.gz (63650922 bytes) solved this issue. It seems the problem was in my environment, not in HelixFold3.
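
For anyone hitting the same traceback: since a fresh download fixed it, the earlier copy of the file was presumably truncated or corrupted. Below is a minimal sketch of a check that can be run before starting inference. The helper name `check_gzip_pickle` and its parameters are hypothetical and not part of PaddleHelix; it also assumes the file is a gzip-compressed pickle, as the `.pkl.gz` extension suggests.

```python
import gzip
import os
import pickle
import zlib


def check_gzip_pickle(path, expected_size=None, try_unpickle=False):
    """Return a list of problems found in a gzip-compressed pickle file.

    Hypothetical helper for illustration; an empty list means no problem
    was detected by the checks that were run.
    """
    problems = []

    # 1. Compare the on-disk size with the size of a known-good download.
    actual_size = os.path.getsize(path)
    if expected_size is not None and actual_size != expected_size:
        problems.append(
            f"size mismatch: {actual_size} bytes on disk, expected {expected_size}"
        )

    # 2. Decompress the whole stream. The gzip module verifies the CRC32
    #    and length stored in the trailer once the end is reached, so a
    #    truncated or corrupted file raises here.
    try:
        with gzip.open(path, "rb") as fp:
            while fp.read(1 << 20):  # read in 1 MiB chunks, discard the data
                pass
    except (OSError, EOFError, zlib.error) as exc:
        problems.append(f"gzip stream is damaged: {exc!r}")
        return problems

    # 3. Optionally load the pickle itself. Only do this for files from a
    #    trusted source: unpickling can execute arbitrary code.
    if try_unpickle:
        try:
            with gzip.open(path, "rb") as fp:
                pickle.load(fp)
        except Exception as exc:
            problems.append(f"unpickling failed: {exc!r}")

    return problems
```

For example, `check_gzip_pickle("ccd_preprocessed_etkdg.pkl.gz", expected_size=63650922)` should return an empty list for an intact file, assuming the size quoted above is that of the good download. A non-empty result means the file should be downloaded again.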