PaddlePaddle / PaddleHelix

Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning ("螺旋桨" / PaddleHelix bio-computing toolkit)
Apache License 2.0
801 stars, 189 forks

Every amino acid value predicted by the model is extremely large (e.g. 1028164807), so converting it to the corresponding letter fails with "list index out of range" #239

Closed wcf653422590 closed 1 year ago

wcf653422590 commented 1 year ago

The model I am using is helixfold-single/user_data/model_data/helixfold-single.pdparams

The failing code is in data_utils.py:

    def aatype_to_sequence(aatype):
        return ''.join([
            residue_constants.restypes_with_x[aatype[i]]
            for i in range(len(aatype))
        ])

Traceback (most recent call last):
  File "/mnt/workspace/helixfold-single_original/helixfold_single_inference.py", line 121, in <module>
    main(args)
  File "/mnt/workspace/helixfold-single_original/helixfold_single_inference.py", line 103, in main
    args.fasta_file, af2_model_config)
  File "/mnt/workspace/helixfold-single_original/helixfold_single_inference.py", line 56, in sequence_to_batch
    sequence, description = read_fasta_file(fasta_file)
  File "/mnt/workspace/helixfold-single_original/helixfold_single_inference.py", line 42, in read_fasta_file
    with open(fasta_file, 'r') as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
(base) /mnt/workspace> /home/pai/bin/python /mnt/workspace/helixfold-single_original/helixfold_single_inference.py
/home/pai/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/home/pai/lib/python3.6/site-packages/OpenSSL/crypto.py:8: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
  from cryptography import utils, x509
W1202 15:20:51.727890 26523 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.4, Runtime API Version: 10.2
W1202 15:20:51.730804 26523 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
[RunTapeModel] freeze_tape: False
model size: 1187148024
Load model from helixfold-single/user_data/model_data/helixfold-single.pdparams
2022-12-02 15:21:01.499896: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Traceback (most recent call last):
  File "/mnt/workspace/helixfold-single_original/helixfold_single_inference.py", line 121, in <module>
    main(args)
  File "/mnt/workspace/helixfold-single_original/helixfold_single_inference.py", line 106, in main
    results = model(batch, compute_loss=False)
  File "/home/pai/lib/python3.6/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
    return self.forward(*inputs, **kwargs)
  File "/mnt/workspace/helixfold-single_original/utils/model_tape.py", line 115, in forward
    batch = self._forward_tape(batch)
  File "/mnt/workspace/helixfold-single_original/utils/model_tape.py", line 95, in _forward_tape
    tape_input = self._create_tape_input(batch)
  File "/mnt/workspace/helixfold-single_original/utils/model_tape.py", line 80, in _create_tape_input
    text = aatype_to_sequence(aatype[:seq_len])
  File "/mnt/workspace/helixfold-single_original/alphafold_paddle/data/data_utils.py", line 96, in aatype_to_sequence
    for i in range(len(aatype))
  File "/mnt/workspace/helixfold-single_original/alphafold_paddle/data/data_utils.py", line 96, in <listcomp>
    for i in range(len(aatype))
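One way to narrow this down is to validate the indices before the lookup in aatype_to_sequence, so the error names the offending position and value instead of a bare index failure. The following is only a minimal sketch: `RESTYPES_WITH_X` is a stand-in for `residue_constants.restypes_with_x` (assumed to be the 20 standard residues plus 'X'), and `checked_aatype_to_sequence` is a hypothetical helper, not part of the repository.

```python
# Stand-in for residue_constants.restypes_with_x (assumption: 20 standard
# residues followed by 'X'); the real list is not imported here.
RESTYPES_WITH_X = list("ARNDCQEGHILKMFPSTWYV") + ["X"]


def checked_aatype_to_sequence(aatype):
    """Hypothetical variant of aatype_to_sequence that validates indices first."""
    indices = [int(a) for a in aatype]
    # Collect every (position, value) pair that cannot index the residue table.
    bad = [(pos, idx) for pos, idx in enumerate(indices)
           if not 0 <= idx < len(RESTYPES_WITH_X)]
    if bad:
        pos, idx = bad[0]
        raise ValueError(
            f"{len(bad)} aatype value(s) outside [0, {len(RESTYPES_WITH_X) - 1}]; "
            f"first at position {pos}: {idx}")
    return "".join(RESTYPES_WITH_X[idx] for idx in indices)
```

Calling it on the slice passed at model_tape.py line 80 would show whether every value is out of range or only some of them, which helps tell a corrupted input batch apart from a single bad residue.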

SuperXiang commented 1 year ago

Hi, could you share the input protein data and your runtime environment?

wcf653422590 commented 1 year ago

> Hi, could you share the input protein data and your runtime environment?

Thanks for the reply!

Protein data:

1BQL_1|Chain A[auth L]|HYHEL-5 FAB (LIGHT CHAIN)|Mus musculus (10090)
DIVLTQSPAIMSASPGEKVTMTCSASSSVNYMYWYQQKSGTSPKRWIYDTSKLASGVPVRFSGSGSGTSYSLTISSMETEDAATYYCQQWGRNPTFGGGTKLEIKRADAAPTVSIFPPSSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSWTDQDSKDSTYSMSSTLTLTKDEYERHNSYTCEATHKTSTSPIVKSFNRNEC

cuda: 10.1
cudnn: 7.6.5
Python 3.6.12
nccl: 2.15.5
GPU: Nvidia A100

absl==0.0
absl_py==0.13.0
Bio==1.5.2
dm_tree==0.1.6
ml_collections==0.1.1
numpy==1.19.5
paddle==1.0.2
paddlepaddle_gpu==0.0.0.post102
pandas==1.1.5
scipy==1.5.3
simtk==0.1.0
tensorflow==2.11.0
tensorflow_cpu==2.6.2
tree==0.2.4

SuperXiang commented 1 year ago

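First, try adding the '>' symbol at the very beginning of the first line of the protein data and run it again. If that does not help, I suggest aligning the paddlepaddle_gpu and CUDA versions: your CUDA is 10.1, but your paddle build targets 10.2. You can refer here to pick a suitable paddle package. Ideally, install everything according to the environment requirements in the documentation.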
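The header check suggested here can be made explicit before the sequence ever reaches the model. The sketch below is an illustration only: `parse_single_fasta` is a hypothetical helper, not the repository's read_fasta_file, and it assumes the input holds exactly one record.

```python
def parse_single_fasta(text):
    """Hypothetical helper: parse one FASTA record, insisting on a '>' header."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first non-empty line must start with '>'")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("more than one record found")
    description = lines[0][1:].strip()
    # Sequence lines may be wrapped, so join them into one string.
    sequence = "".join(lines[1:])
    if not sequence:
        raise ValueError("no sequence lines found after the header")
    return sequence, description
```

With a check like this, a file whose header lost its '>' fails immediately with a clear message instead of having the description line treated as sequence data.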

wcf653422590 commented 1 year ago

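> First, try adding the '>' symbol at the very beginning of the first line of the protein data and run it again. If that does not help, I suggest aligning the paddlepaddle_gpu and CUDA versions: your CUDA is 10.1, but your paddle build targets 10.2. You can refer here to pick a suitable paddle package. Ideally, install everything according to the environment requirements in the documentation.

Thanks~ The '>' symbol is actually present in my data; this editor probably treated it as a formatting character when I pasted it in. I have just aligned the paddlepaddle_gpu and CUDA versions, but the same problem still occurs. If I do not switch the run argument to the helixfold-single.pdparams model and simply run the code with its default settings, it does produce results. But without helixfold-single.pdparams the results are bound to be poor, right? What I would like to know is: when switching to the helixfold-single.pdparams model, does any configuration in the code need to change?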

SuperXiang commented 1 year ago

When you say you are not using the helixfold-single.pdparams model parameters, which model parameters are you running helixfold-single with?

wcf653422590 commented 1 year ago

In helixfold_single_inference.py, the code has the following line by default:

    parser.add_argument("--init_model", type=str, help='tape + af2 stacked model')

Running with this line unchanged produces results.

Following the hint in our README, I changed it to:

    parser.add_argument("--init_model", type=str, help='tape + af2 stacked model', default='../user_data/model_data/helixfold-single.pdparams')

wcf653422590 commented 1 year ago

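> When you say you are not using the helixfold-single.pdparams model parameters, which model parameters are you running helixfold-single with?

It must be running some default model. I am not very familiar with the code yet, so I do not know which model it uses by default.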

SuperXiang commented 1 year ago

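Understood. In that case you were most likely running with randomly initialized model parameters. I suggest redeploying your environment according to the specific requirements in the documentation (Python 3.7+, matching CUDA and paddle versions, and so on) and trying again.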
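Because `--init_model` has no default, omitting it leaves the value as `None`, which is how a run can silently proceed with random weights. A small guard can make that case fail loudly. This is a sketch under assumptions: `parse_args` and `require_init_model` are hypothetical names, and only the `--init_model` flag itself comes from the thread.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Same flag as in the thread; with no default it is None when omitted.
    parser.add_argument("--init_model", type=str, help="tape + af2 stacked model")
    return parser.parse_args(argv)


def require_init_model(args):
    """Hypothetical guard: refuse to run inference without a real checkpoint."""
    if not args.init_model:
        raise SystemExit("--init_model is required; without it the weights are random")
    if not os.path.isfile(args.init_model):
        raise SystemExit(f"checkpoint not found: {args.init_model}")
    return args.init_model
```

Checking the path as well catches a relative default such as '../user_data/...' resolving against an unexpected working directory.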

wcf653422590 commented 1 year ago

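> Understood. In that case you were most likely running with randomly initialized model parameters. I suggest redeploying your environment according to the specific requirements in the documentation (Python 3.7+, matching CUDA and paddle versions, and so on) and trying again.

OK, thanks. I am also in the middle of reinstalling the various systems; I will try again once the whole environment matches.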

SuperXiang commented 1 year ago

I am closing this issue for now. If you run into any further problems, feel free to reopen it at any time.