Problem in protein_process.py

Hello author,

It is impossible to create the protein embedding files because of an error while saving the file.

It seems the embeddings have different shapes and can not be saved together in a single file. Please, can you verify and indicate a solution?

  File "/home/angeloduarte/AttentionMGT-DTA/protein_process.py", line 349, in <module>
    Protein_embedding_process(dataset=dataset, fold=fold, id_train=protein_id_train, id_test=protein_id_test, dir_output=dir_output)
  File "/home/angeloduarte/AttentionMGT-DTA/protein_process.py", line 301, in Protein_embedding_process
    np.save(dir_output + '/train/fold/' + str(fold) + '/protein_embedding.npy', proteins_embedding_train, allow_pickle=True)
  File "<__array_function__ internals>", line 200, in save
  File "/home/angeloduarte/.pyenv/versions/mgtdta/lib/python3.9/site-packages/numpy/lib/npyio.py", line 521, in save
    arr = np.asanyarray(arr)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
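
For reference, this ValueError is what NumPy 1.24 and later raise when a list of arrays with different shapes is converted to a regular array, which np.save does internally. A minimal sketch of one possible workaround that avoids downgrading is to pack the list into a 1-D object array first. The helper name to_object_array and the zero-filled embeddings are hypothetical and are not taken from protein_process.py.

```python
import os
import tempfile

import numpy as np


def to_object_array(arrays):
    """Pack arrays of differing shapes into a 1-D object array (hypothetical helper)."""
    packed = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        packed[i] = arr  # each slot holds one whole array
    return packed


# Hypothetical stand-ins for per-residue embeddings of three proteins
# with different sequence lengths (5, 7 and 3 residues).
embeddings = [np.zeros((length, 1280), dtype=np.float32) for length in (5, 7, 3)]

packed = to_object_array(embeddings)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "protein_embedding.npy")
    np.save(path, packed, allow_pickle=True)
    # Object arrays are pickled, so loading also needs allow_pickle=True.
    restored = np.load(path, allow_pickle=True)

print([e.shape for e in restored])
```

Any code that later reads the file would have to pass allow_pickle=True to np.load, so this only helps if the loading side is adjusted as well.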


I think this is a NumPy version problem. Downgrading with "pip install numpy==1.23.0" solved it for me. For the inputs, I took the proteins found by UniProt ID in the AlphaFold database and first converted their residues from three-letter to one-letter codes, turning the PDB files into a FASTA file:

import os
from argparse import ArgumentParser

from Bio import SeqIO
from Bio.PDB import PDBParser
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from tqdm import tqdm

three_to_one = {'ALA':  'A',
                'ARG':  'R',
                'ASN':  'N',
                'ASP':  'D',
                'CYS':  'C',
                'GLN':  'Q',
                'GLU':  'E',
                'GLY':  'G',
                'HIS':  'H',
                'ILE':  'I',
                'LEU':  'L',
                'LYS':  'K',
                'MET':  'M',
                'MSE':  'M', # MSE (selenomethionine) is almost the same amino acid as MET; the sulfur is replaced by selenium
                'PHE':  'F',
                'PRO':  'P',
                'PYL':  'O',
                'SER':  'S',
                'SEC':  'U',
                'THR':  'T',
                'TRP':  'W',
                'TYR':  'Y',
                'VAL':  'V',
                'ASX':  'B',
                'GLX':  'Z',
                'XAA':  'X',
                'XLE':  'J'}

parser = ArgumentParser()
parser.add_argument('--out_file', type=str, default="./KIBA_sequences.fasta")

parser.add_argument('--dataset', type=str, default="KIBA")
parser.add_argument('--data_dir', type=str, default='/root/sjb/chem/esm-main/pdb/KIBA/PDB_AF2', help='')
args = parser.parse_args()

biopython_parser = PDBParser()

def get_structure_from_file(file_path):
    """Return one one-letter sequence string per chain of the first model in a PDB file."""
    structure = biopython_parser.get_structure('random_id', file_path)
    model = structure[0]  # use the first model only
    chain_sequences = []
    for chain in model:
        seq = ''
        for residue in chain:
            if residue.get_resname() == 'HOH':  # skip water molecules
                continue
            # Only count a residue as an amino acid if it has the backbone atoms N, CA and C
            atom_names = {atom.name for atom in residue}
            if {'N', 'CA', 'C'} <= atom_names:
                try:
                    seq += three_to_one[residue.get_resname()]
                except KeyError:
                    seq += '-'
                    print("encountered unknown AA: ", residue.get_resname(), ' in the complex ', file_path, '. Replacing it with a dash - .')
        chain_sequences.append(seq)
    return chain_sequences

data_dir = args.data_dir
names = os.listdir(data_dir)

if args.dataset == 'KIBA':
    sequences = []
    ids = []

    for name in tqdm(names):
        if name == '.DS_Store': continue
        rec_path = os.path.join(data_dir, name)
        l = get_structure_from_file(rec_path)
        for i, seq in enumerate(l):
            sequences.append(seq)
            ids.append(f'{name}_chain_{i}')
    records = []
    for (index, seq) in zip(ids, sequences):
        record = SeqRecord(Seq(seq), str(index))
        record.description = ''
        records.append(record)
    SeqIO.write(records, args.out_file, "fasta")

Then run the extract script from the ESM repository, "python scripts/extract.py esm2_t33_650M_UR50D kiba_sequences.fasta embeddings_output --repr_layers 33 --include per_tok --truncation_seq_length 4096", convert the resulting .pt files to .npy format, and save them in a folder named ESM_embedding:

import torch
import numpy as np
import os
import glob

def convert_pt_to_npy(pt_file, npy_file):
    # Load the .pt file
    data = torch.load(pt_file)

    # Check for and extract the data under the 'representations' key
    if isinstance(data, dict) and 'representations' in data:
        representations = data['representations']

        # Make sure 'representations' is a dict that holds the tensor for the requested layer
        if isinstance(representations, dict):
            for key, value in representations.items():
                if isinstance(value, torch.Tensor):
                    # Save the first tensor found as a .npy file
                    np.save(npy_file, value.numpy())
                    print(f"Saved data from {pt_file} (key: {key}) to {npy_file}.")
                    return
        else:
            print(f"'representations' key does not contain a dictionary in {pt_file}.")
    else:
        print(f"'representations' key not found in {pt_file}.")

# Example usage
pt_folder = '/root/sjb/chem/esm-main/pdb/KIBA/embeddings_output'
npy_folder = '/root/sjb/chem/esm-main/pdb/KIBA/ESM_embedding'

# Make sure the npy folder exists
os.makedirs(npy_folder, exist_ok=True)

# Process all .pt files
pt_files = glob.glob(os.path.join(pt_folder, '*.pt'))

for pt_file in pt_files:
    pt_filename = os.path.basename(pt_file)
    # Strip '.pdb' and the chain suffix, keeping only the base file name
    base_name = pt_filename.split('_')[0]
    base_name = base_name.split('.')[0]
    npy_file = os.path.join(npy_folder, f"{base_name}.npy")
    convert_pt_to_npy(pt_file, npy_file)
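
The renaming step in the loop above can be checked on its own. The sketch below applies the same two split calls to a few hypothetical file names, assuming the .pt files are named after FASTA record ids of the form <pdb file name>_chain_<i>. The function name base_name_from_pt and all file names are made up for illustration.

```python
def base_name_from_pt(pt_filename):
    """Apply the same two-step split used in the conversion loop."""
    return pt_filename.split('_')[0].split('.')[0]


# Hypothetical file names, not taken from the actual dataset.
print(base_name_from_pt("P12345.pdb_chain_0.pt"))     # P12345
print(base_name_from_pt("P12345.pdb_chain_1.pt"))     # P12345 (same target as chain 0)
print(base_name_from_pt("AF_P12345.pdb_chain_0.pt"))  # AF (cut at the first underscore)
```

Under these assumptions, a file with more than one chain would write every chain to the same .npy path, and a PDB file name that itself contains an underscore would be cut short, so it may be worth checking the real file names before relying on this step.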

protein_process.py then runs successfully without errors. However, the train and test files generated for the five-fold cross-validation are very large, taking up 150 GB of disk space.
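
To get a feel for where the disk space might go, here is a rough size estimate for a single per-residue embedding. It is a sketch under assumptions: esm2_t33_650M_UR50D produces 1280-dimensional token representations, the 1000-residue length is a hypothetical example, and how many copies protein_process.py writes per fold is not known from this thread.

```python
import numpy as np

EMBED_DIM = 1280  # representation size of esm2_t33_650M_UR50D


def embedding_bytes(seq_len, dtype=np.float32):
    """Raw size in bytes of one (seq_len, EMBED_DIM) embedding matrix."""
    return seq_len * EMBED_DIM * np.dtype(dtype).itemsize


# Hypothetical 1000-residue protein
size_fp32 = embedding_bytes(1000)              # 5,120,000 bytes
size_fp16 = embedding_bytes(1000, np.float16)  # 2,560,000 bytes
print(size_fp32, size_fp16)
```

At about 5 MB per 1000 residues in float32, the total grows quickly if an embedding is stored once per training sample rather than once per unique protein. Storing float16 would halve the size at the cost of precision, and whether that is acceptable for this model is untested here.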


Thanks for the debugging tips! I'll implement them and share the outcome here.


I ran "train_DTA.py" on an NVIDIA A10 with the batch size set to 32. It uses about 16 GB of GPU memory, and each epoch takes about 300 s, which is quite slow.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:A1:00.0 Off |                    0 |
|  0%   68C    P0    97W / 150W |  16008MiB / 23028MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2395526      C   python                          16006MiB |

run.log

Training on Davis, fold:1
Epoch       Time        MSE     RMSE        CI      r2
1       338.2       0.71851     0.84765     0.52184     0.00186
MSE improved at epoch  1 ;  best_mse: 0.71850604
2       672.69      0.6213      0.78823     0.62406     0.10358
MSE improved at epoch  2 ;  best_mse: 0.6213007
3       1009.48     0.63453     0.79657     0.69185     0.19261
4       1343.18     0.78804     0.88772     0.70158     0.18175
5       1676.81     0.5552      0.74512     0.71946     0.21468
model has been saved
MSE improved at epoch  5 ;  best_mse: 0.5551988
6       2010.52     0.54099     0.73552     0.72459     0.24102
model has been saved
MSE improved at epoch  6 ;  best_mse: 0.54099154
7       2344.79     0.56821     0.7538      0.73395     0.26291
8       2679.8      0.51267     0.71601     0.74235     0.26899
model has been saved
MSE improved at epoch  8 ;  best_mse: 0.51267076
9       3014.65     0.53715     0.73291     0.74662     0.28444
10      3350.2      0.49913     0.70649     0.74512     0.27123
model has been saved
MSE improved at epoch  10 ; best_mse: 0.49912947
11      3684.97     0.59459     0.7711      0.75886     0.29591
12      4020.58     0.46626     0.68283     0.77131     0.32595
model has been saved
MSE improved at epoch  12 ; best_mse: 0.46626085
13      4356.86     0.55538     0.74524     0.77141     0.32872
14      4689.92     0.50944     0.71375     0.77065     0.32065
15      5023.55     0.46043     0.67855     0.77954     0.33824
model has been saved
MSE improved at epoch  15 ; best_mse: 0.46043083
16      5357.15     0.4547      0.67432     0.77686     0.32661
model has been saved
MSE improved at epoch  16 ; best_mse: 0.4547045
17      5690.87     0.44925     0.67026     0.77922     0.33406
model has been saved
MSE improved at epoch  17 ; best_mse: 0.4492497
18      6023.85     0.58687     0.76608     0.77993     0.3159
19      6357.44     0.44637     0.66811     0.78223     0.33424
model has been saved
MSE improved at epoch  19 ; best_mse: 0.4463707
20      6691.65     0.45362     0.67352     0.7877      0.35712
21      7025.6      0.44647     0.66818     0.78955     0.32525
22      7361.13     0.43501     0.65956     0.79135     0.36941
model has been saved
MSE improved at epoch  22 ; best_mse: 0.43501356
23      7698.18     0.43182     0.65713     0.7896      0.34477
model has been saved
MSE improved at epoch  23 ; best_mse: 0.4318245
24      8036.83     0.46952     0.68522     0.79155     0.35701
25      8373.4      0.42616     0.65281     0.79412     0.36326
model has been saved
MSE improved at epoch  25 ; best_mse: 0.4261638
26      8709.37     0.40033     0.63272     0.79835     0.39689
model has been saved
MSE improved at epoch  26 ; best_mse: 0.40032905
27      9046.42     0.44054     0.66373     0.7991      0.34304
28      9382.96     0.41973     0.64787     0.79203     0.36607
29      9719.2      0.47403     0.6885      0.79983     0.37105
30      10056.18    0.42036     0.64835     0.80163     0.39142
31      10390.61    0.46906     0.68488     0.80842     0.36977
32      10724.19    0.41877     0.64712     0.80195     0.39623
33      11058.63    0.3937      0.62746     0.80544     0.41278

I have been running this program for almost two days and have completed fewer than 300 epochs.
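
The average epoch time can be read straight from run.log. The sketch below assumes the Time column is cumulative seconds since the start of training, which matches the excerpt above. The parser name parse_epoch_times is hypothetical, and only a few of the log lines are reproduced.

```python
LOG_EXCERPT = """\
Epoch       Time        MSE     RMSE        CI      r2
1       338.2       0.71851     0.84765     0.52184     0.00186
MSE improved at epoch  1 ;  best_mse: 0.71850604
2       672.69      0.6213      0.78823     0.62406     0.10358
33      11058.63        0.3937      0.62746     0.80544     0.41278
"""


def parse_epoch_times(text):
    """Map epoch number -> cumulative time in seconds for metric rows only."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        # Metric rows have exactly six columns and start with the epoch number.
        if len(fields) == 6 and fields[0].isdigit():
            times[int(fields[0])] = float(fields[1])
    return times


times = parse_epoch_times(LOG_EXCERPT)
last_epoch = max(times)
mean_epoch_seconds = times[last_epoch] / last_epoch
print(f"{mean_epoch_seconds:.1f} s per epoch over {last_epoch} epochs")
```

For the excerpt this gives about 335 s per epoch. The nvidia-smi snapshot above shows 3% GPU utilization, which might mean the GPU is waiting on data loading, although a single reading is not conclusive.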

JK-Liu7 / AttentionMGT-DTA

Problem in protein_process.py #13