BioColLab / PFresGO

Using predict.py with default arguments results in ValueError #1

Open rnilsenhub opened 1 year ago

rnilsenhub commented 1 year ago

In XX_PFresGO_best_train_model.h5, the weight /model_weights/model/encoder2/kernel:0 has a shape of (64, 256). Running predict.py with the default arguments results in the error: ValueError: Cannot assign to variable encoder2/kernel:0 due to variable shape (256, 128) and value shape (256, 64) are incompatible. Changing hidden_size to 64 results in a different error: ValueError: Operands could not be broadcast together with shapes (None, 128) (None, 64). Thank you for publishing your research and this code repository.

BioColLab commented 1 year ago

For BP ontology prediction, the pre-trained PFresGO model has only one hidden layer ("--num_hidden_layers 1"), while for MF and CC ontology prediction, the pre-trained PFresGO model has two hidden layers ("--num_hidden_layers 2"). Please check the parameters carefully.
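
The mapping described above can be written down so the flag is not set by hand. This is only a sketch: the helper name `build_predict_command` and the dictionary are hypothetical and not part of the repository, the flag names are the ones used in the command quoted in this thread, and the layer counts are the ones stated in this comment, so they may not hold for a re-uploaded checkpoint.

```python
import shlex

# Layer counts as stated in this comment (assumption: they still match
# the checkpoint you downloaded).
PRETRAINED_HIDDEN_LAYERS = {"bp": 1, "mf": 2, "cc": 2}


def build_predict_command(ontology, model_name, res_embeddings):
    """Hypothetical helper: assemble the predict.py argument list for one ontology."""
    key = ontology.lower()
    if key not in PRETRAINED_HIDDEN_LAYERS:
        raise ValueError(f"unknown ontology: {ontology!r}")
    return [
        "python", "predict.py",
        "--num_hidden_layers", str(PRETRAINED_HIDDEN_LAYERS[key]),
        "--ontology", key,
        "--model_name", model_name,
        "--res_embeddings", res_embeddings,
    ]


command = build_predict_command(
    "mf", "MF_PFresGO", "./Datasets/per_residue_embeddings.h5"
)
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting problems.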

northpoleforce commented 9 months ago

python predict.py --num_hidden_layers 2 --ontology 'mf' --model_name 'MF_PFresGO' --res_embeddings './Datasets/per_residue_embeddings.h5'
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
padding (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
res_embed (InputLayer)          [(None, None, 1024)] 0                                            
__________________________________________________________________________________________________
seq (InputLayer)                [(None, None, 26)]   0                                            
__________________________________________________________________________________________________
tf.math.equal (TFOpLambda)      (None, None)         0           padding[0][0]                    
__________________________________________________________________________________________________
model (Functional)              (None, None, 128)    1344896     res_embed[0][0]                  
__________________________________________________________________________________________________
AA_embedding (Dense)            (None, None, 128)    3328        seq[0][0]                        
__________________________________________________________________________________________________
tf.cast (TFOpLambda)            (None, None)         0           tf.math.equal[0][0]              
__________________________________________________________________________________________________
Add_embedding (Add)             (None, None, 128)    0           model[0][0]                      
                                                                 AA_embedding[0][0]               
__________________________________________________________________________________________________
tf.__operators__.getitem (Slici (None, 1, 1, None)   0           tf.cast[0][0]                    
__________________________________________________________________________________________________
decoder_1 (Decoder)             ((1, 489, 128), {'de 396160      Add_embedding[0][0]              
                                                                 tf.__operators__.getitem[0][0]   
__________________________________________________________________________________________________
group_wise_linear (GroupWiseLin (1, 489)             302691      decoder_1[0][0]                  
==================================================================================================
Total params: 2,047,075
Trainable params: 2,047,075
Non-trainable params: 0
__________________________________________________________________________________________________

ValueError: Layer #2 (named "decoder_1" in the current model) was found to correspond to layer decoder_1 in the save file. However the new layer decoder_1 expects 26 weights, but the saved weights have 52 elements.

Why? @BioColLab

BioColLab commented 9 months ago

Hi, I have updated the pre-trained model. Please go ahead and re-download it, and now it runs smoothly.

northpoleforce commented 9 months ago

Thank you very much, it runs smoothly now. However, I found and fixed one problem while trying it.

Problem

TypeError: 'list' object cannot be interpreted as an integer
Traceback (most recent call last):
  File "predict.py", line 60, in <module>
    model_name_prefix=args.model_name, label_embedding=go_emb, hidden_size=args.hidden_size, num_heads=args.num_heads, autoencoder_name=args.autoencoder_name)
  File "/home/liujianan/projects/PFresGO/pfresgo/PFresGO.py", line 21, in __init__
    self.decoder = Decoder(num_hidden_layers, hidden_size, num_heads, dff, output_dim, rate=0.1)
  File "/home/liujianan/projects/PFresGO/pfresgo/PFresGO_decoder.py", line 15, in __init__
    self.dec_layers = [DecoderLayer(d_model, num_heads, dff, rate) for _ in range(num_layers)]
TypeError: 'list' object cannot be interpreted as an integer

Fix: modify pfresgo/PFresGO_decoder.py

class Decoder(tf.keras.layers.Layer):
  def __init__(self, num_layers, d_model, num_heads, dff, target_goemb_size, rate=0.1, **kwargs):
    super(Decoder, self).__init__()

    self.d_model = d_model
    # self.num_layers = num_layers
    self.num_layers = num_layers[0]
    self.num_heads = num_heads
    self.dff = dff
    self.target_goemb_size = target_goemb_size
    self.rate = rate

    # self.dec_layers = [DecoderLayer(d_model, num_heads, dff, rate) for _ in range(num_layers)]
    self.dec_layers = [DecoderLayer(d_model, num_heads, dff, rate) for _ in range(num_layers[0])]
    self.dropout = tf.keras.layers.Dropout(rate)
    super(Decoder, self).__init__(**kwargs)
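
One caveat with indexing `num_layers[0]`: it only works when the value is a list. If `--num_hidden_layers` is omitted, argparse returns the plain default `2`, and indexing an int raises a TypeError. A minimal sketch of a more tolerant conversion follows, assuming predict.py passes `args.num_hidden_layers` through unchanged; `normalize_num_layers` is a hypothetical helper, not part of the repository.

```python
def normalize_num_layers(num_layers):
    """Return the layer count as an int, accepting an int or a one-element list/tuple."""
    if isinstance(num_layers, (list, tuple)):
        if len(num_layers) != 1:
            raise ValueError(
                f"expected exactly one layer count, got {num_layers!r}"
            )
        num_layers = num_layers[0]
    return int(num_layers)


# Both forms that argparse can produce with nargs='+' and default=2.
from_flag = normalize_num_layers([2])   # flag given on the command line
from_default = normalize_num_layers(2)  # flag omitted, default used
```

Inside `Decoder.__init__`, the result of such a helper could be used in place of `num_layers[0]` in both spots.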

Best wishes.

nbuton commented 7 months ago

I think this comes from the argument parser: with nargs='+' the parsed value is a list, but an int is expected here. I think the line https://github.com/BioColLab/PFresGO/blob/43d7abe4752a1cf8afb22cc41b349379e6018284/predict.py#L18C1-L18C42 :

parser.add_argument('-hlayer', '--num_hidden_layers', type=int, default=2, nargs='+', help="Number of hidden layers.")

should be changed to:

parser.add_argument('-hlayer', '--num_hidden_layers', type=int, default=2, nargs='?', help="Number of hidden layers.")
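
The difference between the `nargs` settings can be checked with the standard library alone. The sketch below rebuilds only the one argument quoted above; `build_parser` is a hypothetical helper for the comparison, and the third variant (no `nargs` at all) is an additional option, not something proposed in this thread.

```python
import argparse


def build_parser(nargs=None):
    """Hypothetical helper: a parser with only --num_hidden_layers, for comparison."""
    parser = argparse.ArgumentParser()
    extra = {} if nargs is None else {"nargs": nargs}
    parser.add_argument('-hlayer', '--num_hidden_layers', type=int, default=2,
                        help="Number of hidden layers.", **extra)
    return parser


flag = ['--num_hidden_layers', '2']

# nargs='+': a list when the flag is given, but the bare int default otherwise.
plus_given = build_parser('+').parse_args(flag).num_hidden_layers
plus_default = build_parser('+').parse_args([]).num_hidden_layers

# nargs='?': an int when a value is given; None if the flag has no value.
opt_given = build_parser('?').parse_args(flag).num_hidden_layers
opt_bare = build_parser('?').parse_args(['--num_hidden_layers']).num_hidden_layers

# No nargs: always an int, and a flag without a value is rejected by argparse.
plain_given = build_parser().parse_args(flag).num_hidden_layers

print(plus_given, plus_default, opt_given, opt_bare, plain_given)
```

With `nargs='?'`, passing the flag without a value yields `None` unless `const` is set, so dropping `nargs` entirely may be the simpler choice if exactly one integer is always intended.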