MolecularAI / aizynthfinder

A tool for retrosynthetic planning
https://molecularai.github.io/aizynthfinder/
MIT License

`train_expansion_keras_model` yields error in dimensions of logits and labels #90

Closed hubbs5 closed 1 year ago

hubbs5 commented 1 year ago

Hello,

I have a large database of over 92k reaction templates and I'm trying to retrain the expansion model. I ran the expansion pre-processing step and moved on to training, but I'm encountering the following error:

InvalidArgumentError                      Traceback (most recent call last)
/tmp/ipykernel_5701/843677716.py in <module>
      9 )
     10 
---> 11 train_expansion_keras_model(config)
     12 
     13 # def main(optional_args: Optional[Sequence[str]] = None) -> None:

~/anaconda3/envs/retro/lib/python3.7/site-packages/aizynthfinder/training/keras_models.py in train_expansion_keras_model(config)
    197         "categorical_crossentropy",
    198         ["accuracy", "top_k_categorical_accuracy", top10_acc, top50_acc],
--> 199         config,
    200     )
    201 

~/anaconda3/envs/retro/lib/python3.7/site-packages/aizynthfinder/training/keras_models.py in _train_keras_model(model, train_seq, valid_seq, loss, metrics, config)
    166         use_multiprocessing=False,
    167         shuffle=True,
--> 168         initial_epoch=0,
    169     )
    170 

~/anaconda3/envs/retro/lib/python3.7/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   2221         use_multiprocessing=use_multiprocessing,
...
    File "/home/christian/anaconda3/envs/retro/lib/python3.7/site-packages/keras/backend.py", line 5099, in categorical_crossentropy
      labels=target, logits=output, axis=axis)
Node: 'categorical_crossentropy/softmax_cross_entropy_with_logits'
logits and labels must be broadcastable: logits_size=[0,92366] labels_size=[256,92366]
     [[{{node categorical_crossentropy/softmax_cross_entropy_with_logits}}]] [Op:__inference_train_function_798]

Examining the Keras model, it seems to be constructed correctly. Here's the summary:

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_9 (Dense)             (None, 512)               1049088   

 dropout_5 (Dropout)         (None, 512)               0         

 dense_10 (Dense)            (None, 92366)             47383758  

=================================================================
Total params: 48,432,846
Trainable params: 48,432,846
Non-trainable params: 0
_________________________________________________________________
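For what it's worth, the reported parameter counts are internally consistent and imply a 2048-bit input fingerprint (the aizynthfinder default): `dense_9` on a 2048-dim input has 2048*512 + 512 = 1,049,088 parameters, and the 92,366-way output layer has 512*92366 + 92366 = 47,383,758. A quick arithmetic check:

```python
# Sanity-check the parameter counts in the model summary above.
# The 2048-bit fingerprint input length is an assumption (aizynthfinder's default).
def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

fp_bits = 2048          # assumed input fingerprint length
hidden = 512            # hidden width from the summary
n_templates = 92366     # number of reaction templates (output classes)

p1 = dense_params(fp_bits, hidden)       # dense_9
p2 = dense_params(hidden, n_templates)   # dense_10 (Dropout adds no params)

print(p1, p2, p1 + p2)  # 1049088 47383758 48432846
```

So the architecture itself matches the summary; the mismatch must come from the data fed to it, not the model.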

My only thought is that something didn't quite work as expected in the pre-processing step, which led to an issue with the train_seq or valid_seq generator. I'm unsure how to debug this and would be grateful for any advice!
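The `logits_size=[0,92366]` vs `labels_size=[256,92366]` in the error suggests the model received an empty input batch while the corresponding label batch still had 256 rows, i.e. the inputs and labels in the data sequence are out of sync. A library-agnostic way to check this is to iterate the sequence and compare batch row counts; here is a hypothetical sketch (the `FakeSeq` stand-in just reproduces the shapes from the traceback, and `find_bad_batch` would be run on the real `train_seq`/`valid_seq`):

```python
import numpy as np

def find_bad_batch(seq):
    """Return (index, x_rows, y_rows) for the first batch whose input and
    label row counts disagree (or whose inputs are empty), else None."""
    for i in range(len(seq)):
        x, y = seq[i]
        if x.shape[0] == 0 or x.shape[0] != y.shape[0]:
            return i, x.shape[0], y.shape[0]
    return None

# Minimal stand-in mimicking a Keras Sequence whose second batch has
# empty inputs but 256 labels -- the shapes reported in the traceback.
class FakeSeq:
    def __len__(self):
        return 3
    def __getitem__(self, i):
        n = 256 if i != 1 else 0            # batch 1 has no input rows
        x = np.zeros((n, 2048))
        y = np.zeros((256, 92366))
        return x, y

print(find_bad_batch(FakeSeq()))  # (1, 0, 256)
```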

SGenheden commented 1 year ago

Thanks for your feedback.

This is a hard one to debug. What is the size of your training and validation sets?

I will soon release a new package for training these models that is hopefully more robust than these old scripts, which are frankly not that great.

hubbs5 commented 1 year ago

I did manage to get this fixed. I don't remember exactly what it was, but it was related to the pre-processing script. If I recall correctly, it was dropping molecules that couldn't be converted into fingerprints, which threw off the dimensions of the data being passed.
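In other words, if pre-processing drops rows from the fingerprint matrix without dropping the matching rows of the label matrix, the two arrays go out of alignment and the generator can emit mismatched batches. The robust pattern is to build one boolean mask and apply it to both arrays. A hypothetical sketch (not aizynthfinder's actual code; `make_fingerprint` stands in for whatever fingerprinting call can fail):

```python
import numpy as np

def make_fingerprint(smiles):
    """Stand-in for a fingerprint function that returns None on failure."""
    if "bad" in smiles:
        return None
    return np.ones(8)  # tiny placeholder fingerprint

smiles_list = ["CCO", "bad_smiles", "c1ccccc1"]
labels = np.eye(3)  # one label row per input molecule

fps = [make_fingerprint(s) for s in smiles_list]
mask = np.array([fp is not None for fp in fps])

# Apply the SAME mask to both inputs and labels so they stay aligned.
X = np.stack([fp for fp in fps if fp is not None])
y = labels[mask]

print(X.shape, y.shape)  # (2, 8) (2, 3)
```

Filtering only `X` while leaving `labels` untouched is exactly the kind of silent desynchronization that surfaces later as a broadcast error inside `categorical_crossentropy`.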