bioinf-jku / FCD

Fréchet ChemNet Distance: A quality measure for generative models for molecules
GNU Lesser General Public License v3.0

<Bug> Issue when using the function get_fcd #11

Closed Alvaro-Ciudad closed 10 months ago

Alvaro-Ciudad commented 2 years ago

Hi, I am getting the following error when calling this function with two lists of 10,000 molecules each.

UnknownError: Graph execution error:

2 root error(s) found.
  (0) UNKNOWN:  IndexError: string index out of range
Traceback (most recent call last):

  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/script_ops.py", line 271, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py", line 642, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1004, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/usr/local/lib/python3.7/dist-packages/keras/engine/data_adapter.py", line 830, in wrapped_generator
    for data in generator_fn():

  File "/usr/local/lib/python3.7/dist-packages/fcd/FCD.py", line 156, in myGenerator_predict
    smiEnc = get_one_hot(currentSmiles, pad_len=nn)

  File "/usr/local/lib/python3.7/dist-packages/fcd/FCD.py", line 127, in get_one_hot
    if smiles[i + 1] in ['r', 'i', 'l']:

IndexError: string index out of range

     [[{{node PyFunc}}]]
     [[IteratorGetNext]]
     [[IteratorGetNext/_2]]
  (1) UNKNOWN:  IndexError: string index out of range
Traceback (most recent call last):

  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/script_ops.py", line 271, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py", line 642, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1004, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/usr/local/lib/python3.7/dist-packages/keras/engine/data_adapter.py", line 830, in wrapped_generator
    for data in generator_fn():

  File "/usr/local/lib/python3.7/dist-packages/fcd/FCD.py", line 156, in myGenerator_predict
    smiEnc = get_one_hot(currentSmiles, pad_len=nn)

  File "/usr/local/lib/python3.7/dist-packages/fcd/FCD.py", line 127, in get_one_hot
    if smiles[i + 1] in ['r', 'i', 'l']:

IndexError: string index out of range

     [[{{node PyFunc}}]]
     [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_predict_function_2176]

I have a feeling that it might be related to the SMILES I generated, but I am not 100% sure. Could you confirm whether this is the case? Thanks.

Alvaro-Ciudad commented 1 year ago

This error is caused by empty SMILES strings being passed to the function. For some reason RDKit considers them valid input. Please add an assertion or some other check to catch this, in case someone else runs into the same problem.
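For anyone hitting the same error before a fix is available, a possible workaround is to drop empty strings from both lists before calling get_fcd. The snippet below is only a minimal sketch: `drop_empty_smiles` is a hypothetical helper, not part of the fcd package, and it assumes the inputs are plain Python lists of strings.

```python
def drop_empty_smiles(smiles_list):
    """Return (kept, n_dropped), removing empty or whitespace-only entries.

    Hypothetical helper for illustration; it is not part of the fcd package.
    """
    # Keep only real strings that contain at least one non-whitespace character.
    kept = [s for s in smiles_list if isinstance(s, str) and s.strip()]
    return kept, len(smiles_list) - len(kept)


generated = ["CCO", "", "   ", "c1ccccc1"]
clean, n_dropped = drop_empty_smiles(generated)
if n_dropped:
    print(f"Dropped {n_dropped} empty SMILES before computing the FCD")
```

The cleaned lists can then be passed on to get_fcd in place of the originals. Reporting the number of dropped entries is worthwhile, since many empty strings usually point to a problem in the generator itself.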

renzph commented 1 year ago

Thanks for your report. I'll try to find time to fix the issue.

renzph commented 10 months ago

Indeed, the function fails because of a lookahead for two-letter SMILES tokens. I just put in a fix. Sorry for taking so long.
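To illustrate the kind of failure described here, the sketch below shows a lookahead for two-letter tokens that cannot run past the end of the string. It is not the actual fix applied to FCD.py: `tokenize_smiles` and the `TWO_LETTER_TOKENS` set are assumptions made for this example, and the real vocabulary may differ.

```python
# Assumed set of two-letter tokens, chosen only for this illustration.
TWO_LETTER_TOKENS = {"Cl", "Br", "Si"}


def tokenize_smiles(smiles):
    """Split a SMILES string into one- and two-letter tokens.

    Hypothetical sketch; slicing is used for the lookahead because a slice
    past the end of a string returns a shorter string instead of raising
    IndexError, which is what smiles[i + 1] does in the traceback above.
    """
    tokens = []
    i = 0
    while i < len(smiles):
        pair = smiles[i:i + 2]  # safe even when i is the last index
        if pair in TWO_LETTER_TOKENS:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


print(tokenize_smiles("CCl"))  # ['C', 'Cl']
print(tokenize_smiles(""))     # []
```

With this approach an empty string simply yields an empty token list, so the caller can decide whether to skip it or raise a clearer error.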