NVIDIA / MegaMolBART

A deep learning model for small molecule drug discovery and cheminformatics based on SMILES

Different Embeddings for different batch sizes #11

Open SIDDU-0406 opened 1 year ago

SIDDU-0406 commented 1 year ago

Description of the bug

In the MegaMolBART container, using the Inference notebook:

When I send a single SMILES string A through connection.smis_to_embedding, I get an embedding.

When I send a batch of SMILES strings that includes the same string A through connection.smis_to_embedding, I get a batch of embeddings.

The problem is that the embedding of A differs between the two cases, and the discrepancy is large.

You can check the following code to see the issue:

from infer import InferenceWrapper

import logging
import warnings
from sklearn.metrics.pairwise import cosine_similarity

warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')

connection = InferenceWrapper()

smis = ['c1cc2ccccc2cc1', 'COc1cc2nc(N3CCN(C(=O)c4ccco4)CC3)nc(N)c2cc1OC']

a = connection.smis_to_embedding(smis)        # embeddings for the batch of two SMILES
a1 = connection.smis_to_embedding([smis[0]])  # embedding for the first SMILES on its own

# Cosine similarity between the two embeddings of the same molecule
print(cosine_similarity(a1.cpu(), a[0].reshape(1, 512).cpu()))

NVIDIA Docker image: nvcr.io/nvidia/clara/megamolbart_v0.2:latest
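
One plausible explanation (an assumption, not something confirmed in this issue) is that the pooled embedding includes padding positions: in a batch, shorter SMILES are padded to the length of the longest one, and if pooling does not mask those padded positions, the resulting vector depends on the batch composition. Below is a minimal PyTorch sketch of masked mean pooling, using hypothetical tensor names and shapes rather than the wrapper's actual internals:

```python
import torch

def masked_mean_pool(hidden_states: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    """Mean-pool per-token embeddings while ignoring padded positions.

    hidden_states: (batch, seq_len, dim) token embeddings (hypothetical shape)
    pad_mask:      (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = pad_mask.unsqueeze(-1).to(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)             # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1.0)                # number of real tokens per sequence
    return summed / counts                                 # (batch, dim)

# Toy check: the same sequence pooled on its own vs. padded to a longer
# length gives identical vectors when the padding is masked out.
torch.manual_seed(0)
single = torch.randn(1, 4, 8)    # one sequence with 4 real tokens
padded = torch.zeros(1, 6, 8)    # the same tokens padded to length 6
padded[:, :4] = single
mask_single = torch.ones(1, 4)
mask_padded = torch.cat([torch.ones(1, 4), torch.zeros(1, 2)], dim=1)
assert torch.allclose(masked_mean_pool(single, mask_single),
                      masked_mean_pool(padded, mask_padded))
```

If padding turns out not to be the cause, another thing worth checking is that the model runs in eval mode so that dropout is disabled, since active dropout would also make embeddings non-deterministic across calls.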