When the getitem function is called, it calls get_one_hot(smiles, 350). Inside get_one_hot, array_length is fixed at 350 by pad_len, so one_hot has only 350 rows. For a long Smiles string, however, the token position pos can reach 350 or more, which raises an IndexError on axis 0 of one_hot.
def get_one_hot(smiles: str, pad_len: int = -1) -> np.ndarray:
    """Generate one-hot representation of a Smiles string.

    Args:
        smiles (str): Input molecule as Smiles
        pad_len (int, optional): Whether or not to pad to a given size. Defaults to -1.

    Returns:
        np.ndarray: Array containing the one-hot encoded Smiles
    """
    smiles = smiles + "."
    # initialize array
    array_length = len(smiles) if pad_len < 0 else pad_len
    vocab_size = len(__vocab)
    one_hot = np.zeros((array_length, vocab_size))
    tokens = tokenize(smiles)
    numeric = [__vocab_c2i.get(token, __unk) for token in tokens]
    for pos, num in enumerate(numeric):  # pos can reach 350 or more
        one_hot[pos, num] = 1  # IndexError when pos >= array_length
    return one_hot
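One possible fix, sketched under some assumptions, is to truncate the token list to array_length before filling the array. The vocabulary (VOCAB, VOCAB_C2I, UNK), the character-level tokenize, and the name get_one_hot_safe below are hypothetical stand-ins so the example runs on its own; they are not the project's actual definitions.

```python
import numpy as np

# Hypothetical stand-ins for the module-level vocabulary and tokenizer,
# which are not shown in this report.
VOCAB = ["<unk>", "C", "N", "O", "(", ")", "=", "."]
VOCAB_C2I = {tok: i for i, tok in enumerate(VOCAB)}
UNK = VOCAB_C2I["<unk>"]


def tokenize(smiles: str) -> list:
    # Simplified assumption: one character per token.
    return list(smiles)


def get_one_hot_safe(smiles: str, pad_len: int = -1) -> np.ndarray:
    """One-hot encode a Smiles string, truncating it to pad_len rows."""
    smiles = smiles + "."
    array_length = len(smiles) if pad_len < 0 else pad_len
    one_hot = np.zeros((array_length, len(VOCAB)))
    numeric = [VOCAB_C2I.get(tok, UNK) for tok in tokenize(smiles)]
    # Slice so that pos never reaches array_length.
    for pos, num in enumerate(numeric[:array_length]):
        one_hot[pos, num] = 1
    return one_hot


# A string longer than pad_len no longer raises IndexError.
encoded = get_one_hot_safe("C" * 10, pad_len=5)
print(encoded.shape)
```

Note that truncation silently drops the tail of the molecule, including the terminating ".". Depending on the use case, it may be preferable to raise a clear error or to filter out Smiles strings longer than 350 tokens when building the dataset.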