Minor PR. At inference time, a list of substructures can be provided that are excluded from the stochastic masking (`substructures_to_mask: List[str]`).
OLD: If a substructure could not be found in the seed SMILES (matching was done at the string level), it was ignored for all further steps.
NEW: The substructure is ignored entirely only if it also cannot be identified in the seed molecule by an RDKit substructure match. If the RDKit match succeeds, the substructure is still ignored for the masking and generation part, but it is used in the post-hoc filtering, which ensures that all returned molecules contain the substructure. Note that this may slow down inference in edge cases.
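The new decision logic can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Handling`, `rdkit_match`, and `classify` are hypothetical names, and parsing the substructure with `Chem.MolFromSmiles` (rather than as SMARTS) is an assumption.

```python
from enum import Enum
from typing import Callable, Dict, List


class Handling(Enum):
    """Hypothetical labels for how a requested substructure is treated."""
    MASK_EXCLUDED = "found on string level: excluded from masking"
    FILTER_ONLY = "found only by graph match: post-hoc filter only"
    IGNORED = "found by neither: ignored entirely"


def rdkit_match(seed_smiles: str, substructure: str) -> bool:
    """Graph-level match; returns False if either string fails to parse."""
    from rdkit import Chem  # imported lazily so the rest runs without RDKit

    seed = Chem.MolFromSmiles(seed_smiles)
    query = Chem.MolFromSmiles(substructure)
    if seed is None or query is None:
        return False
    return seed.HasSubstructMatch(query)


def classify(
    seed_smiles: str,
    substructures: List[str],
    graph_match: Callable[[str, str], bool] = rdkit_match,
) -> Dict[str, Handling]:
    """Assign each substructure to one of the three cases described above."""
    result: Dict[str, Handling] = {}
    for sub in substructures:
        if sub in seed_smiles:
            # String-level hit: same behavior as before this PR.
            result[sub] = Handling.MASK_EXCLUDED
        elif graph_match(seed_smiles, sub):
            # New case: not a literal substring, but present in the molecule.
            result[sub] = Handling.FILTER_ONLY
        else:
            result[sub] = Handling.IGNORED
    return result
```

With RDKit installed, `classify("CCO", ["OC"])` should land in the `FILTER_ONLY` case: `OC` is not a literal substring of `CCO`, but it matches the C-O bond of the seed molecule.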