PatWalters / TS

Thompson Sampling
MIT License

Some Suggestions / Questions #9

Open HiteSit opened 3 months ago

HiteSit commented 3 months ago

Hi all, first things first: terrific work, lovely.

Now, I have a couple of questions/suggestions:

Finally, is there any possibility of adding a concurrent.futures / multiprocessing wrapper so the application can run in parallel? It would be especially nice for the FREDEvaluator (obviously).
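A parallel wrapper of the kind requested could look roughly like the minimal sketch below, which uses only the standard library's `concurrent.futures`. `score_batch` and `toy_score` are hypothetical names, not part of the TS repository, and `toy_score` (a crude count of uppercase N and O characters) merely stands in for an expensive scorer such as the FREDEvaluator. Because a Thompson sampling loop normally scores one product per iteration, one assumed way to use it is to draw a batch of products, score them together, and then update the reagent posteriors with all of the returned scores.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def score_batch(
    smiles_batch: Sequence[str],
    score_fn: Callable[[str], float],
    max_workers: Optional[int] = None,
    use_processes: bool = True,
) -> List[float]:
    """Score a batch of SMILES in parallel (hypothetical helper).

    Returns one score per input, in the same order as smiles_batch.
    With use_processes=True, score_fn must be a picklable top-level function.
    """
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so scores line up with the batch
        return list(pool.map(score_fn, smiles_batch))


def toy_score(smiles: str) -> float:
    """Hypothetical stand-in for an expensive scorer such as a docking run.

    Counts uppercase N and O characters only; illustration, not chemistry.
    """
    return float(smiles.count("N") + smiles.count("O"))
```

With `use_processes=True` the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes. Threads are enough when each score comes from an external program launched via `subprocess`, since the waiting then happens outside the Python interpreter.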

PatWalters commented 1 month ago

Thanks for the valuable feedback. I apologize for the late response. I'm reluctant to enable the upload of molecules with explicit hydrogens, as this can create other issues. Do you have an example of a reaction that requires explicit hydrogens? Adding a protonation step to the FRED evaluator is a good idea. I'll do that. It would be helpful if you had some code you could upload. I'll add some parallelization in the next couple of weeks.

HiteSit commented 1 week ago

Sorry for the even later response. Many multicomponent reactions (MCRs) involving unusual condensations are difficult to write in SMARTS (e.g. GBB, Ugi-tetrazole), but you are right about the hydrogens: there is almost always a way to write them implicitly.

Regarding support for the RXN file format, take this reaction as an example:

[#1,#6:1]-[#6:7](-[#1,#6:2])=O.[#1,#6,#8:5]-[#7:3]-[#1,#6,#8:4].[#1,#6:9][N+:8]#[C-:6]>>[#1,#6,#8:4]-[#7:3](-[#1,#6,#8:5])[C:7]([#1,#6:1])([#1,#6:2])[#6:6]-1=[#7]-[#7]=[#7]-[#7:8]-1-[#1,#6:9]

It very often misbehaves, and in general I find that many MCRs do not work well with reaction SMARTS.
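To see why a template like this one misbehaves on implicit-hydrogen inputs, a small standard-library checker can summarise its atom mapping. This is a rough sketch: `check_reaction_smarts` and `scan_side` are hypothetical helpers, only bracket atoms are inspected, and recursive SMARTS and reaction agents (`a>b>c`) are not handled. For the reaction above it surfaces two things. Five reactant atoms query `#1`, which in SMARTS matches only explicit hydrogen atoms, so with implicit hydrogens an aldehyde or a primary amine cannot fill those positions. And three tetrazole nitrogens in the product carry no map number, so they come from the product template itself rather than from any of the three reactant templates.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Bracket atoms such as [#1,#6:1] or [N+:8]; nested (recursive) SMARTS is not handled.
BRACKET_ATOM = re.compile(r"\[([^\[\]]+)\]")
# An atom-map number is the trailing ":<digits>" inside a bracket atom.
MAP_NUMBER = re.compile(r":(\d+)$")
# "#1" (hydrogen) not followed by another digit, so e.g. #15 (phosphorus) is not counted.
HYDROGEN_QUERY = re.compile(r"#1(?!\d)")


def scan_side(side: str) -> Tuple[Counter, int, int]:
    """Return (map-number counts, unmapped bracket atoms, #1 queries) for one side."""
    maps: Counter = Counter()
    unmapped = 0
    h_queries = 0
    for body in BRACKET_ATOM.findall(side):
        match = MAP_NUMBER.search(body)
        if match:
            maps[int(match.group(1))] += 1
        else:
            unmapped += 1
        if HYDROGEN_QUERY.search(body):
            h_queries += 1
    return maps, unmapped, h_queries


def check_reaction_smarts(rxn: str) -> Dict[str, object]:
    """Summarise atom mapping in a 'reactants>>products' reaction SMARTS (no agents)."""
    reactants, products = rxn.split(">>")
    r_maps, _, r_h = scan_side(reactants)
    p_maps, p_unmapped, _ = scan_side(products)
    return {
        "n_reactant_templates": len(reactants.split(".")),
        # Map numbers present on only one side usually indicate a typo
        "only_in_products": sorted(set(p_maps) - set(r_maps)),
        "only_in_reactants": sorted(set(r_maps) - set(p_maps)),
        # Counter union keeps the max count per map number across both sides
        "duplicated": sorted(k for k, n in (r_maps | p_maps).items() if n > 1),
        "unmapped_product_atoms": p_unmapped,
        "explicit_h_queries": r_h,
    }
```

For the Ugi-tetrazole reaction above it reports three reactant templates, no map numbers missing on either side, no duplicates, three unmapped product atoms and five explicit-hydrogen queries.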

I could not find a commit showing whether you have already implemented OpenEye-based protonation, but in any case here is my code:

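import logging
from typing import List

from openeye import oechem, oeomega, oequacpac

logger = logging.getLogger(__name__)

# cansmi, from_smiles_to_oemol and get_chirality_and_stereo are helpers from my own
# codebase (not part of the OpenEye toolkits, not shown here).


def gen_3dmol(smile: str, protonate: bool, gen3d: bool, enum_isomers: bool) -> List[oechem.OEGraphMol]:
    smile_fixed = cansmi(smile, isomeric=True, kekule=True)
    oemol: oechem.OEMol = from_smiles_to_oemol(smile_fixed)

    # Stereoisomer enumeration (flipping) options
    flipperOpts = oeomega.OEFlipperOptions()

    # Conformer generation
    omega = oeomega.OEOmega()  # multiple conformers
    builder = oeomega.OEConformerBuilder()  # single conformer

    if protonate:
        logger.info("Protonating the molecule")
        oequacpac.OEGetReasonableProtomer(oemol)

    if enum_isomers and gen3d:
        enantiomers_3D: List[oechem.OEGraphMol] = []
        for i, enantiomer in enumerate(oeomega.OEFlipper(oemol, flipperOpts)):
            enantiomer = oechem.OEMol(enantiomer)

            logger.info("Generating 3D coordinates")
            ret_code = omega.Build(enantiomer)
            if ret_code != oeomega.OEOmegaReturnCode_Success:
                # Skip stereoisomers Omega cannot build instead of returning them without 3D coordinates
                logger.warning("Omega failed for Stereo_%d: %s", i, oeomega.OEGetOmegaError(ret_code))
                continue

            stereo_desc = get_chirality_and_stereo(oechem.OEGraphMol(enantiomer))

            oechem.OESetSDData(enantiomer, "Chiral_ID", f"Stereo_{i}")
            oechem.OESetSDData(enantiomer, "Chiral_Atoms", stereo_desc)
            # Note: OEGraphMol keeps only the active conformer of the multi-conformer OEMol
            enantiomers_3D.append(oechem.OEGraphMol(enantiomer))

        return enantiomers_3D

    if gen3d:
        logger.info("Generating 3D coordinates")
        ret_code = builder.Build(oemol)
        if ret_code != oeomega.OEOmegaReturnCode_Success:
            logger.warning("Conformer build failed: %s", oeomega.OEGetOmegaError(ret_code))
            return []
        return [oechem.OEGraphMol(oemol)]

    # No 3D requested: return the (optionally protonated) molecule instead of falling through to None
    return [oechem.OEGraphMol(oemol)]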

Also, I remember that you published a different version of Thompson sampling (Enhanced) on LinkedIn. Beyond the slight scientific differences, it would be very nice if you could add some documentation to the code on how to write custom scoring functions.
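Documentation for custom scoring functions might include something like the sketch below. The `Evaluator` base class, `FunctionEvaluator` and `WeightedSumEvaluator` are hypothetical and not copied from the repository, whose actual evaluator interface may differ. The assumed pattern is a subclass that implements `evaluate(mol)` returning one float and tracks how many times it was called. Molecules are plain SMILES strings here only to keep the example dependency-free; in practice they would be RDKit molecules.

```python
from abc import ABC, abstractmethod
from typing import Callable, Sequence, Tuple


class Evaluator(ABC):
    """Hypothetical base class: one float score per molecule, plus a call counter."""

    def __init__(self) -> None:
        self.num_evaluations = 0

    @property
    def counter(self) -> int:
        return self.num_evaluations

    @abstractmethod
    def evaluate(self, mol) -> float:
        ...


class FunctionEvaluator(Evaluator):
    """Wrap any plain function mol -> number as an evaluator."""

    def __init__(self, fn: Callable[[object], float]) -> None:
        super().__init__()
        self.fn = fn

    def evaluate(self, mol) -> float:
        self.num_evaluations += 1
        return float(self.fn(mol))


class WeightedSumEvaluator(Evaluator):
    """Combine several evaluators into one scalar score via a weighted sum."""

    def __init__(self, parts: Sequence[Tuple[float, Evaluator]]) -> None:
        super().__init__()
        self.parts = list(parts)

    def evaluate(self, mol) -> float:
        self.num_evaluations += 1
        return sum(weight * ev.evaluate(mol) for weight, ev in self.parts)
```

For example, `WeightedSumEvaluator([(1.0, FunctionEvaluator(len)), (-2.0, FunctionEvaluator(lambda s: s.count("N")))])` scores `"CCN"` as 3 - 2 = 1.0. Composing evaluators this way keeps each objective testable on its own, and a weighted sum is one simple way to reduce several objectives to the single scalar a Thompson sampling update consumes.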