ersilia-os / zaira-chem

Automated QSAR based on multiple small molecule descriptors
GNU General Public License v3.0

ZairaChem ONNX predictor #48

Open JHlozek opened 3 weeks ago

JHlozek commented 3 weeks ago

I've started working on getting ZairaChem to run ONNX inference when 'zairachem predict -m' is called with an ONNX model. However, because this does not require pointing to a standard ZairaChem model, the pipeline has no access to the original parameters.json file and needs some refactoring to handle that case.

My question is how much effort and how many changes we want here. I can think of three broad options for ONNX predictions:

1) 'zairachem predict' is just a basic wrapper that takes a CSV, expects a 'SMILES' column, and returns a CSV output.
2) 'zairachem predict' still runs the input standardization step (only a few minimal changes are needed here) and produces a reduced output folder with the ONNX predictions.
3) We aim to run as much of ZairaChem as possible, excluding only the estimator, pooling, etc., and still producing reports. This could be done in a few ways (e.g. optionally pointing to the original ZairaChem model to get the parameters).

I've pretty much got option (2) working, but I wanted to check what you envisioned from a design perspective. Should we keep it simple, or invest in more functionality?
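To make option (1) concrete, here is a minimal sketch of what such a wrapper could look like. It is not ZairaChem code: `predict_csv`, the `predict_fn` callable and the `prediction` output column are hypothetical names chosen for illustration, and the actual ONNX call is left to whatever `predict_fn` wraps.

```python
import csv
from typing import Callable, Iterable, List


def predict_csv(
    input_csv: str,
    output_csv: str,
    predict_fn: Callable[[List[str]], Iterable[float]],
    smiles_column: str = "SMILES",
) -> int:
    """Read SMILES from a CSV, score them, and write a CSV of predictions.

    predict_fn is a hypothetical stand-in for the model call: it takes a
    list of SMILES strings and returns one score per molecule.
    Returns the number of molecules written.
    """
    with open(input_csv, newline="") as fh:
        reader = csv.DictReader(fh)
        # Fail early if the expected column is missing.
        if reader.fieldnames is None or smiles_column not in reader.fieldnames:
            raise ValueError(f"Input CSV must contain a '{smiles_column}' column")
        smiles = [row[smiles_column] for row in reader]

    scores = list(predict_fn(smiles))
    if len(scores) != len(smiles):
        raise ValueError("predict_fn must return one score per input molecule")

    with open(output_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([smiles_column, "prediction"])  # assumed output header
        writer.writerows(zip(smiles, scores))
    return len(smiles)
```

Option (2) would presumably put the existing standardization step between reading the input and calling the model, which this sketch skips entirely.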

GemmaTuron commented 1 week ago

Hi @JHlozek

Thanks for this summary. I think Option 2 makes sense for now. Once we improve the modularity of ZairaChem, we can work towards Option 3 and generate reports etc. I will assign this issue to myself.

JHlozek commented 5 days ago

Hi @GemmaTuron

I've pushed a commit to ZairaChem that implements this feature. The 'zairachem predict' command first tries to load the model path with ONNX. If that succeeds, it runs the ONNX model; otherwise it falls back to the usual ZairaChem pipeline.
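For readers following along, the try-then-fall-back behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: `predict`, `load_onnx`, `run_onnx` and `run_pipeline` are hypothetical names, and the loader is passed in as a callable because the thread does not say which ONNX call is used (it could wrap something like `onnx.load` or `onnxruntime.InferenceSession`).

```python
from typing import Any, Callable


def predict(
    model_path: str,
    input_path: str,
    load_onnx: Callable[[str], Any],
    run_onnx: Callable[[Any, str], Any],
    run_pipeline: Callable[[str, str], Any],
) -> Any:
    """Run ONNX inference if model_path loads as ONNX, else the usual pipeline.

    All three callables are hypothetical hooks:
      load_onnx(model_path)          -> loaded model/session, raises on failure
      run_onnx(session, input_path)  -> ONNX predictions
      run_pipeline(model_path, input_path) -> standard ZairaChem predictions
    """
    try:
        session = load_onnx(model_path)
    except Exception:
        # Assumption: any load failure means "not an ONNX model", so fall
        # back to the regular pipeline. Real code may want a narrower check.
        return run_pipeline(model_path, input_path)
    # Inference runs outside the try block so that errors raised during
    # ONNX prediction are not silently turned into a pipeline run.
    return run_onnx(session, input_path)
```

Keeping only the load step inside the `try` is a deliberate choice in this sketch: a genuine failure during ONNX inference surfaces as an error instead of triggering the fallback.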