Open JHlozek opened 3 weeks ago
Hi @JHlozek
Thanks for this summary. I think Option 2 makes sense for now; once we improve the modularity of ZairaChem, we can work towards point 3 and generate reports etc. I will assign this issue to myself.
Hi @GemmaTuron
I've pushed a commit to ZairaChem that implements this feature. The ZairaChem predict command first tries to load the model path with ONNX. If the path is an ONNX model, it runs ONNX inference; otherwise it runs the usual ZairaChem pipeline.
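For anyone following along, the try-ONNX-first, fall-back-to-pipeline behaviour can be sketched roughly as below. This is only an illustration, not the code in the commit: `predict_dispatch`, `load_onnx`, `run_onnx` and `run_pipeline` are hypothetical names, and the loaders are passed in as callables so the sketch does not depend on any particular ONNX library.

```python
from typing import Any, Callable, Optional


def predict_dispatch(
    model_path: str,
    load_onnx: Callable[[str], Any],      # hypothetical: raises if the path is not an ONNX model
    run_onnx: Callable[[Any], str],       # hypothetical: runs inference with the loaded session
    run_pipeline: Callable[[str], str],   # hypothetical: runs the usual pipeline on a model folder
) -> str:
    """Try the ONNX route first; fall back to the usual pipeline otherwise."""
    try:
        session: Optional[Any] = load_onnx(model_path)
    except Exception:
        # Loading failed, so treat the path as a standard (non-ONNX) model.
        session = None

    if session is not None:
        return run_onnx(session)
    return run_pipeline(model_path)
```

Passing the loaders in keeps the decision logic testable on its own; a real implementation would probably catch a narrower exception than `Exception`.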
I've started working on getting ZairaChem to run ONNX inference if 'zairachem predict -m' is called with an onnx model. However, seeing as this does not require pointing to a standard ZairaChem model, the pipeline doesn't have access to the original parameters.json file and needs some refactoring to cater for this.
My question is how much effort and how many changes we want here. I can think of three broad options for ONNX predictions:
1) 'zairachem predict' as just a basic wrapper that takes a csv, expects a 'SMILES' column, and returns a csv output.
2) 'zairachem predict' still runs the input standardization step (just a few minimal changes needed here) and produces a reduced output folder with the onnx predictions.
3) We aim to run as much of ZairaChem as possible, just excluding the estimator/pooling/etc and still producing reports. This could be done in a few ways (e.g. optionally pointing to the original zairachem model to get the parameters).
I've pretty much got option (2) working, but I wanted to check what you envisioned from a design perspective: keep it simple, or invest in more functionality?
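To make option (1) concrete, here is a minimal sketch of what a bare csv-in, csv-out wrapper might look like. It is an assumption-laden illustration rather than ZairaChem code: `predict_csv`, `predict_fn` and the `prediction` output column are hypothetical, and `predict_fn` stands in for whatever would actually run the ONNX model on a list of SMILES strings.

```python
import csv
from typing import Callable, List, Sequence


def predict_csv(
    input_csv: str,
    output_csv: str,
    predict_fn: Callable[[List[str]], Sequence[float]],  # hypothetical stand-in for ONNX inference
    smiles_column: str = "SMILES",
) -> int:
    """Read SMILES from input_csv, score them, write a csv, return the row count."""
    with open(input_csv, newline="") as f:
        reader = csv.DictReader(f)
        # Option (1) expects a 'SMILES' column, so fail early if it is missing.
        if reader.fieldnames is None or smiles_column not in reader.fieldnames:
            raise ValueError(f"input csv must contain a '{smiles_column}' column")
        smiles = [row[smiles_column] for row in reader]

    scores = predict_fn(smiles)
    if len(scores) != len(smiles):
        raise ValueError("predict_fn must return one score per input row")

    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([smiles_column, "prediction"])  # 'prediction' is an assumed column name
        writer.writerows(zip(smiles, scores))
    return len(smiles)
```

Note that this skips the input standardization step entirely, which is the main thing option (2) adds on top.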