basf / MolPipeline

A Python package for processing molecules with RDKit in scikit-learn
MIT License

More input validation for Pipeline.fit_transform etc. #22

Open JochenSiegWork opened 4 months ago

JochenSiegWork commented 4 months ago

The fit, fit_transform, transform, etc. methods of the Pipeline and PipelineElement classes expect an Iterable as input. Some Iterables lead to errors or even to wrong results, so we should catch them during the input validation step.

For example, pipeline.fit(["CCC"]) works correctly. But if you forget the [...] list brackets, e.g. pipeline.fit("CCC"), you still pass an Iterable, only now it is the SMILES string itself, which Python iterates character by character. Depending on the Pipeline elements, this either raises an informative error or silently produces wrong results.
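A minimal sketch of what such a check could look like. The helper name validate_input_iterable is hypothetical and not part of MolPipeline; it only illustrates the idea of rejecting str and bytes before iterating. Iterating "CCC" yields "C", "C", "C", and each of those is itself a valid SMILES (methane), which is presumably how wrong results can slip through without an error.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def validate_input_iterable(values: Any) -> list[Any]:
    """Hypothetical helper (not a MolPipeline API): validate pipeline input.

    Rejects inputs that are technically iterable but almost certainly a
    mistake, such as a bare SMILES string passed instead of a list.
    """
    # str and bytes are Iterables, but iterating them yields single
    # characters (or ints), not molecules.
    if isinstance(values, (str, bytes)):
        raise TypeError(
            f"Expected an iterable of inputs, got a single {type(values).__name__}. "
            f"Did you mean [{values!r}]?"
        )
    if not isinstance(values, Iterable):
        raise TypeError(
            f"Expected an iterable of inputs, got {type(values).__name__}."
        )
    # Materialize once so generators are not consumed twice downstream.
    return list(values)


# The failure mode described above: a string iterates per character.
print(list("CCC"))                       # ['C', 'C', 'C']
print(validate_input_iterable(["CCC"]))  # ['CCC']
```

Whether to raise or to wrap a bare string into a one-element list automatically is a design choice; raising is the more conservative option because it makes the mistake visible to the caller.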