datamol-io / datamol

Molecular Processing Made Easy.
https://docs.datamol.io
Apache License 2.0
462 stars 48 forks source link

Refactor `dm.scaffold.fuzzy_scaffolding` #114

Open hadim opened 2 years ago

hadim commented 2 years ago

dm.scaffold.fuzzy_scaffolding is quite a powerful function but its output is often hard to understand and also process for downstream task.

We could keep backward compat by keeping dm.scaffold.fuzzy_scaffolding and propose an alternative function that will do the same kind of processing under the hood but return a data structure that is more intuitive and easier to use (a dataframe or a list of dataframe?).

Pakman450 commented 1 year ago

@hadim Ill try to take a jab at this. Let you know what happens. Question: Does the output results have multi valued attributes? I don't know much about the output as of yet, but this dataframe route could make it difficult to implement if there are multiple and/or range of data points for a single resulting input. Thanks.

maclandrol commented 1 year ago

Thanks @Pakman450

Question: Does the output results have multi valued attributes?

Currently the output is a tuple of list and dicts. I would say for the refactoring, it's perfectly ok to completely rethink the output. Whether a dataframe would be a good fit would need to be assessed, but it's probably fine to have some new structure (a dataclass or something similar) to hold the results. It's likely that the code and output can be simplified too.

Happy to help around that if you are taking a jab at this.