AI-multimodal / aimm-post-processing

This repository is deprecated. We will be doing all post-processing in `aimmdb.postprocessing`.

Add Identity operator #8

Closed · matthewcarbone closed this 2 years ago

matthewcarbone commented 2 years ago

I've added the scaffold for post-processing operators. Does everyone like the style I've implemented here? Note that the `__call__` method will not actually work yet, since `x.metadata` is immutable in the way the nodes currently work.

Thoughts? Please don't merge until we've discussed a bit!

Note that while this might seem trivial, these kinds of design decisions are extremely important early on and will influence everything downstream. So even though the `Identity` operator is hilariously simple, it's really important that we get the metadata tracking correct now. Also, I know that Joe will likely be making some more API changes as per our discussion today, but I don't think that will impact the high-level discussion.
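
For concreteness, here is a minimal sketch of the kind of scaffold being described. The class names match the discussion, but the node's `metadata` attribute and the `"post_processing"` key are illustrative assumptions, not the actual aimm-post-processing API:

```python
# Minimal sketch of the operator scaffold under discussion. The node's
# `metadata` attribute and the "post_processing" key are illustrative
# assumptions, not the actual aimm-post-processing API.
from monty.json import MSONable


class Operator(MSONable):
    """Base class for post-processing operators."""

    def __call__(self, x):
        raise NotImplementedError


class Identity(Operator):
    """Passes the data through unchanged, recording that it ran."""

    def __call__(self, x):
        # This is the snag mentioned above: if x.metadata is immutable,
        # we cannot modify it in place and must build a new mapping
        # (or a new node) instead.
        metadata = dict(x.metadata)
        metadata.setdefault("post_processing", []).append(self.as_dict())
        x.metadata = metadata  # fails if metadata is read-only on the node
        return x
```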

danielballan commented 2 years ago

I'm a little skeptical of frameworks or workflow managers that attempt to propagate metadata through operations. I think working with base types (numpy, dataframe, dict) and handling them explicitly, while less slick, results in more universally readable and approachable code. It also composes more naturally with existing tools.

When a professional acquaintance of mine was tasked with writing a framework for data analysis in Python for an astrophysics institution, he countered: "The framework is Python". I keep that in mind a lot.

A similar (not identical) issue is addressed here: https://nsls-ii.github.io/scientific-python-cookiecutter/guiding-design-principles.html#stop-writing-classes

But, I haven't spent much time considering the specifics of this application, and if you decide to go ahead and explore this direction I have no concerns. :-)

matthewcarbone commented 2 years ago

@danielballan Thanks for the feedback.

I hear you about keeping things as "class-free" as possible. My argument for doing it this way is that it provides a natural way of tracking the version, operator type, etc. through the `MSONable` class, which I've grown to love. Users can then create new operations by simply inheriting from `Operator`, and somewhere else we log the operation through `operator.as_dict()` and append that directly to the metadata.

I'll also add that the classes themselves (the objects) will never be directly saved except through the use of `MSONable`, thus keeping things as Pythonic as possible.
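
As a rough illustration of what that record looks like (the `ScaleOperator` and its `factor` parameter are hypothetical), `MSONable`'s default `as_dict()` serializes the class path and the `__init__` arguments, which is exactly the kind of provenance entry described above:

```python
from monty.json import MSONable


class ScaleOperator(MSONable):
    """Hypothetical operator that scales intensities by a constant."""

    def __init__(self, factor=1.0):
        self.factor = factor


op = ScaleOperator(factor=2.0)
# The default MSONable.as_dict() records "@module", "@class", and the
# __init__ arguments, giving a JSON-serializable provenance record that
# can be appended to a node's metadata.
print(op.as_dict())
# e.g. {'@module': '__main__', '@class': 'ScaleOperator', ..., 'factor': 2.0}
```

And `ScaleOperator.from_dict(...)` can reconstruct the operator from that record later, so the objects themselves never need to be stored.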

I don't personally plan on using classes much more than this though 😄

What do you think?

matthewcarbone commented 2 years ago

In the interest of getting things done, I'm going to roll with this approach for now, until (or unless) we come up with a better way.