Neuraxio / Neuraxle

The world's cleanest AutoML library ✨ - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spaces. Design steps in your pipeline like components. Compatible with Scikit-Learn, TensorFlow, and most other libraries, frameworks and MLOps environments.
https://www.neuraxle.org/
Apache License 2.0
608 stars, 62 forks

Documentation: Loading pipelines for inference #481

Closed joel-odlund closed 1 year ago

joel-odlund commented 3 years ago

Reading about saving and loading, I find it hard to understand how to save and load a model in order to use it for inference. In particular, it's not clear to me how the setup() phase relates to saving and loading.

This page gives an overview of the lifecycle of a model.
Here it is implied that 'setup()' occurs after load. However, it does not seem like the setup method is being called anywhere, except when fitting a model.

It would be nice to have documentation on how the lifecycle works for inference. For example:

vincent-antaki commented 3 years ago

Hello Joel!

Here is some useful information with regard to your question:

Side note: I think the flowchart may be getting a bit old.

Specifically, with regard to your 3 options, the third one is the intended usage. It is expected that setup is only called through pipeline fit calls. From there, here are a couple of options you have:

Overall, I agree with you that setup is poorly documented and might need to be revisited eventually.
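To make the point about setup being reached only through fit more concrete, here is a minimal toy sketch in plain Python. It does not use Neuraxle's API: `ToyStep`, `save`, `load` and `load_for_inference` are hypothetical names invented for illustration, and the real classes may behave differently. It only shows the general consequence: a step loaded for inference has not gone through setup unless something calls it explicitly.

```python
import json


class ToyStep:
    """Hypothetical step; a stand-in for a pipeline step, not Neuraxle's BaseStep."""

    def __init__(self):
        self.is_initialized = False
        self.mean = None

    def setup(self):
        # Allocate whatever cannot be serialized (a session, a connection, ...).
        self.resource = object()
        self.is_initialized = True
        return self

    def fit(self, data):
        # In this sketch, fit is the only code path that reaches setup implicitly.
        if not self.is_initialized:
            self.setup()
        self.mean = sum(data) / len(data)
        return self

    def transform(self, data):
        if not self.is_initialized:
            raise RuntimeError("setup() has not been called on this step")
        return [x - self.mean for x in data]

    def save(self):
        # Only the fitted state is persisted; the resource is not.
        return json.dumps({"mean": self.mean})

    @classmethod
    def load(cls, blob):
        step = cls()
        step.mean = json.loads(blob)["mean"]
        return step  # is_initialized is still False at this point


def load_for_inference(blob):
    # Call setup explicitly, since no fit call will do it for us.
    return ToyStep.load(blob).setup()


blob = ToyStep().fit([1.0, 2.0, 3.0]).save()
step = load_for_inference(blob)
result = step.transform([2.0, 4.0])
```

In this toy version, calling `ToyStep.load(blob).transform(...)` directly raises a `RuntimeError`, which is the kind of surprise the question describes. Wrapping load and setup in one helper is one way to give a team a single documented entry point for inference.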

Feel free to ask more questions if you have any, I'll be glad to help you. Cheers!

joel-odlund commented 3 years ago

Thank you. This brings some clarity; I will try out some of these ideas. Some questions come to mind that might be useful if you decide to revisit the documentation.

You give some nice alternatives for when and how to use the setup method. I do think, however, that it's important for Neuraxle and its wider adoption that there is one idiomatic way of doing it, which is well understood and documented. For example, my current task is to use Neuraxle to implement a general-purpose ML environment. Most individual components will not be written by me, but by others on the team who may not have intimate knowledge of Neuraxle, and they will expect clear instructions on how to do things, and why.

I really think Neuraxle is something the community needs; that's why I'm bringing up these things. It's not to complain :)

vincent-antaki commented 3 years ago

Your input is greatly appreciated. When spending all day using the framework, we can sometimes lose sight of how other users would approach the various concepts and abstractions (well, I don't know about @guillaume-chevalier, but that's most certainly my case). The framework is in constant evolution, and sometimes its development is tailored to the specific project we have; this may lead us to have blind spots, or at least a biased priority queue. Comments like yours are essential for us to keep a healthy list of what needs to be done, both in terms of documentation and code.

For your first question, it is first and foremost a matter of proper function encapsulation: setup is called before fit and does not serve the same purpose. Furthermore, I think setup used to be called after load, and that behaviour was changed during a project a while ago. I think Guillaume may have more information on that specific design choice.

As for your second question, we don't expect users to serialize heavy objects. Usually, we recommend that heavy and/or shared resources be handled through the ExecutionContext's services. This is another part of the code that might not be well explained yet, as it is fairly recent. I'll refer you to the code since it's rather straightforward, but once again, feel free to ask questions if you have any.
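The idea of keeping heavy or shared objects in a context rather than inside the serialized step can be sketched as follows. This is a toy illustration in plain Python, not the ExecutionContext implementation: `ToyContext`, `EmbeddingTable` and `EmbedStep` are hypothetical names, and the actual service API should be checked in the Neuraxle source.

```python
import json


class ToyContext:
    """Hypothetical registry that holds services by type (illustration only)."""

    def __init__(self):
        self._services = {}

    def register_service(self, service_type, instance):
        self._services[service_type] = instance
        return self

    def get_service(self, service_type):
        return self._services[service_type]


class EmbeddingTable:
    """Stands in for something heavy, such as a large lookup table."""

    def __init__(self, table):
        self.table = table

    def lookup(self, word):
        return self.table.get(word, 0.0)


class EmbedStep:
    """Hypothetical step that stores only light hyperparameters."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def save(self):
        # The heavy table is never part of the saved state.
        return json.dumps({"scale": self.scale})

    @classmethod
    def load(cls, blob):
        return cls(**json.loads(blob))

    def transform(self, words, context):
        # The heavy object is fetched from the context at call time.
        table = context.get_service(EmbeddingTable)
        return [self.scale * table.lookup(w) for w in words]


# At inference time, the service is registered once and shared by all steps.
context = ToyContext().register_service(
    EmbeddingTable, EmbeddingTable({"a": 1.0, "b": 2.0})
)
blob = EmbedStep(scale=10.0).save()
restored = EmbedStep.load(blob)
output = restored.transform(["a", "b", "z"], context)
```

With this split, the saved step stays small, and the inference environment decides how the heavy resource is created, which also sidesteps the question of whether setup runs after load for that resource.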

vincent-antaki commented 3 years ago

Hey @joel-odlund!

I brought up some questions with Guillaume about the design choices for the setup function, and we've concluded that we'll be revisiting it for the next release (0.5.8). Things will differ quite a bit from what I've told you so far, although this will not change anything with regard to the save/load aspect of your initial question.

This will be the expected behaviour for setup after the modifications:

Documentation will be changed accordingly. Please do not hesitate to give me your thoughts about that change if you feel like it.

Cheers!

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in the next 180 days. Thank you for your contributions.