datarevenue-berlin / OpenMLOps


How to move the development process to local server? #87

Open hieuphung97-pixta opened 3 years ago

hieuphung97-pixta commented 3 years ago

Hi, the OpenMLOps project is awesome and our research team wants to use it to adopt MLOps practices. We have successfully set up OpenMLOps on AWS. From my understanding, the development (data preprocessing, model training, etc.) and deployment processes will both run on AWS, which allows us to form the big loop (raw data -> processed data -> trained model -> served model -> monitoring -> detected problems -> retraining triggers -> new data -> processed data -> new trained model -> ...). However, during development we use our local servers for EDA, data preprocessing, coding, and running experiments. I have two questions

Thanks!

bernardolk commented 2 years ago

Hi @hieuphung97-pixta, thanks for your interest! If I understand your question, you want to deploy OpenMLOps locally, without AWS, which ultimately means deploying a Kubernetes cluster locally. We are working on our 0.3.0 release, which will let you use Minikube for that purpose. Later on, you can always deploy what you have developed to AWS by only changing the deployment target from Minikube to AWS in Terraform, making the transition from local to cloud quite seamless.

For interaction with custom services, you can either expose them to OpenMLOps somehow, or build Docker images and deploy those custom services as Kubernetes deployments, which can then interact with OpenMLOps intra-cluster, either in the cloud or locally (see the sketch below). I hope I have answered you :)
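For example, a custom service running inside the cluster could reach the MLflow tracking server through Kubernetes service DNS. This is just a minimal Python sketch; the service name, namespace, and port below are illustrative assumptions, not OpenMLOps defaults:

```python
import mlflow

# In-cluster DNS follows <service>.<namespace>.svc.cluster.local;
# the "mlflow" service/namespace and port 5000 are assumptions here.
mlflow.set_tracking_uri("http://mlflow.mlflow.svc.cluster.local:5000")
mlflow.set_experiment("custom-service-experiments")

with mlflow.start_run():
    # Log whatever your custom service produces
    mlflow.log_param("deployed_as", "kubernetes-deployment")
    mlflow.log_metric("requests_served", 123)
```

The same code works unchanged whether the cluster runs on Minikube or on AWS, since only the cluster-internal DNS name matters.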

lschneidpro commented 2 years ago

Hi @bernardolk,

First of all, I have also deployed OpenMLOps at my current workplace and I think it is a great initiative. I think @hieuphung97-pixta is dealing with the same kind of issue I'm facing: what would be the use-case of OpenMLOps in a traditional software delivery process with DEV -> UAT -> PROD? Once the MLOps stack is set up, it seems like everything should happen in PROD, since performing the same model training across all stages is redundant. Because MLflow acts as the model registry, it makes sense to me to run one MLflow server shared across DEV, UAT, and PROD, but a separate Prefect server for each environment (roughly as sketched below). What is your opinion/experience on this, ideally with CI/CD in the picture?
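To make that concrete, here is a minimal Python sketch of what I have in mind: one shared MLflow tracking server plus one Prefect (1.x) API endpoint per environment. All hostnames and environment names are made-up placeholders:

```python
import os

import mlflow
from prefect import Client

ENV = os.environ.get("DEPLOY_ENV", "dev")  # "dev", "uat" or "prod"

# One MLflow server acting as the shared model registry for all stages
mlflow.set_tracking_uri("http://mlflow.internal.example.com:5000")

# A dedicated Prefect server per environment (placeholder URLs)
PREFECT_APIS = {
    "dev": "http://prefect-dev.internal.example.com:4200",
    "uat": "http://prefect-uat.internal.example.com:4200",
    "prod": "http://prefect.internal.example.com:4200",
}
client = Client(api_server=PREFECT_APIS[ENV])
```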

If I understand @hieuphung97-pixta correctly, he wants to use a first OpenMLOps instance locally to develop his models and workflows, and later deploy everything to AWS, including the MLflow and Prefect metadata.

bernardolk commented 2 years ago

Thanks for the compliment @lschneidpro! I hope you are having a nice experience so far. I think that's a different question, but nevertheless an interesting one. It depends on what feels most comfortable for your use-case, I would say. In our own OpenMLOps use-case, we have a dev cluster and a prod cluster deployed, where we can test both infrastructure changes (the OpenMLOps code itself) and business/service changes. It's open for debate whether that's the best approach, since, as you said, there is some redundancy. I think the approach you described sounds fine, but it all comes down to what your development cycle will look like. If you feel you need a separate Prefect server running for dev, that's fine; alternatively, you could test dev flows on the same server as prod, and as long as your cluster has enough resources that the two don't compete, that would probably be fine too.
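As a rough illustration of that hybrid option: dev and prod flows could share one Prefect (1.x) server but be registered under separate projects, with smaller resource requests for dev runs so they don't compete with prod. This is only a sketch; the project names, the flow itself, and the resource values are assumptions:

```python
import os

from prefect import Flow, task
from prefect.run_configs import KubernetesRun

ENV = os.environ.get("DEPLOY_ENV", "dev")

@task
def train():
    print(f"training in {ENV}")

with Flow("train-model") as flow:
    train()

# Keep dev flows lightweight so they don't starve prod runs
flow.run_config = KubernetesRun(
    cpu_request="250m" if ENV == "dev" else "2",
    memory_request="512Mi" if ENV == "dev" else "8Gi",
)

# Separate Prefect projects keep dev and prod flows organized
flow.register(project_name=f"{ENV}-flows")
```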

lschneidpro commented 2 years ago

@bernardolk Thanks for your answer. Let me re-phrase to be sure I understand correctly: you maintain the OpenMLOps base code through DEV and PROD, and your DS and MLE (those who are building the ML apps) go through DEV and PROD, but do both environments leverage the OpenMLOps tools deployed in PROD?

Thanks again for your answer.

bernardolk commented 2 years ago

We have OpenMLOps deployed in both DEV and PROD for our machine learning engineers, as completely isolated environments. But as I suggested, you could also take a hybrid approach.