
GCP for serving and training #1

Open sayakpaul opened 1 year ago

sayakpaul commented 1 year ago

@deep-diver, @hunkim

Given our experience with using GCP for various end-to-end ML workflows, I think we could cover the following things:

I would mainly try to discuss the conceptual and architectural components of the above, since I think that is where the students will find the most value. They will develop a better understanding of the approaches, which they can then go on to implement themselves. Of course, we will supplement our lectures with code so that they always have references.

WDYT?

deep-diver commented 1 year ago

Plus, we are also open to other topics related to machine learning pipelines and MLOps systems, to demonstrate various MLOps scenarios with Google technologies such as TFX (an end-to-end ML pipeline framework), Vertex AI Pipelines, etc.

Also, besides model deployment to Vertex AI as mentioned by @sayakpaul, we could talk about TensorFlow Serving in general too. In this case, the topic would be ML model deployment with TensorFlow Serving to local and GKE environments.

sayakpaul commented 1 year ago

Plus, we are also open to other topics related to machine learning pipelines and MLOps systems, to demonstrate various MLOps scenarios with Google technologies such as TFX (an end-to-end ML pipeline framework), Vertex AI Pipelines, etc.

Yes, a great idea for sure. We have a couple of end-to-end workflows already implemented. We can definitely consider them for the lectures -- discuss motivation, design strategies, key components, etc.

I believe it's important to help the students develop a mindset for dealing with these different scenarios rather than going through codebases from the get-go.

hunkim commented 1 year ago

I agree up to:

  1. Local deployments (with FastAPI, since it's a Hugging Face PyTorch model) -- see the sketch below
  2. Optimization with ONNX (can also include Optimum)
  3. Deployment of a FastAPI app with Docker and K8s (with commentary on various components of K8s) (K8s will be managed with GKE)
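
For point 1, a minimal sketch of what such a local deployment could look like (the checkpoint, endpoint, and names are illustrative, not necessarily what we'd use):

    # Hypothetical minimal FastAPI app serving a sentence-transformers model locally.
    # Run with: uvicorn app:app --port 8000
    from typing import List

    from fastapi import FastAPI
    from pydantic import BaseModel
    from sentence_transformers import SentenceTransformer

    app = FastAPI()
    # Illustrative checkpoint; the actual course model may differ.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    class EmbedRequest(BaseModel):
        sentences: List[str]

    @app.post("/embed")
    def embed(req: EmbedRequest):
        # encode() returns a numpy array; convert it for JSON serialization.
        embeddings = model.encode(req.sentences)
        return {"embeddings": embeddings.tolist()}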

However, it should not depend too much on Google or AWS tech. If we can use more general tech, it would be better.

That said, if we get substantial benefits from doing so, it's also OK.

sayakpaul commented 1 year ago

We can briefly cover the rest of the points, as the other service providers (Azure, AWS, etc.) also offer something similar.

The point is that if there's a better offering, it should be explored, since it lets a practitioner focus better. With a managed solution like Vertex AI, authentication, autoscaling, traffic splitting, etc. become easier and more seamless. Hence the idea.

I kept it in to illustrate this point about using targeted managed solutions. It can be realized with other service providers too, but @deep-diver and I are most comfortable with GCP.
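
To make that concrete, a rough sketch of how autoscaling and traffic splitting come almost for free with the Vertex AI Python SDK (project, region, and container image are placeholders):

    # Hypothetical sketch: deploying a model on Vertex AI with autoscaling bounds.
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholders

    model = aiplatform.Model.upload(
        display_name="demo-model",
        serving_container_image_uri="gcr.io/my-project/my-serving-image",  # placeholder
    )
    endpoint = model.deploy(
        machine_type="n1-standard-4",
        min_replica_count=1,
        max_replica_count=3,     # Vertex AI autoscales within these bounds
        traffic_percentage=100,  # traffic splitting is a deploy-time argument
    )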

hunkim commented 1 year ago

I think one or two lectures with GCP are OK. However, do you think we can get some GCP credit for students?

Also, how can Vertex AI be used to train a sentence transformer in our demo? I will train the sentence transformers using STS.

sayakpaul commented 1 year ago

Also, how can Vertex AI be used to train a sentence transformer in our demo? I will train the sentence transformers using STS.

Please refer to: https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-train-and-tune-pytorch-models-vertex-ai
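
For a rough idea of what the linked approach looks like with the Vertex AI Python SDK (bucket, container tag, and script names below are placeholders; the blog post has the full recipe):

    # Hypothetical sketch of a Vertex AI custom training job for a PyTorch model.
    from google.cloud import aiplatform

    aiplatform.init(
        project="my-project",                     # placeholder
        location="us-central1",
        staging_bucket="gs://my-staging-bucket",  # placeholder
    )

    job = aiplatform.CustomTrainingJob(
        display_name="sentence-transformer-sts",
        script_path="train.py",  # local training script, e.g. fine-tuning on STS
        container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13:latest",  # placeholder tag
        requirements=["sentence-transformers"],
    )
    job.run(
        replica_count=1,
        machine_type="n1-standard-8",
        accelerator_type="NVIDIA_TESLA_T4",
        accelerator_count=1,
    )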

I think one or two lectures with GCP are OK. However, do you think we can get some GCP credit for students?

Happy to condense it into one. I think you'd need to reach out to your local Google Developer Group community manager for this. As far as I remember, GCP already has a student tier, which you might want to check out.

hunkim commented 1 year ago

Please refer to: https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-train-and-tune-pytorch-models-vertex-ai

This is good. This + serving would make a good single lecture (2 hours).

hunkim commented 1 year ago

@deep-diver I would also love to have a serving comparison that includes all possible serving options.

So should we do two lectures (weeks)?

  1. GCP training and serving
  2. Comparison + evaluation of all possible serving options?

sayakpaul commented 1 year ago

comparison + evaluation of all possible serving options

Elaborate.

hunkim commented 1 year ago

@sayakpaul @deep-diver Specifically, Nov 4 and 18. Do you think it's possible?

sayakpaul commented 1 year ago

Fine by me.

deep-diver commented 1 year ago

I am good with the schedule too :)

hunkim commented 1 year ago

@deep-diver FastAPI, SageMaker, GCP serving (Vertex AI), KServe, Airflow, etc. If there are more, I think we can add them. The more, the better.

sayakpaul commented 1 year ago

Showing everything is not a very good idea IMO. Presenting a workflow and realizing it with a particular set of services is more feasible and approachable. Besides, our expertise (@deep-diver's and mine) is in GCP and its related components. So, we will need to discuss this further.

deep-diver commented 1 year ago

@hunkim Since this is a university lecture, I think it is not ideal to discuss implementation details and how to use specific tools/frameworks. Rather, it would be better to discuss considerations and challenges, and then show how one can use Google tech to handle them as an example.

Here is an example outline for the serving part:

  1. The concept of ML model deployment
  2. How it differs from traditional web deployment
  3. Considerations in ML model deployment (caching, versioning, multi-model serving (+ A/B testing), GPU utilization, optimization for different CPU architectures, choosing the right number of threads for message queuing and inter-/intra-op parallelism, batch inferencing, etc.)
  4. Standalone deployment vs. scalable deployment within Kubernetes
  5. How TensorFlow Serving fits into those considerations (a small client sketch follows this list)
  6. When to use FastAPI instead of dedicated ML deployment frameworks?
  7. What kind of services Google Cloud offers to handle those considerations without a headache
  8. After the deployment, what is the next step?
    • re-training and re-deployment
    • monitoring for model decay
    • hence, we need to construct an ML pipeline
    • one possible way to realize an ML pipeline is Google's open-source project TFX
    • showing how TFX works and how TFX and Vertex AI Pipelines play together nicely
    • showing multiple MLOps scenarios (dual deployments to cloud and on-device at the same time, CI/CD for MLOps, ...)
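
For items 3 and 5 above, a client can pin a specific model version through TF Serving's REST API, which is what makes versioned and A/B-style deployments straightforward. A sketch (model name, port, and payload are illustrative):

    # Illustrative client for TF Serving's REST API.
    # Assumes a TF Serving instance on localhost:8501 serving a model named "demo".
    import requests

    # Hit the latest version of the model.
    resp = requests.post(
        "http://localhost:8501/v1/models/demo:predict",
        json={"instances": [[1.0, 2.0, 3.0]]},
    )

    # Pin version 2 explicitly -- useful for A/B tests and safe rollbacks.
    resp_v2 = requests.post(
        "http://localhost:8501/v1/models/demo/versions/2:predict",
        json={"instances": [[1.0, 2.0, 3.0]]},
    )
    print(resp.json(), resp_v2.json())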

I think one or two lectures with GCP are OK. However, do you think we can get some GCP credit for students?

One possible way is to have students create a free GCP account before the lecture (each new GCP account comes with $300 in free credits).

hunkim commented 1 year ago

@deep-diver Thanks for the outline. Overall, it's OK, but it seems like too much for one two-hour lecture. Can we focus fully on the serving part? Assume we have a wonderful model, for example, the small sentence transformer in this repository.

How to serve? FastAPI, Airflow, KServe, ...

After that, we need to talk about performance measures. How do we measure, and what would the results look like on a small example?

Then, I would talk about caching, versioning, multi-model serving (+ A/B testing), GPU utilization, optimization for different CPU architectures, choosing the right number of threads for message queuing and inter-/intra-op parallelism, batch inferencing, etc.

After the deployment, what is the next step? This would be a good topic to go over briefly.

I guess @sayakpaul will talk more about the model-training side using GCP. Then, could you introduce GCP a bit and show how to sign up and get the credits?

Thanks.

hunkim commented 1 year ago

@sayakpaul Could you also outline your two-hour lecture? I really appreciate your help.

hunkim commented 1 year ago

@sayakpaul @deep-diver Are you going to introduce Kubernetes at some point in your lectures?

sayakpaul commented 1 year ago

We both will be covering the serving part.

Kubernetes and Docker will be introduced in the serving lectures. If you want to introduce them in any previous lectures, that's fine too.

After that, we need to talk about performance measures. How do we measure, and what would the results look like on a small example?

After measuring predictive performance, the common things to measure are latency and throughput, by conducting a load test. We have experience with that, so we will introduce it.
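
For reference, the kind of load test we have in mind can be sketched with Locust (the endpoint and payload are illustrative):

    # Illustrative Locust load test for a serving endpoint's latency/throughput.
    # Run with: locust -f locustfile.py --host http://localhost:8000
    from locust import HttpUser, task, between

    class EmbeddingUser(HttpUser):
        # Each simulated user waits 0.5-2 s between requests.
        wait_time = between(0.5, 2.0)

        @task
        def embed(self):
            # Hypothetical endpoint; Locust records latency and RPS per endpoint.
            self.client.post("/embed", json={"sentences": ["a quick load-test probe"]})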

I don't think an introduction to GCP and how to sign up for free credits should be made part of the lecture -- rather, they should be homework for the students. This gives us more time to focus on the conceptual things related to serving.

hunkim commented 1 year ago

@sayakpaul So we won't cover the training part?

How would you divide the serving parts into two lectures?

Specifically, Nov 4 and 18.

Kubernetes and Docker will be introduced in the serving lectures. Sounds good.

After measuring predictive performance, the common things to measure are latency and throughput, by conducting a load test. We have experience with that, so we will introduce it.

Wonderful!

rather, they should be homework for the students. This gives us more time to focus on the conceptual things related to serving.

+1

sayakpaul commented 1 year ago

How would you divide the serving parts into two lectures?

Sorry for my late reply. I was away for a short trip.

I think model training is a broader topic than serving. I am unable to think of a structure that would be suitable for the lecture series, but if you have ideas, we're all ears.

I suggest @deep-diver and I both do the lectures. That way it will be more fun and interesting. Needless to say, it will help with workload distribution too.

deep-diver commented 1 year ago

@hunkim

Since @sayakpaul and I are taking one or two parts of the MLOps lectures, I think the training part is not about the usual training but about re-training or adjustment in response to model or data drift, and about how AutoML/hyperparameter sweeps (e.g., AutoKeras, KerasTuner) can be integrated into the MLOps system.
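
As a hint of how such a sweep plugs in, a minimal KerasTuner sketch (the model, data, and search space are toys, purely illustrative):

    # Illustrative KerasTuner hyperparameter sweep on toy data.
    import numpy as np
    import keras_tuner as kt
    from tensorflow import keras

    def build_model(hp):
        # Search over layer width and learning rate.
        model = keras.Sequential([
            keras.layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
            keras.layers.Dense(1),
        ])
        lr = hp.Float("lr", 1e-4, 1e-2, sampling="log")
        model.compile(optimizer=keras.optimizers.Adam(lr), loss="mse")
        return model

    x, y = np.random.rand(256, 8), np.random.rand(256, 1)
    tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=5)
    tuner.search(x, y, validation_split=0.2, epochs=3, verbose=0)
    best_model = tuner.get_best_models(num_models=1)[0]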

hunkim commented 1 year ago

@deep-diver

re-training or adjustment in response to model or data drift, and how AutoML/hyperparameter sweeps (e.g., AutoKeras, KerasTuner) can be integrated into the MLOps system.

sounds good.

Can you check out https://github.com/DSA-MLOPS/main/tree/main/main and see if we can use this example for our lectures? For example, deploy it on Google Cloud and add more advanced serving.

Re-training or adjustment can also be done using this sentence transformer model.

deep-diver commented 1 year ago

It looks like sentence_transformers is built on top of PyTorch. I think we need a TensorFlow model for our parts to leverage GCP and TensorFlow Serving. WDYT, @sayakpaul?

@hunkim, could you describe the course a bit more?

I think it is better not to make the lectures too specific to a model or task.

sayakpaul commented 1 year ago

It looks like sentence_transformers is built on top of PyTorch. I think we need a TensorFlow model for our parts to leverage GCP and TensorFlow Serving. WDYT, @sayakpaul?

We can't leverage TF Serving then. We can still use GCP and Vertex AI and other things like GKE. But not TF Serving.

hunkim commented 1 year ago

not to make the lectures too specific to a model or task.

Sure. But for the models, PyTorch would be OK. If necessary, we can mirror the same thing (the sentence transformer) in TF.

hunkim commented 1 year ago

@sayakpaul @deep-diver Unfortunately, we will open this class next semester (Spring 2023). We will have more time to prepare.

I will let you know when I have a fixed schedule for Spring 2023. Thanks.

sayakpaul commented 1 year ago

Okay.

hunkim commented 1 year ago

@sayakpaul @deep-diver I will offer this course in Spring 2023. May I have the honor of having you as guest lecturers in our class for the following two topics? The lecture time is 2:30 PM KST.

(3/31) Google Cloud infra (training/serving)
(4/14) Docker, K8s, Kubeflow, KServe, Airflow, performance evaluation

sayakpaul commented 1 year ago

@hunkim sorry for my late reply. The first half of this year will be quite busy for me. I will have to pass on this one for now.

hunkim commented 1 year ago

@sayakpaul I understand.

Then may I ask if you can talk about your work at Hugging Face?

(3/17) Hugging Face (TBA)

@deep-diver Do you think you can cover any of these?

(3/31) Google Cloud infra (training/serving)
(4/14) Docker, K8s, Kubeflow, KServe, Airflow, performance evaluation

deep-diver commented 1 year ago

@hunkim

I think I can cover the basic infrastructure of GCP (i.e., Pipelines, Artifact Store, Training, Serving) and the system software (i.e., Docker, Kubernetes) related to MLOps, but I don't think I can cover Kubeflow, KServe, or Airflow.

However, I could cover TensorFlow Extended (TFX) with some use cases. Since TFX is an end-to-end ML pipeline framework, it covers almost every component of the entire MLOps workflow, while many other tools target one specific problem. Furthermore, TFX is great when used with GCP.
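
To give a feel for it, a skeletal TFX pipeline run locally might look like this (paths and the trainer module are placeholders; a real pipeline would add components such as StatisticsGen, SchemaGen, and Evaluator):

    # Skeletal TFX pipeline: ingest CSV data, train a model, and push it.
    from tfx import v1 as tfx

    def make_pipeline() -> tfx.dsl.Pipeline:
        example_gen = tfx.components.CsvExampleGen(input_base="data/")  # placeholder path
        trainer = tfx.components.Trainer(
            module_file="trainer_module.py",  # placeholder; defines run_fn()
            examples=example_gen.outputs["examples"],
            train_args=tfx.proto.TrainArgs(num_steps=100),
            eval_args=tfx.proto.EvalArgs(num_steps=10),
        )
        pusher = tfx.components.Pusher(
            model=trainer.outputs["model"],
            push_destination=tfx.proto.PushDestination(
                filesystem=tfx.proto.PushDestination.Filesystem(
                    base_directory="serving_models/"
                )
            ),
        )
        return tfx.dsl.Pipeline(
            pipeline_name="demo-pipeline",
            pipeline_root="pipeline_root/",
            components=[example_gen, trainer, pusher],
        )

    # Swap LocalDagRunner for the Vertex AI Pipelines runner to move to the cloud.
    tfx.orchestration.LocalDagRunner().run(make_pipeline())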

hunkim commented 1 year ago

@deep-diver wonderful. Can you cover 3/31?

I also saw you are covering Stable Diffusion with Hugging Face or something. Do you think we can talk about that on 3/17 if @sayakpaul is too busy to talk about Hugging Face?

Thanks!

deep-diver commented 1 year ago

@hunkim

Can you cover 3/31?

I think so. Is that going to be held online?

Do you think we can talk about that on 3/17?

I am not really positive about this one. SD is a somewhat difficult topic for me to cover.

sayakpaul commented 1 year ago

Sure, I can talk about it :) Sorry for my late reply.

@deep-diver and I can share the Google infra training/serving lecture.

hunkim commented 1 year ago

@sayakpaul @deep-diver wonderful! I have fixed the schedule. It's all on Zoom, Fridays 1:30 PM-4 PM HKT.

(3/17) Hugging Face (Sayak Paul, Hugging Face)
(3/31) Google Cloud infra training/serving (Chansung Park/Sayak Paul)

Thanks! I will send you a Google Calendar invite soon.

hunkim commented 1 year ago

@sayakpaul looking forward to meeting you next week. Would you mind sharing your lecture title and abstract? It's a three-hour lecture slot, so you can talk for one hour plus a couple of hours of Hugging Face exercises, or you can use the full three hours.

See you soon.