Azure / Azure-TDSP-Utilities

Utilities and scripts developed as part of Microsoft's Team Data Science Process for productive data science
Creative Commons Attribution 4.0 International

Lifecycle in Data Science #46

Open Gautamshahi opened 4 years ago

Gautamshahi commented 4 years ago

Hi,

I wanted to know where testing fits into the data science lifecycle. To build a robust system that handles many possible scenarios, we need to test our model.

Compared to the traditional software lifecycle, how does testing map onto the data science lifecycle?

iboulahna commented 4 years ago

Hi @Gautamshahi, I think testing can be performed in the Deployment stage.

[Figure: Team Data Science Lifecycle (Microsoft, 2017)]

I'm using Flask APIs to deploy my models at the company where I currently work. Before a model goes into production, I have to test that the POST requests are working, etc. Then, once all the technical problems are resolved, I have to run quality checks on the models; it sometimes takes a year to validate whether a model performs well on production data.
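As a rough illustration of the "check that the POST requests are working" step, here is a minimal sketch using Flask's built-in test client. The `/predict` route, the `V1` field, and the `predict()` rule are hypothetical placeholders for a real trained model, not the setup described in this comment.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    # Hypothetical stand-in for a trained model: flag large V1 values.
    return int(features["V1"] > 100)


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # silent=True returns None (instead of raising) for missing or invalid JSON.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict) or "V1" not in payload:
        return jsonify(error="expected a JSON object with field 'V1'"), 400
    return jsonify(prediction=predict(payload))


def smoke_test():
    """Pre-production check: valid POSTs succeed, malformed ones get a 400."""
    client = app.test_client()

    ok = client.post("/predict", json={"V1": 250})
    assert ok.status_code == 200
    assert ok.get_json() == {"prediction": 1}

    missing = client.post("/predict", json={"V2": 3})
    assert missing.status_code == 400

    not_json = client.post("/predict", data="V1=250", content_type="text/plain")
    assert not_json.status_code == 400
    return True


if __name__ == "__main__":
    print("smoke test passed:", smoke_test())
```

The test client exercises the route in-process, so checks like these can run in CI before the service is deployed anywhere.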

Gautamshahi commented 4 years ago

Hello,

@iboulahna, thank you very much for the overview of the lifecycle.

Can you please share some tutorials, GitHub projects, or articles that describe deploying a model using a Flask API?

What kinds of tests do you perform on the POST requests, for example checking them with a different set of data?

Can you please share some of the quality checks we can perform before deployment?

Regards,

iboulahna commented 4 years ago

Hi @Gautamshahi,

You can visit this GitHub repo for model deployment using Flask: https://github.com/tanujjain/deploy-ml-model.

If you have the possibility to use Microsoft services, you can also use: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/ci-cd-flask

For the data sent via POST request, I check that it matches what was used in training. For example, if V1 was an integer in your trained model, then when predicting you have to make sure that V1 is sent as an integer in the POST request.
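One way to automate this check is to compare each incoming payload against the feature types seen at training time before calling the model. The sketch below assumes a hypothetical `TRAINING_SCHEMA` dict; only `V1` comes from the example above, and `V2` is invented for illustration.

```python
# Hypothetical training schema: feature name -> Python type used at training time.
TRAINING_SCHEMA = {"V1": int, "V2": str}


def validate_payload(payload, schema=TRAINING_SCHEMA):
    """Return a list of problems; an empty list means the payload matches the schema."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    for name, expected in schema.items():
        if name not in payload:
            errors.append(f"missing field {name!r}")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"{name!r} should be {expected.__name__}, got {type(value).__name__}"
            )
    return errors


if __name__ == "__main__":
    print(validate_payload({"V1": "3", "V2": "FR"}))
```

Inside a Flask handler, a non-empty result would typically be returned to the caller as a 400 response instead of being passed to the model.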

I'm working on credit risk use cases, so the quality check, or post-production evaluation, is basically about how well the model detects credit defaults, fraud, etc., so that the impact on the business and the risk are minimized. For this, precision and recall are computed on new data sets (for example, 6 months after deploying the model). It really depends on each business problem, and if you are new to data science you will hear "It depends" as an answer a lot (data science depends heavily on your business problem and use case, and that's what makes it fun and challenging at the same time).
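For the post-production evaluation, precision and recall only need the true outcomes (e.g. defaults observed some months later) and the model's predictions for the same cases. A minimal, dependency-free sketch, assuming labels are encoded as 1 for the positive class (default or fraud) and 0 otherwise; the sample data is made up:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class (e.g. 1 = default or fraud)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of the cases flagged, how many were real defaults/fraud.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the real defaults/fraud, how many the model caught.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up outcomes observed after deployment vs. the model's predictions.
    observed = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 1, 1, 0]
    print(precision_recall(observed, predicted))  # (0.6, 0.75)
```

If scikit-learn is already in your stack, `sklearn.metrics.precision_score` and `recall_score` compute the same quantities; comparing them against the values measured at training time is one way to spot degradation.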

I hope my answer was helpful.

Gautamshahi commented 4 years ago

Hello,

Thank you very much for sending the details.

It really helps me understand.

Regards,

On Wed, Mar 4, 2020 at 12:52 PM iboulahna notifications@github.com wrote:


-- Gautam Kishore Shahi,