cdfoundation / sig-mlops

CDF SIG MLOps
https://cd.foundation
Apache License 2.0

Jupyter Notebooks - Roadmap Discussion #8

Closed: hamelsmu closed this issue 4 years ago

hamelsmu commented 4 years ago

Thanks for the comprehensive roadmap, @tdcox! I have one question in the technical requirements section:

"Educating data science teams regarding the risks of trying to use Jupyter Notebooks in production"

Can you expand on this a bit more? I have found it difficult to argue that data scientists should not use their most beloved tool when crafting their deliverables. There are some tractable ways that data scientists can reliably use notebooks in a production workflow:

- nbdev
- Netflix's ML Infra on Notebooks

I am not sure if this is what you meant, but I wanted to pause and have a discussion on this point. I am still reviewing the rest of the document, but I figured I should bring this up since it caught my attention.

cc: @jlewi, @aronchick

tdcox commented 4 years ago

People like experimenting with Jupyter Notebooks because the tool has been optimised for rapid prototyping. To achieve that optimisation, most of the non-functional requirements relating to operating the code in distributed production environments were discarded.

When you prototype something with Jupyter Notebooks, you should be aware that you are accelerating the pace of demonstrating one element of functional behaviour at the cost of disregarding all the non-functional elements that also make up a ‘whole product’.

Typically, the commercialisation of technology products follows a number of phases. Initial prototyping creates a proof of concept that demonstrates the technical viability of the core technology. A ‘production engineering’ phase then takes that prototype and re-architects it so that it is secure, robust, reliable, scalable, maintainable, flexible, testable, auditable, compliant with legislation, and so on. Only after this is it both safe to put that product into production and cost-effective to own it.

Traditionally, this has been done by separate teams, creating a slow and laborious process with many governance gateways and opportunities for miscommunication.

The DevOps methodology improves productivity and lowers cost of ownership for conventional software assets by optimising the end-to-end process of commercialising an idea. Machine Learning models are just another form of software asset, albeit with a different specialisation, so the purpose of the MLOps Roadmap is to extend the DevOps approach to include ML assets, optimising for AI product commercialisation rather than just model prototyping.

This is particularly important for ML, since the majority of commercial AI solutions are decision-making systems rather than decision-support systems and therefore can be expected to be both highly regulated and more likely to incur the risk of litigation should they fail to meet appropriate quality standards.

Terry Cox Bootstrap Ltd

On 17 Jan 2020, at 23:40, Hamel Husain notifications@github.com wrote:

Thanks for the comprehensive roadmap, @tdcox! I have one question in the technical requirements section:

"Educating data science teams regarding the risks of trying to use Jupyter Notebooks in production"

Can you expand on this a bit more? I have found it difficult to make an argument that data scientists should not use their most beloved tool when crafting their deliverables. There are some tractable ways that data scientists can reliably use notebooks in a production workflow:

- nbdev
- Netflix's ML Infra on Notebooks

I am not sure if this is what you meant, but I wanted to pause and have a discussion on this point. I am still reviewing the rest of the document, but I figured I should bring this up since it caught my attention.

cc: @jlewi, @aronchick


hamelsmu commented 4 years ago

Thanks for expanding on your thinking, @tdcox! It was helpful to learn about the various concerns. I certainly wish the Jupyter-to-production workflow were more mature, and some promising tools designed recently address some of these limitations.

However, my favorite tools for doing this convert notebooks to scripts behind the scenes, for reasons similar to those you described (to take advantage of the full suite of DevOps tools), so in that sense I agree with you.
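To make the notebook-to-script idea concrete, here is a minimal sketch of such a conversion. It is not how nbdev or any other specific tool does it. It only assumes the standard notebook file layout (a JSON document with a `cells` list, where each cell has a `cell_type` and a `source`), and the function name and example notebook are hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook (given as JSON text) into one script.

    Illustrative only: markdown cells are dropped, and IPython magics such as
    `%matplotlib inline` are NOT handled, so real notebooks may need more work.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores `source` as a string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    # One blank line between cells, single trailing newline.
    return "\n\n".join(chunks) + "\n"


# Hypothetical notebook with one markdown cell and two code cells.
EXAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1\n", "y = x + 1\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "print(y)"},
    ],
})

script = notebook_to_script(EXAMPLE_NOTEBOOK)
```

Once the code lives in a plain `.py` file like this, it can be linted, unit-tested, and diffed in review like any other source file, which is the point about the DevOps toolchain.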

Thanks again for the detailed writeup and roadmap!