KTH / devops-course

Repository of the DevOps course at KTH Royal Institute of Technology (DD2482)

Executable tutorial proposal #2519

Closed kthfre closed 1 month ago

kthfre commented 2 months ago

Assignment Proposal

Title

End-to-end training of a neural network to deployment in a live application

Names and KTH ID

Deadline

Category

Description

I'd like to make an executable tutorial that covers the end-to-end process: training a neural network in a Jupyter notebook on Colab, handling the intermediate steps, and deploying the model to a live application. I'd put limited focus on the ML aspects and greater focus on the DevOps aspects. I'd like to write my own functionality for the DevOps parts, if I may, as it's a fun learning experience and the scripts could be useful in the future. The deployment criterion for the model could be exceeding the previous test-data accuracy, but any other reasonable criterion would also work. I haven't fully decided on the functionality for the MLOps/DevOps part. The bare minimum is deploying the model live when it fulfills the criterion. Other things being considered are model storage/rollback, job scheduling/queuing in running notebooks, monitoring of multiple notebooks, etc.
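To make the deployment criterion and the rollback idea concrete, here is a minimal sketch, assuming the criterion is "strictly exceed the test accuracy of the currently live model". The names `ModelVersion`, `ModelRegistry`, `should_deploy`, `submit`, and `rollback` are hypothetical and only for illustration; the registry is an in-memory stand-in, not the storage the tutorial would actually use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelVersion:
    """A trained model candidate and its test-set accuracy (hypothetical record)."""
    name: str
    test_accuracy: float


@dataclass
class ModelRegistry:
    """In-memory stand-in for model storage; a real setup would persist artifacts."""
    history: List[ModelVersion] = field(default_factory=list)

    @property
    def live(self) -> Optional[ModelVersion]:
        # The most recently deployed version is the one considered live.
        return self.history[-1] if self.history else None

    def should_deploy(self, candidate: ModelVersion) -> bool:
        # Assumed criterion: strictly exceed the live model's test accuracy.
        # The first candidate deploys because there is nothing to compare against.
        return self.live is None or candidate.test_accuracy > self.live.test_accuracy

    def submit(self, candidate: ModelVersion) -> bool:
        """Deploy the candidate if it passes the gate; return whether it went live."""
        if self.should_deploy(candidate):
            self.history.append(candidate)
            return True
        return False

    def rollback(self) -> Optional[ModelVersion]:
        """Drop the live version and return the version that is live afterwards."""
        if self.history:
            self.history.pop()
        return self.live


registry = ModelRegistry()
registry.submit(ModelVersion("v1", 0.90))  # deployed: no live model yet
registry.submit(ModelVersion("v2", 0.88))  # rejected: does not beat v1
registry.submit(ModelVersion("v3", 0.93))  # deployed: beats v1
```

Keeping the gate (`should_deploy`) separate from the action (`submit`) would let the criterion be swapped for any other reasonable one without touching the deployment or rollback logic.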

Architecture-wise there would be:

I asked a TA about this briefly in a lab session (not the previous one, but the one before that) and it sounded OK. I meant to register it earlier, but other coursework got in the way. I think it's still OK to register an MLOps task since it's asynchronous and there is no "week" folder structure in the directory tree. If so, and the proposal sounds OK, is all I have to do commit to a deadline and deliver?

Relevance

Jupyter Notebook/Lab is often used for processing, preparing, and visualizing data, as well as for subsequently training machine learning models. Deriving a model is often an iterative process of finding suitable model architectures and optimal hyperparameters. Models may also need to be updated continually after deployment as more data becomes available or use cases change. This process is presumably often done manually, particularly as data scientists and conventional developers may be on different teams, but there are clear benefits to automating it.

Submission: GitHub repository

algomaster99 commented 2 months ago

@kthfre the proposal is good, but somehow this PR does not add your proposal to the contributions folder. Could you fix it?

kthfre commented 1 month ago

@algomaster99 I think that is fixed now, but now the check "update grading in canvas" fails. Should I just create a new PR?

algomaster99 commented 1 month ago

@Deee92 any idea why this task is failing? Do we have to update CANVAS_TOKEN or GH_TOKEN? It seems to work for other PRs.

algomaster99 commented 1 month ago

@kthfre if you are not far into the executable tutorial implementation, could you please look for a partner?

kthfre commented 1 month ago

@algomaster99 Unfortunately I'm like 95% done on the implementation part.

algomaster99 commented 1 month ago

Okay, no problem then. I will let Deepika look into this issue and then merge.

kthfre commented 1 month ago

@algomaster99 The links are available in the readme of the repository which I linked above.

algomaster99 commented 1 month ago

Thanks @kthfre ! Could you please edit the proposal by submitting a PR?

kthfre commented 1 month ago

@algomaster99 Done. Sorry about the delay here.