alan-turing-institute / uatk-spc

Synthetic Population Catalyst

https://alan-turing-institute.github.io/uatk-spc/

MIT License

Docker infrastructure #37

Open darribas opened 2 years ago

darribas commented 2 years ago

Opening a separate issue from #33 to discuss and deploy a working model for the "official" SPC image for folks to use. This builds on the discussion started in https://github.com/alan-turing-institute/uatk-spc/pull/34#discussion_r917915698.

Current thinking (to progress on discussion):

Docker details should not go into main README
A better place could be the Installation section

Before this is done, we at least need to:

[ ] Figure out a working model to host on Docker Hub
[ ] Update documentation on the SPC website on how to run SPC through Docker
[ ] Deploy working image to Docker Hub
[ ] Hook the image into Binder on this repo

Anything else I'm missing?

mfbenitezp commented 2 years ago

@darribas I think we haven't discussed what other packages could be included in that Docker image. Maybe a brief discussion about the "essential" toolkit to explore, analyse and map the .pb files would also help us define what we should include in the updated docs.

darribas commented 2 years ago

Yes, that's a great idea. I originally included those that would allow the user to replicate the docs pages:

https://github.com/alan-turing-institute/uatk-spc/blob/c5f2833718d4fcebab5d3b8e7749bf5b166f319e/Dockerfile#L24-L28

A bit of group discussion as to whether that is correct or we want to expand a bit further would be great.

mfbenitezp commented 2 years ago

My view is that tools like Jupyter Notebook might provide the experimental toolkit that any data scientist can use to play with the output. In previous discussions, the view was that such frameworks or tools were not part of the core of SPC. Now, after my experience at the ASG event and other questions, I vote to include in the Docker image some kind of "essential" toolkit to explore, analyse and map the output (in addition to your suggestions, I'd add Jupyter Lab), including some examples (or the scripts already created by Dustin or others that we may have) to provide guidance on how to use it. We are claiming it is easy to use, so I think some kind of teaching/guidance falls within our responsibility, covering 1) how to extend the data schema (we could use some of the use cases we have included) and 2) how to use the .pb file in external models (ASPICS and others). Happy to discuss this in our meetings or chats.
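
To make the Jupyter Lab suggestion a bit more concrete, here is a minimal sketch in Python of how a user might launch such an image once it exists. The image name `example/spc-toolkit:latest`, the `/data` mount point and the helper names are hypothetical (nothing in this thread fixes them), and the sketch assumes the image has Jupyter Lab installed. It only assembles the `docker run` command; it does not execute it.

```python
import shlex
from pathlib import Path


def build_docker_run_command(image, data_dir, port=8888):
    """Assemble the argument list for starting Jupyter Lab in a container.

    image:    hypothetical image name (not an official SPC image).
    data_dir: host folder holding the .pb outputs to explore.
    port:     host port that forwards to Jupyter Lab's port 8888.
    """
    # Docker bind mounts need an absolute host path.
    host_dir = Path(data_dir).resolve()
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8888",          # host port -> container port
        "-v", f"{host_dir}:/data",     # /data is an assumed mount point
        image,
        "jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser",
    ]


def format_command(argv):
    """Render the argument list as a shell-safe string for copy/paste."""
    return shlex.join(argv)
```

Calling `format_command(build_docker_run_command("example/spc-toolkit:latest", "./data"))` gives a one-line command that could be pasted into the docs once the real image name is agreed.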