IQSS / dataverse

Open source research data repository software
http://dataverse.org

Docker for production #4665

Closed omaralsoudanii closed 3 years ago

omaralsoudanii commented 6 years ago

Hello, is there an official production-ready Docker container for Dataverse? If so, where can I find the docs? Thank you.

craig-willis commented 5 years ago

@pdurbin -- Well, 5% wrong isn't too bad, but at least it was an optimistic assumption!

> Some refactoring of our installation process to better support OpenShift occurred in pull request #4805 but parts of our traditional installation process broke so we followed up with pull request #5058 to make sure traditional installation works.

Good to know. In my early work, I terribly hacked the installation process, which was functional for a proof-of-concept but entirely unmaintainable. Even if Docker isn't supported as part of the core project, it seems that the community would need at a minimum a Docker-compatible installation process. I expect you'd be receptive to PRs as long as they don't break the main installer?

pdurbin commented 5 years ago

@craig-willis yep, pull requests against the installer are always welcome. Our official statement on it is at http://guides.dataverse.org/en/4.9.4/installation/installation-main.html#running-the-dataverse-installer and says:

"The script is to a large degree a derivative of the old installer from DVN 3.x. It is written in Perl. If someone in the community is eager to rewrite it, perhaps in a different language, please get in touch. :)"

A couple of other factoids about the installer: the Perl script is not used by https://github.com/IQSS/dataverse-ansible and lately @poikilotherm has been advocating for using the "MicroProfile Config API" in #5293, but this would require switching from Glassfish to Payara (#4172). I don't know. It's complicated. Most of the installer hackers hang out in #dataverse on freenode if you ever want to pick some brains.
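For readers unfamiliar with it, the appeal of MicroProfile Config for containers is that one setting can be resolved from several sources by priority, so an image could be configured through environment variables rather than an installer run. The Python sketch below only imitates that lookup order (system properties, then environment variables, then a bundled properties file, using the specification's default ordinals 400, 300 and 100). It is not the MicroProfile API itself, and the property name and values are illustrative assumptions.

```python
import re

# Default ordinals from the MicroProfile Config specification:
# when several sources define the same property, the higher ordinal wins.
SYSTEM_PROPERTIES = 400
ENVIRONMENT = 300
PROPERTIES_FILE = 100


def env_candidates(name):
    """Names tried against the environment, in order: the exact name,
    the name with non-alphanumerics replaced by '_', then that upper-cased."""
    sanitized = re.sub(r"[^A-Za-z0-9]", "_", name)
    return [name, sanitized, sanitized.upper()]


def resolve(name, system_props, environ, file_props):
    """Return the value from the highest-ordinal source that defines `name`."""
    sources = [
        (SYSTEM_PROPERTIES, lambda: system_props.get(name)),
        (ENVIRONMENT, lambda: next(
            (environ[c] for c in env_candidates(name) if c in environ), None)),
        (PROPERTIES_FILE, lambda: file_props.get(name)),
    ]
    for _ordinal, lookup in sorted(sources, key=lambda s: s[0], reverse=True):
        value = lookup()
        if value is not None:
            return value
    raise KeyError(name)


# Illustrative values only: the container sets an environment variable,
# which overrides the default shipped in the properties file.
file_props = {"dataverse.fqdn": "localhost"}
environ = {"DATAVERSE_FQDN": "demo.example.org"}
print(resolve("dataverse.fqdn", {}, environ, file_props))
```

With these inputs the environment variable wins over the file default, and a system property, if present, would win over both.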

poikilotherm commented 5 years ago

Hi @craig-willis,

> @poikilotherm's recent work on #5292, which if I'm understanding is intended to further refine the build and install process to be more micro-service friendly.

Actually, #5292 is not just about building and installation, but also a lot about testing. Personally, I try to take small steps towards being able to introduce things like Arquillian, etc. in the build chain on the one hand, and on the other hand to let people run a simple mvn docker:run to spin up a test instance of their current branch. To me, "testing" here means not just executing JUnit or similar tests, but also things like UI building or demos. (And of course I want to avoid breaking existing ways of doing stuff, because that makes people upset.)
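One way to use such a throwaway test instance from scripts is to wait until it answers before pointing UI builds or demos at it. A minimal sketch, assuming the instance is reachable at a hypothetical http://localhost:8080 and serves the /api/info/version endpoint; the helper names, retry count and delay are illustrative and not part of #5292.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5):
    """Return True if `url` answers with HTTP 200, False on any
    connection problem or HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call `probe()` up to `attempts` times, sleeping `delay` seconds
    between tries. Return the number of the attempt that succeeded."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"not ready after {attempts} attempts")
```

A caller would pass something like `lambda: http_probe("http://localhost:8080/api/info/version")` as the probe. Keeping the probe injectable means the retry logic can be exercised without a running instance.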

What I pretty much dislike about the current approaches for Docker/OpenShift/... images: the big and clunky installer script is used in them. IMHO that is not what a Docker setup should look like; to me it feels more like packaging a lightweight VM so that you call docker run instead of the install script. Again, this is just my understanding of how modern web services should work and it might differ from others'. And I am very aware that changing this will need a lot more work than writing some Dockerfiles, which is why I came up with #5292.

I can't make suggestions about where an investment of time and resources will get the most in return for your own work and visions. #5292 needs people who share my vision for the long run, and this needs to be shared, explained and discussed with people like @pdurbin, @pameyer, @matthew-a-dunlap, @scolapasta and anyone else who would like to join. (After all I am not part of @IQSS or Harvard and they need to get stuff done. :wink:)

poikilotherm commented 5 years ago

Hey all, I encourage anyone to check out my simple and general purpose Kubernetes stuff at https://github.com/iqss/dataverse-kubernetes.

This is still WIP, as I want more stuff inside, but the groundwork is done. Once things get moving in #5292 and #4172 the images will change. Any comments appreciated, feel free to open an issue in the project with feedback, wishes, bugs, ...

vsoch commented 5 years ago

@poikilotherm I have a practical question - do you spin up a cluster (and then have to pay for it) on a cloud provider to test? I'm trying to come up with reasonable ways to (cost effectively) develop with k8s but it seems this is the only way to go... with minikube it's really the same thing.

pdurbin commented 5 years ago

@vsoch OpenShift is a flavor of k8s that offers a free tier. It turns out Dataverse is too fat memory-wise to squeeze into it but it might be worth a look. Back when I was working on #4040 the free tier allowed for 1 GB of memory. I'm talking about "OpenShift Online", the service hosted by Red Hat: https://www.openshift.com/products . My dream was that people interested in installing Dataverse would be able to just click a couple of buttons in OpenShift Online and have a free deployment to kick the tires on. Oh well.

If there is other free Kubernetes hosting out there (perhaps for open source projects?) I'd be interested as well.

Since Dataverse developers have access to AWS, I've been thinking it would be nice to try @poikilotherm's new dataverse-kubernetes repo on Amazon Elastic Container Service for Kubernetes (EKS).

vsoch commented 5 years ago

Yeah this has been my experience too :/ I'm not sure it's the greatest thing that software development is being driven toward "special" infrastructure that is hard to come by.

pdurbin commented 5 years ago

@vsoch I was reminded by @pameyer (thanks) to mention that NDS Labs Workbench ( https://www.workbench.nationaldataservice.org ) is a way to spin up projects like Dataverse, DSpace, iRods, OwnCloud, etc. (full list at https://nationaldataservice.atlassian.net/wiki/spaces/NDSC/pages/4358234/NDS+Labs+Services ) for free if you are eligible ("Data management tool developers" and others according to https://nationaldataservice.atlassian.net/wiki/spaces/NDSC/pages/18448399/Frequently+Asked+Questions ). @craig-willis developed this and you can read more about it from the Dataverse perspective in #4152 and at http://guides.dataverse.org/en/4.11/installation/prep.html#nds-labs-workbench-for-testing-only

@poikilotherm I'm a little blocked trying out dataverse-kubernetes in the cloud because I need to learn how to use EKS. In this issue I just opened, I'm asking if some tips can be added to the README: https://github.com/IQSS/dataverse-kubernetes/issues/12

@4tikhonov does https://github.com/IQSS/dataverse-docker run on EKS? I'd be happy to open an issue to add some tips to the README there as well, if you want.

vsoch commented 5 years ago

Holy cow @pdurbin you just won the award for the highest density of references and links I've ever seen in a single comment, anywhere! :confetti_ball: Thank you! This looks like an awesome resource and I'll check it out!

pdurbin commented 5 years ago

@vsoch unfortunately at https://github.com/IQSS/dataverse/issues/4152#issuecomment-479037624 we learned that the instance of NDS Labs Workbench we've been talking about won't be supported any more. The code remains open source if someone else wants to run it.

Meanwhile, some excellent progress was made by @poikilotherm yesterday toward deploying his community-supported dataverse-kubernetes effort on AWS. Even though in https://github.com/IQSS/dataverse-kubernetes/issues/12 I was asking for EKS support (also discussed at http://irclog.iq.harvard.edu/dataverse/2019-04-24#i_91578 ), I think all I really wanted was a way to deploy his solution on AWS somehow. In pull request https://github.com/IQSS/dataverse-kubernetes/pull/45 he has created what looks like fantastic documentation on how to run dataverse-kubernetes on AWS using kops. I haven't tried it myself but from what I hear at http://irclog.iq.harvard.edu/dataverse/2019-04-25#i_91713 it sounds awesome. All you need is an AWS account, which we also document at http://guides.dataverse.org/en/4.13/developers/deployment.html . I'd like to encourage anyone reading this to try out the readme in that pull request and report any feedback back here. From my perspective, this at least potentially unblocks this issue. I don't believe anyone is using dataverse-kubernetes in production yet, but @poikilotherm plans to.

Additionally, @4tikhonov recently tweeted at https://twitter.com/4tykhonov/status/1116229640232873984 some slides regarding his dev process, which makes use of https://github.com/IQSS/dataverse-docker and a continuous integration pipeline. You can see some more discussion about this (and screenshots) at https://github.com/IQSS/dataverse/issues/5725#issuecomment-482083170 and https://github.com/IQSS/dataverse/pull/5751#issuecomment-482316168 . I don't believe anyone is using dataverse-docker in production yet. I think it's mostly used for (fantastic) demos. I'd be interested in knowing how easy it is to deploy dataverse-docker to AWS since that's the cloud resource that's available to me.

@omaralsoudanii what is your status, please? Have you been experimenting with running Dataverse in Docker?

@xibriz I recently learned that @poikilotherm is also using GitLab CI. You two might want to coordinate.

xibriz commented 5 years ago

@pdurbin Since you have an incredible overview of everything, I have to let you know that I will leave my job at UiT in about a month.

As far as I know, no one will be taking over the Kubernetes-trials at UiT :/

pdurbin commented 5 years ago

@xibriz thanks for the update! Good luck on your future adventures! I'd love to see you in Cambridge again some day! 😄

vsoch commented 5 years ago

@pdurbin thanks for letting me know - it's an incredible (and costly) effort to provide this kind of resource and I understand not being able to support it forever. It's really fantastic to see all the great work on Dataverse! Heads up @poikilotherm the link at the top of your repo here is 404.

I still have yet to figure out how to easily develop with an actual cluster (without the immense burden of costs) but I'm learning a lot of GoLang and working on supported tools to hopefully learn a bit regardless. Definitely ping me if I can be of any help, other than offering words of encouragement! :)

poikilotherm commented 5 years ago

@vsoch seems like the link I used was accidentally one you can only follow when logged in... Thx for pointing that out, will change it.

vsoch commented 5 years ago

I'm not worthy!!!

Just kidding :)

pdurbin commented 5 years ago

@omaralsoudanii are you still interested in this? There are some community efforts to run Dataverse on Docker in production but it is not supported by IQSS.

pdurbin commented 4 years ago

The first installation of Dataverse to advertise itself as running on Kubernetes in production was just added to the map at https://dataverse.org/installations

[Image: FZJ-on-Dataverse]

@poikilotherm is my hero! 🎉 He recently gave a talk at http://talks.bertuch.name/dataverse-k8s-20200124/ about running Dataverse on Kubernetes:

[Screenshot]

Awesome.

portante commented 4 years ago

> The first installation of Dataverse to advertise itself as running on Kubernetes in production was just added to the map at https://dataverse.org/installations

Great news!

4tikhonov commented 4 years ago

Great, congratulations, @poikilotherm! I'm also convinced that Kubernetes is the only way to go; all services, not only Dataverse, should follow the same direction to increase their maturity.

pdurbin commented 4 years ago

@portante thanks for the shout out at https://twitter.com/pportante/status/1233367941346885633 !

When should we circle back to #4040 ? 😄

pdurbin commented 4 years ago

@4tikhonov I absolutely agree!

At @pidapalooza 2020 someone told me, "For anything new, it has to run on Kubernetes."

Hmm, come to think of it... what's the definition of done for this issue? 😄

poikilotherm commented 4 years ago

@pdurbin asked me to link to IQSS/dataverse-kubernetes#129 here because of the first bullet in #4040 being about running things on OpenShift. @portante is this still relevant for you? Phil is really eager to chat with you on IRC and I'd be happy to share thoughts and ideas.

4tikhonov commented 4 years ago

> @4tikhonov I absolutely agree!
>
> At @pidapalooza 2020 someone told me, "For anything new, it has to run on Kubernetes."

Wait a little bit and I'm pretty sure we can bring really big fish to the Dataverse community. It's also providing previewers and other services running on Kubernetes, as a part of this POC: https://twitter.com/4tykhonov/status/1232276978021126144

poikilotherm commented 3 years ago

@pdurbin vote to close. With https://github.com/gdcc/dataverse-kubernetes and https://github.com/gdcc/dataverse/tree/develop+ct we should have sufficient things in place. Dunno if https://github.com/IQSS/dataverse-docker is made for production.

djbrooke commented 3 years ago

Thx @poikilotherm and others for all of the work on various solutions in the container space. I'll close this as we already have Docker and Kubernetes on https://guides.dataverse.org/en/latest/developers/containers.html and we can add more community solutions as they reach maturity.