IQSS / dataverse

Open source research data repository software
http://dataverse.org

Docker for production #4665

Closed omaralsoudanii closed 3 years ago

omaralsoudanii commented 6 years ago

Hello, is there an official production-ready Docker container for Dataverse? If so, where can I find the docs? Thank you.

pdurbin commented 6 years ago

@omaralsoudanii thanks for opening this issue! It's been nice talking to you over at https://github.com/nds-org/ndslabs-dataverse/issues/8 but this main issue tracker for Dataverse is a better place to discuss the state of Docker support. I can try to explain.

The short answer is that unfortunately, there is no production-ready Docker image for Dataverse as of this writing. I'm still learning Docker and its ecosystem so I'd like to know from you and others who are interested in Docker support how you expect it to work. Do you expect all of Dataverse to be running in a single container? Do you expect some of the components such as PostgreSQL and Solr to be running in different containers? Do you expect to use docker-compose? Do you expect to run Docker images in an orchestration platform such as Kubernetes?

If you look at the source tree, this is what you will find today:

I hope this helps. If you and others interested in Docker support can answer the questions above or other questions I don't even know to ask, it would be very helpful! Thanks!

omaralsoudanii commented 6 years ago

@pdurbin Thanks for this. It would be nice to have 3 containers: 1. a Dataverse container, 2. a Postgres container, 3. a Solr container.

And a docker-compose.yml file that builds the 3 images and containers automatically. It would also be good to include an entrypoint to manage update scripts for Dataverse and the Postgres SQL changes; see https://docs.docker.com/engine/reference/builder/#entrypoint .

The problem we're having now is that, in order to update and maintain Dataverse, we need to adjust the Dockerfile manually (change the Dataverse version) and do the update process manually (run each Postgres schema change by hand), so an automated script that takes care of those would be nice.
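A minimal sketch of the ordering step such an automated script would need, assuming the schema changes ship as files named like upgrade_v4.7_to_v4.7.1.sql in a single directory. The naming pattern, the function name list_upgrades, and the database name in the usage comment are assumptions made for illustration, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: the upgrade_v<FROM>_to_v<TO>.sql naming and the function
# name are assumptions made for illustration.

# List the upgrade scripts in directory $1, ordered by their <FROM> version.
# Assumes numeric versions with at most three components and paths
# without whitespace.
list_upgrades() {
    for f in "$1"/upgrade_v*_to_v*.sql; do
        [ -e "$f" ] || continue      # the glob matched nothing
        from=${f##*/upgrade_v}       # strip the directory and the prefix
        from=${from%%_to_v*}         # keep only the <FROM> version
        # Emit "major minor patch path"; missing components become 0.
        printf '%s\n' "$from" | awk -F. -v path="$f" \
            '{ printf "%d %d %d %s\n", $1, $2, $3, path }'
    done | sort -k1,1n -k2,2n -k3,3n | awk '{ print $4 }'
}

# Example use (not executed here; the database name is hypothetical):
#   list_upgrades ./upgrades | while read -r script; do
#       psql -d dvndb -f "$script" || exit 1
#   done
```

Sorting on numeric fields rather than on the raw file name keeps 4.10 after 4.8. A real script would also need to skip the files that are older than the currently installed version.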

Another suggestion would be that the Dataverse team adjusts the Dockerfile configuration each time a Dataverse update is released, so that we can simply pull the Docker changes from the Dataverse repo and run:

docker-compose down

docker-compose up --build

We are currently on version 4.7 and are trying to update to the latest version (4.8.6) in order to use the license and restrict files native API, but I need a fresh Dataverse Postgres database. Is there a script or SQL file for that?

Thank you.

pameyer commented 6 years ago

@omaralsoudanii Any thoughts on an additional container for apache? My understanding is that it's preferable to not expose glassfish/dataverse directly to the outside world, so it might make sense to deal with it at this level of abstraction (but I'm not currently planning on using dockerized dataverse in production, so input from folks that are would be interesting).

vsoch commented 6 years ago

hey everyone! I think I can definitely help with this, and I'd like to suggest starting with a simple docker-compose setup, and then using kompose to transition to kubernetes when the time is right! I'll start poking around this weekend (I'm not familiar with the code base) and then we can loop back next week (when the RedHat intern joins up?) How would you like to have our mentoring / discussion? Github? Slack? Other?

pdurbin commented 6 years ago

@vsoch thanks for the offer to help with this effort! I haven't even met the intern yet but @danmcp mentioned internship goals the other day at http://irclog.iq.harvard.edu/dataverse/2018-05-22#i_67516 . I believe @djbrooke said this individual might be starting as soon as next week and I said I'd be happy to mentor him or her on all things related to Dataverse. I'm happy to help you and others dive in as well, of course!

Perhaps a timeline of events (from my perspective anyway) would help ground us:

I'm probably forgetting people and events but it's a start. In a future comment I'd like to write more about how I can imagine the work for this issue being broken up into smaller chunks.

pameyer commented 6 years ago

@vsoch The idea of starting with docker-compose sounds like a reasonable place to start. There's an example of docker-compose in this repository (although intended for a different purpose) in conf/docker-dcm.

One thing that may be a factor for production-izing Docker/Dataverse (and that @pdurbin and I have talked about) is how to handle the intersection of the Docker entrypoint and the Dataverse installer. It's very likely that the approach I've been taking has been sub-optimal for production usage (and so wouldn't make a good prototype for production).

4tikhonov commented 6 years ago

Our docker-compose setup from the DataverseEU project may be interesting to try as well: https://github.com/Dans-labs/dataverse-docker/blob/master/docker-compose.yml

vsoch commented 6 years ago

@pameyer agreed, I would not have the command coincide with the entire install, so that the container can be restarted without triggering the install again!

@4tikhonov the work you've done for DataverseEU looks great! It's 99% there! Is it just a matter of combining the two repos into a consolidated thing with a good set of docs? With your blessings I can give a first shot at this.

4tikhonov commented 6 years ago

Thanks! We're already running it with Kubernetes on Google Cloud: http://dataverse-dev.cessda.eu . Some documentation for the Docker module is available here: https://github.com/Dans-labs/dataverse-docker

vsoch commented 6 years ago

So what needs to be done then?

4tikhonov commented 6 years ago

We're still working on the proxy implementation for multilingual support, which serves different languages on selected paths (/fr for French, for example). We first tried Apache, but Google Cloud already has nginx as an integrated service running inside Kubernetes, which seems to be a better solution.

pameyer commented 6 years ago

@4tikhonov How did you end up handling the entrypoint / installer issue (run installer in entrypoint, something else)?

4tikhonov commented 6 years ago

@pameyer, I remember that you advised me to do a health check before running the installer a second time:

size=$(curl -sI http://localhost:8080/api/info/version | grep Content-Length | awk '{print $2}')

It's in the entrypoint indeed: if size > 0, then Dataverse is already installed and further installation steps can be skipped.
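The check described above can be sketched as two small shell functions. This is only an illustration under stated assumptions: the function names content_length and is_installed and the installer path in the usage comment are hypothetical, and the actual dataverse-docker entrypoint may differ. The sketch strips carriage returns because HTTP header lines end in CRLF, and matches the header name case-insensitively.

```shell
#!/bin/sh
# Sketch only: function names and the installer path are illustrative
# assumptions, not taken from the dataverse-docker repository.

# Read HTTP response headers on stdin and print the Content-Length value
# (digits only), or nothing if the header is absent.
content_length() {
    tr -d '\r' | awk -F': *' \
        'tolower($1) == "content-length" { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Succeed (exit status 0) only when $1 is a positive integer.
is_installed() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely numeric
    esac
    [ "$1" -gt 0 ]
}

# Example entrypoint logic (not executed here):
#   size=$(curl -sI http://localhost:8080/api/info/version | content_length)
#   if is_installed "$size"; then
#       echo "Dataverse already installed, skipping installer"
#   else
#       ./install    # hypothetical installer path
#   fi
```

An empty result (for example when the server is not reachable) counts as "not installed". Because an error page can also carry a non-zero Content-Length, checking the HTTP status code as well may be more robust.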

pdurbin commented 6 years ago

@4tikhonov I know it's quittin' time for you but if you're up for a wild Friday night in Dataverse IRC we're chatting now: http://irclog.iq.harvard.edu/dataverse/2018-05-25#i_67861 😄

vsoch commented 6 years ago

What time zone are you guys in? We can arrange a larger group chat maybe after the holiday weekend after I've gotten a chance to look at the code?

-- Vanessa Villamia Sochat Stanford University

vsoch commented 6 years ago

hey everyone I just tested https://github.com/Dans-labs/dataverse-docker and it is a solid start, could someone again tell me why this isn't what you want?

pdurbin commented 6 years ago

@vsoch hi! Thanks for taking a look. Personally, I've been more focused on the OpenShift use case and documented what we've been up to at http://guides.dataverse.org/en/4.8.6/developers/containers.html

I'm sorry to say that I haven't looked closely at the @Dans-labs work by @4tikhonov but I believe @pameyer has so he might be in a better position than I am to comment.

Ideally we'll have a solution that works on:

vsoch commented 6 years ago

I think it would be worth having someone look at his good solution before asking someone else to start over :) My comments would be small tweaks to the containers (e.g., adding env DEBIAN_FRONTEND noninteractive and maybe putting the deps inside the containers to begin with), but I don't think it's a good use of time to start from scratch when this project is already going in (I think) the right direction.

pdurbin commented 6 years ago

@vsoch that makes total sense. I did take a quick look just now.

@omaralsoudanii since you opened this issue, can you please try https://github.com/Dans-labs/dataverse-docker and provide feedback on if it's going to work for you?

@4tikhonov are there any specific bugs or issues with that repo that should be fixed before anyone uses it in production?

@aculich do you want to give that repo a try as well?

vsoch commented 6 years ago

@4tikhonov I think you've done a great start and I can offer to help, but only if you need/want it.

4tikhonov commented 6 years ago

@vsoch, thanks, we're indeed very interested in getting feedback. Our goal is to get it integrated into the master branch at some point.

pdurbin commented 6 years ago

@4tikhonov is there anything holding you back from making a pull request?

omaralsoudanii commented 6 years ago

@pdurbin @4tikhonov this seems to be working great so far on my local machine.

One question I have: what steps are required when a new Dataverse version is up in the repo? Will this Docker image update the version automatically when using docker-compose build (without deleting the existing data), or are further steps required?

I will provide more feedback after testing it with the native API.

Thank you

pdurbin commented 6 years ago

@4tikhonov any thoughts on the question by @omaralsoudanii above?

There's a section of the Installation Guide I wrote called "Choose Your Own Installation Adventure" which is inspired by a series of books I enjoyed as a child. Under "Advanced Installation" I mention community-supported adventures such as https://github.com/IQSS/dataverse-ansible by @donsizemore and it would make sense to me to have the Docker/Kubernetes installation adventure by @4tikhonov fall into the community-supported category as well. That is, @4tikhonov and others in the community could publish Docker images as new releases of Dataverse come out. I'd be happy to give @4tikhonov and others "push" access to a new repo under the @IQSS GitHub organization if that makes sense. Please just let me know!

4tikhonov commented 6 years ago

@omaralsoudanii, it should work for other Dataverse versions as well, since the latest version should be downloaded into the Docker container from GitHub.

@pdurbin, I think it's a great idea and I would like to participate in this Docker/Kubernetes installation adventure.

pdurbin commented 6 years ago

@4tikhonov great! What would you like the repo under @IQSS to be called?

I'll set up a team and give you push access. And whoever else you want.

4tikhonov commented 6 years ago

@pdurbin dataverse-docker is fine, thanks, Phil!

pdurbin commented 6 years ago

@4tikhonov sure! I just created https://github.com/IQSS/dataverse-docker and added you to both of these teams:

The repo is empty right now but you should be able to push code there and add more collaborators. Please let me know if you have any trouble. Thanks!

4tikhonov commented 6 years ago

@pdurbin, great, I've put everything there and updated the instructions on switching languages in the running Docker container. Tested with versions 4.8.5 and 4.8.6; it seems to work.

We'll add Kubernetes specifications later this week.

pdurbin commented 6 years ago

@4tikhonov great! I assigned this issue to you so that when you're ready you can make a pull request against doc/sphinx-guides/source/installation/prep.rst to mention the new dataverse-docker repo as a community-supported effort. Thanks!

omaralsoudanii commented 6 years ago

@4tikhonov FYI, on a fresh Ubuntu 16.04 installation, the current script at

dataverse-docker/dataversedock/step1.sh

line 16:

unzip glassfish-4.1.zip

requires unzip to be installed first:

sudo apt-get install unzip

Otherwise initial.bash fails silently (it doesn't add the Glassfish dependency, and docker-compose then fails on step 7).
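One way to turn this kind of silent failure into an early, visible one is to check for required tools before doing any work. The following is a minimal sketch; require_cmds is a hypothetical helper and is not part of the dataverse-docker scripts.

```shell
#!/bin/sh
# Sketch only: require_cmds is a hypothetical helper, not part of
# dataverse-docker.

# Report every missing command on stderr; return non-zero if any is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use at the top of a setup script (not executed here):
#   require_cmds unzip curl || exit 1
#   unzip glassfish-4.1.zip
```

Running the setup script with set -e would additionally stop it at the first failing command instead of continuing.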

pdurbin commented 6 years ago

@omaralsoudanii good catch. I would suggest opening an issue at https://github.com/IQSS/dataverse-docker/issues and mentioning which commit you are on.

4tikhonov commented 6 years ago

@omaralsoudanii @pdurbin Yes, great idea! Please open issue there and we'll fix it.

omaralsoudanii commented 6 years ago

@4tikhonov @pdurbin done.

Please note there is also a problem with publishing; I opened an issue for that as well: https://github.com/IQSS/dataverse-docker/issues/2

4tikhonov commented 6 years ago

@omaralsoudanii Yes, the default Docker image comes without credentials for the DOI service. Or do you mean that you want to test it locally with some test DOI prefix?

omaralsoudanii commented 6 years ago

@4tikhonov I'm trying to test it with the default DOI prefix (doi:10.5072) that comes with the Dataverse installation. So far I've tried it both locally and online.

pdurbin commented 6 years ago

@omaralsoudanii any news? It looks like @4tikhonov fixed both of the issues:

pdurbin commented 6 years ago

@omaralsoudanii did you get a chance to test this?

omaralsoudanii commented 6 years ago

@pdurbin @4tikhonov we deployed the Docker repo in a production environment for our app (http://data.mel.cgiar.org) with a Handle server for publishing. So far, after testing with the native API, it looks great. Thank you.

pdurbin commented 6 years ago

I'm just copying a comment from @xibriz from https://github.com/IQSS/dataverse-docker/issues/3#issuecomment-415321082 who said, "I have just made a successful deployment on our Kubernetes cluster using GitLab CI, so thanks for the great work :)"

@omaralsoudanii thanks for testing as well!

I'm wondering what the "definition of done" is for this issue. Currently, it's sitting in the new "Community Dev" column on our kanban board:

[Screenshot of the kanban board, 2018-08-23]

That's fine. It can continue to sit there. I'm just wondering what it means to be "done" and for this issue to be closed. Thoughts on this are very welcome!

xibriz commented 6 years ago

Just to clarify, I had to make a few changes to be able to use GitLab CI as a deployment method.

But there are some things I personally think need to be done, like using Secrets that all the containers in one deployment can share instead of passwords in files. I also need to take a closer look at Persistent Volumes and at how best to upgrade the database on new releases.
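As a rough sketch of the secrets idea, an entrypoint can accept a password either directly in an environment variable or as a path to a mounted secret file, following the NAME_FILE convention used by some official Docker Hub images (for example POSTGRES_PASSWORD_FILE in the postgres image). The function name read_secret and the variable name DB_PASSWORD are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: read_secret and the variable names are illustrative
# assumptions, not part of dataverse-docker.

# Print the secret called NAME ($1): the contents of the file named by
# NAME_FILE if that variable is set, otherwise the value of NAME itself.
# NAME must be a valid shell variable name because it is passed to eval.
read_secret() {
    name=$1
    eval "file=\${${name}_FILE:-}"
    eval "value=\${${name}:-}"
    if [ -n "$file" ]; then
        cat "$file"
    else
        printf '%s\n' "$value"
    fi
}

# Example use in an entrypoint (not executed here):
#   DB_PASSWORD_FILE=/run/secrets/db_password   # mounted secret file
#   password=$(read_secret DB_PASSWORD)
```

With this pattern the same image works with a plain environment variable during local testing and with a mounted Kubernetes or Docker secret in a real deployment, so no password has to be baked into a file in the image.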

pdurbin commented 6 years ago

@xibriz makes sense. Thanks. Over in pull request #4827 for #4763 a similar effort was undertaken to get secrets out of the OpenShift config. Please keep us posted on your experiments with persistent volumes and upgrades. Speaking of upgrades, you (and all who are reading this) are welcome to leave a comment on #4980.

pdurbin commented 6 years ago

@poikilotherm over at http://irclog.iq.harvard.edu/dataverse/2018-10-18#i_76326 we decided to add you as an assignee for this issue so you can coordinate with @4tikhonov . Thanks!

4tikhonov commented 6 years ago

Hi @poikilotherm and @pdurbin, we already have complete infrastructure in Kubernetes: https://github.com/IQSS/dataverse-docker/tree/master/kubernetes

It's actually running on Google Cloud for DataverseEU. You can get all Docker images from Docker Hub: https://hub.docker.com/r/vtycloud/dataverse/ https://hub.docker.com/r/vtycloud/postgres/ https://hub.docker.com/r/vtycloud/solr7

pdurbin commented 6 years ago

@4tikhonov that's awesome. Over on the "Advanced Installation" page at http://guides.dataverse.org/en/4.9.4/installation/advanced.html we say, "Advanced installations are not officially supported but here we are at least documenting some tips and tricks that you might find helpful." Are you interested in making a pull request to add your Kubernetes and Docker information there? The file to edit is doc/sphinx-guides/source/installation/advanced.rst.

4tikhonov commented 6 years ago

hi @pdurbin, that's great! The only problem for me is the migration from 4.9.2 to 4.9.4 on the cloud, as it didn't go smoothly. If people migrate from previous versions without applying patches, we'll get into trouble with the distribution of 4.9.4.

pdurbin commented 6 years ago

Yep, over at #5204 I asked you for some output from asadmin deploy.

If you need any help making a pull request for the Advanced Installation page, please let us know. You might find http://guides.dataverse.org/en/4.9.4/developers/documentation.html helpful but it's a bit out of date.

Thanks!

pdurbin commented 5 years ago

I just wanted to note that @craig-willis (the first to dockerize Dataverse as noted in the timeline above) said over at https://github.com/whole-tale/whole-tale/issues/49 that he's reading through this issue and we should help him learn about latest activity with regard to Dataverse and Docker.

craig-willis commented 5 years ago

Thank you, @pdurbin. I'm always amazed by your ability to keep things connected! Forgive me for the lengthy comment, and hopefully this is the right place to put it.

The use case for whole-tale/whole-tale#49 is that I want to quickly spin up an instance of Dataverse to test API integration and external tools support. As you noted in https://github.com/whole-tale/whole-tale/issues/49#issuecomment-438299326, there are other ways to work with development instances of Dataverse: https://demo.dataverse.org, https://dev1.dataverse.org and I could also use docker-aio, but I'm also motivated to freshen up Dataverse support in Labs Workbench so that others can easily do this too. I am interested in contributing in this space and want to make sure I'm putting my effort in the right place.

There seem to be two different but related needs here:

  1. A set of well-maintained images ideally with automatic builds associated with each release that can be used to spin up dev and kick-the-tires instances on OpenShift, Labs Workbench, Kubernetes, Swarm, Docker compose etc. These don't need to be production configurations (i.e., scalable/sharded, etc), but should work out of the box with reasonable default settings, persistent storage, and at minimum be restartable.
  2. A set of production images and configurations (scalable, reliable, etc) for use on common orchestration platforms (e.g., Kubernetes, Swarm, etc) that addresses the known limitations and other issues proposed in https://github.com/IQSS/dataverse/issues/5292. (Perhaps even installable via a Helm chart?)

Here is my current understanding of Docker support in the Dataverse community:

Based on your comment https://github.com/whole-tale/whole-tale/issues/49#issuecomment-439386207 I gather that in core Dataverse development, Docker is primarily used for testing -- this is not part of your production infrastructure. The DataverseEU group is deploying on GCE via Kubernetes and therefore leading the more production-oriented work today.

I am unclear on the future of IQSS/dataverse-docker and IQSS/dataverse/conf/docker, since they are both quite similar. One is maintained by community members intended for real/production deployments, the other is maintained as part of the core project repository, but only intended for tire kicking and testing.

My tentative conclusion is that I should focus on IQSS/dataverse-docker and contribute any changes there, with the hope that these two eventually converge.

Did I miss anything?

pdurbin commented 5 years ago

@craig-willis you have it 95% right. Excellent summary. Thank you!

Unfortunately, you are assuming a level of support and maintenance for OpenShift-based images in "conf/docker" that does not exist. Some refactoring of our installation process to better support OpenShift occurred in pull request #4805, but parts of our traditional installation process broke, so we followed up with pull request #5058 to make sure traditional installation works. This is to say that I have no idea if the OpenShift setup works on the develop branch right now. It's not regularly tested. I'm not sure when the OpenShift effort will resume.

I do think that you should focus on https://github.com/IQSS/dataverse-docker because that's where the momentum is. That said, I'm not sure if anyone is using it in production or not. If they're in production, I don't believe they're on our map at https://dataverse.org