IQSS / dataverse

Open source research data repository software
http://dataverse.org

Get Dataverse running on OpenShift (Docker and Kubernetes) #4040

Closed pdurbin closed 5 years ago

pdurbin commented 7 years ago

Yesterday I met with @portante and @danmcp and talked a fair amount about the possibility of getting Dataverse running on OpenShift. There are multiple reasons why I'm interested in this:

Getting Dataverse running on OpenShift isn't on our roadmap, so I've created this issue so we can estimate it in sprint planning or backlog grooming. Anyone reading this is very welcome to leave comments or ask questions!

pdurbin commented 7 years ago

I mentioned this issue to @bjonnh yesterday in chat because I've tagged him as the primary contact for the dev effort by the community to work on Docker support in #3938. He said that free OpenShift accounts would be helpful for him, @donsizemore, and anyone else who wants to help with the effort to get Dataverse running on OpenShift.

danmcp commented 7 years ago

You can get a free account with the starter tier:

https://www.openshift.com/pricing/index.html

That gets you 1GB of free memory. You can also use oc cluster up

https://github.com/openshift/origin/blob/master/docs/cluster_up_down.md

or minishift:

https://www.openshift.org/minishift/

to run OpenShift on your laptop.

danmcp commented 7 years ago

Rules for writing good images:

https://docs.openshift.org/latest/creating_images/guidelines.html

How to set the memory based on the cgroup:

There is an example in here:

https://blog.openshift.com/managing-compute-resources-openshiftkubernetes/

Look under "Writing Applications". And here is an example from mysql:

https://github.com/sclorg/mysql-container/blob/master/5.5/root/usr/bin/cgroup-limits#L48
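The idea in the mysql cgroup-limits script linked above is to read the container's memory limit from the cgroup filesystem and derive component settings from it. A minimal sketch of the same technique (the cgroup v1 path is standard, but the 512 MB fallback and the "half for the heap" split are our assumptions, not values from that script):

```shell
# Read the container memory limit from the cgroup filesystem (v1 path;
# on cgroup v2 hosts the file is /sys/fs/cgroup/memory.max instead).
limit_file=/sys/fs/cgroup/memory/memory.limit_in_bytes
if [ -r "$limit_file" ]; then
    mem_bytes=$(cat "$limit_file")
else
    mem_bytes=$((512 * 1024 * 1024))   # assumed fallback when no limit is visible
fi
# A huge value (near 2^63) means "no limit was set"; use the fallback then too.
if [ "$mem_bytes" -gt $((1 << 60)) ]; then
    mem_bytes=$((512 * 1024 * 1024))
fi
# Give the JVM (or mysqld, etc.) roughly half of what the cgroup allows.
heap_mb=$((mem_bytes / 1024 / 1024 / 2))
echo "heap_mb=$heap_mb"
```

An entrypoint script can then pass a value like this to Glassfish via `-Xmx${heap_mb}m` instead of hard-coding a heap size.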

Our postgresql image:

https://hub.docker.com/r/openshift/postgresql-92-centos7/

Example templates:

https://github.com/openshift/origin/tree/master/examples

hello-openshift is a fine place to start; then move on to sample-app.

pdurbin commented 7 years ago

@danmcp thanks!

@portante @danmcp @landreev @scolapasta and I had a great meeting today. Here's a picture of the whiteboard:

img_20170907_130127

@danmcp already did his to do list items and my todo list item is to take the latest images from https://hub.docker.com/r/ndslabs/dataverse/ and reference them in a new file at conf/openshift/openshift.yaml ( @danmcp I'm seeing a YAML example at https://docs.openshift.org/latest/dev_guide/templates.html#writing-templates but not in https://github.com/openshift/origin/tree/master/examples/hello-openshift )

Basically, we'll be trying to see what breaks when we deploy the NDS Labs images "as is" to OpenShift. From the whiteboard, we'll need to dig into these questions about the NDS images in order to make sure they run on OpenShift:

We are deferring the following concerns until the future:

Basically, the definition of done for this issue is that someone interested in kicking the tires on Dataverse for non-production use will be able to spin it up on the free 1 GB OpenShift "starter" plan. The whiteboard drawing offers some clues on what the pull request might look like. In our conf directory, we'll have a Dockerfile each for Solr, PostgreSQL, and Dataverse+Glassfish. We'll have a build script to create the images and push them to Docker Hub (I'll create an account for IQSS). We'll have the OpenShift YAML file I mentioned above. We'll have some docs for people who want to kick the tires on Dataverse.
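The build script described above might end up looking something like the sketch below. Everything here is a guess at this point: the iqss/dataverse-* image names, the tag, and the conf/docker/<component> layout are all assumptions, and DRY_RUN=1 just prints the docker commands so the flow can be checked without a Docker daemon.

```shell
# Hypothetical build-and-push helper; image names, tag, and directory
# layout are assumptions, not settled decisions.
DRY_RUN=1                               # set to 0 to actually build and push
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

build_and_push() {
    component="$1"                      # e.g. solr, postgresql, dataverse-glassfish
    tag="iqss/dataverse-${component}:${TAG:-latest}"
    run docker build -t "$tag" "conf/docker/${component}"
    run docker push "$tag"
}

# With DRY_RUN=1 this just prints the two docker commands it would run.
build_and_push solr
```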

@danmcp I swung by @djbrooke 's office and we'd like to figure out when a good time to put this into a sprint would be. The first available would start next Wednesday, Sep 13 and go for two weeks. Let's not pick a time when you're on vacation! 😄

danmcp commented 7 years ago

@pdurbin Many of the examples are in json. The templates can be either.

I should be around most of the time over the next few weeks.

pdurbin commented 6 years ago

@danmcp awesome. Today I signed up for an OpenShift account and went through https://docs.openshift.com/online/getting_started/index.html . That doc is slightly out of date and I got some weird errors along the way (I grabbed a screenshot if you want it) but eventually they resolved themselves and I could see at http://nodejs-mongo-persistent-pdurbin-example.1d35.starter-us-east-1.openshiftapps.com the simple change I made at https://github.com/pdurbin/nodejs-ex/commit/c6efab921660d898e7beb9fc8cb5e1094c31c16e . Great.

I looked at https://github.com/openshift/origin/blob/v3.7.0-alpha.1/examples/hello-openshift/hello-project.json and noticed that there were no containers in there, so I added a containers array under "spec" and included some images from NDS Labs.

I'm currently blocked on the error "cannot create projects at the cluster scope" and left a note about this in d287772, which is the first commit of a new 4040-docker-openshift branch I pushed to this repo. Can you please take a look at that commit and let me know what I'm doing wrong? Thanks!

danmcp commented 6 years ago

@pdurbin In your case, you're not going to want to create a project but rather import into an existing project. You should use a "kind" of "Template", like this one:

https://github.com/openshift/origin/blob/master/examples/sample-app/application-template-stibuild.json

pdurbin commented 6 years ago

@danmcp thanks, in 77b3f67 I switched from "Project" to "Template" and stubbed out in the Dataverse dev guide how to use Minishift, which I just installed and have been playing with (with some guidance from @pameyer ). I was able to expose a route but I'm not sure how to expose the Docker image at https://hub.docker.com/r/ndslabs/dataverse/ within my installation of Minishift. Any advice?

danmcp commented 6 years ago

@pdurbin You will just reference ndslabs/dataverse from an imagestream like this:

https://github.com/openshift/origin/blob/master/examples/sample-app/application-template-stibuild.json#L83

Then from your container you would reference the imagestream like this:

https://github.com/openshift/origin/blob/master/examples/sample-app/application-template-stibuild.json#L247

with the name of the image stream you picked.

pdurbin commented 6 years ago

@danmcp thanks! I tried at https://github.com/pdurbin/dataverse/commit/e1e492f56aa9ba81ee84761acc96b193bc060ad3 (pushed to my personal repo this time because of the error below) but I got a crazy error:

murphy:dataverse pdurbin$ oc new-app conf/openshift/openshift.json 
--> Deploying template "project1/dataverse" for "conf/openshift/openshift.json" to project project1

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x90 pc=0xe4846d]

goroutine 1 [running]:
panic(0x33760e0, 0xc420010080)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/openshift/origin/pkg/util.addDeploymentConfigNestedLabels(0xc420640d20, 0xc42115eab0, 0x2, 0x0, 0xc42115ebd0)
    /go/src/github.com/openshift/origin/pkg/util/labels.go:181 +0x2d
github.com/openshift/origin/pkg/util.AddObjectLabelsWithFlags(0x5ba8fe0, 0xc420640d20, 0xc42115eab0, 0x2, 0xc420640d20, 0x0)
    /go/src/github.com/openshift/origin/pkg/util/labels.go:44 +0x846
github.com/openshift/origin/pkg/cmd/cli/cmd.hasLabel(0xc42115eab0, 0xc420ea12c0, 0xc421155a28, 0xc421155a18, 0xc42115eab0)
    /go/src/github.com/openshift/origin/pkg/cmd/cli/cmd/newapp.go:580 +0x102
github.com/openshift/origin/pkg/cmd/cli/cmd.(*NewAppOptions).RunNewApp(0xc420c3a538, 0x0, 0x2)
    /go/src/github.com/openshift/origin/pkg/cmd/cli/cmd/newapp.go:300 +0x1433
github.com/openshift/origin/pkg/cmd/cli/cmd.NewCmdNewApplication.func1(0xc4202a6900, 0xc420f94da0, 0x1, 0x1)
    /go/src/github.com/openshift/origin/pkg/cmd/cli/cmd/newapp.go:209 +0x10a
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc4202a6900, 0xc420f94d30, 0x1, 0x1, 0xc4202a6900, 0xc420f94d30)
    /go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:603 +0x439
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc4202b0240, 0xc42002a008, 0xc42002a018, 0xc4202b0240)
    /go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:689 +0x367
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(0xc4202b0240, 0x2, 0xc4202b0240)
    /go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:648 +0x2b
main.main()
    /go/src/github.com/openshift/origin/cmd/oc/oc.go:36 +0x196
murphy:dataverse pdurbin$ 
danmcp commented 6 years ago

It obviously shouldn't give that error but it doesn't like your json. Try this one:

{
   "kind":"Template",
   "apiVersion":"v1",
   "metadata":{
      "name":"dataverse",
      "labels":{
         "name":"dataverse"
      },
      "annotations":{
         "openshift.io/description":"Dataverse is open source research data repository software: https://dataverse.org",
         "openshift.io/display-name":"Dataverse"
      }
   },
   "objects":[
      {
         "kind":"Service",
         "apiVersion":"v1",
         "metadata":{
            "name":"dataverse-glassfish-service"
         },
         "spec":{
            "ports":[
               {
                  "name":"web",
                  "protocol":"TCP",
                  "port":8080,
                  "targetPort":8080
               }
            ]
         }
      },
      {
         "kind":"ImageStream",
         "apiVersion":"v1",
         "metadata":{
            "name":"ndslabs-dataverse"
         },
         "spec":{
            "dockerImageRepository":"ndslabs/dataverse"
         }
      },
      {
         "kind":"DeploymentConfig",
         "apiVersion":"v1",
         "metadata":{
            "name":"dataverse-glassfish",
            "annotations":{
               "template.alpha.openshift.io/wait-for-ready":"true"
            }
         },
         "spec":{
            "template":{
               "metadata":{
                  "labels":{
                     "name":"ndslabs-dataverse"
                  }
               },
               "spec":{
                  "containers":[
                     {
                        "name":"ndslabs-dataverse",
                        "image":"ndslabs-dataverse",
                        "ports":[
                           {
                              "containerPort":8080,
                              "protocol":"TCP"
                           }
                        ],
                        "imagePullPolicy":"IfNotPresent",
                        "securityContext":{
                           "capabilities":{

                           },
                           "privileged":false
                        }
                     }
                  ]
               }
            },
            "strategy":{
               "type":"Rolling",
               "rollingParams":{
                  "updatePeriodSeconds":1,
                  "intervalSeconds":1,
                  "timeoutSeconds":120
               },
               "resources":{

               }
            },
            "triggers":[
               {
                  "type":"ImageChange",
                  "imageChangeParams":{
                     "automatic":true,
                     "containerNames":[
                        "ndslabs-dataverse"
                     ],
                     "from":{
                        "kind":"ImageStreamTag",
                        "name":"ndslabs-dataverse:latest"
                     }
                  }
               },
               {
                  "type":"ConfigChange"
               }
            ],
            "replicas":1,
            "selector":{
               "name":"ndslabs-dataverse"
            }
         }
      }
   ]
}
pdurbin commented 6 years ago

@danmcp thanks! Added in 4702e0a. Under "Applications" there are now entries under "Deployments" and "Pods" which seems like great progress, but I'm getting this in the log:

--> Scaling dataverse-glassfish-1 to 1
--> Waiting up to 2m0s for pods in rc dataverse-glassfish-1 to become ready
error: update acceptor rejected dataverse-glassfish-1: pods for rc "dataverse-glassfish-1" took longer than 120 seconds to become ready

Here's a screenshot:

screen shot 2017-09-15 at 7 38 34 pm
pdurbin commented 6 years ago

Scratch that. I tried again and now I'm getting this:

Using Rserve at localhost:6311
Optional service Rserve not running.
Using Postgres at localhost:5432
Required service Postgres not running. Have you started the required services?

This error seems to be coming from https://github.com/nds-org/ndslabs-dataverse/blob/9ddc9efa54185ffd69e25487159a09c4bb2e56bf/dockerfiles/dataverse/entrypoint.sh#L69

https://github.com/nds-org/ndslabs-dataverse/blob/9ddc9efa54185ffd69e25487159a09c4bb2e56bf/dockerfiles/README.md#starting-dataverse-under-docker has some nice information about how you have to start PostgreSQL and Solr before starting Dataverse, which makes sense.
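The start-order requirement is what the entrypoint's "Required service ... not running" checks enforce. A generic version of that kind of dependency wait, written from scratch rather than copied from the ndslabs script (the retry count and the bash-only /dev/tcp trick are our choices, not the script's; under plain sh the connection test always fails):

```shell
# Poll a host:port until it accepts TCP connections, or give up.
# Loosely modeled on the entrypoint's Postgres/Solr checks.
wait_for() {
    host="$1"; port="$2"; tries="${3:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # bash opens a TCP connection via the /dev/tcp pseudo-device
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            echo "${host}:${port} is up"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "${host}:${port} not reachable" >&2
    return 1
}

# Example: block until Postgres answers before starting Glassfish.
# wait_for localhost 5432 && ./start-glassfish.sh
```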

danmcp commented 6 years ago

@pdurbin Similar to this example:

https://github.com/openshift/origin/blob/master/examples/sample-app/application-template-stibuild.json

You're going to want to add postgres and solr to the same template. And have env vars generated to connect them all together.
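Concretely, "env vars generated to connect them all together" means adding an env array to each container spec in the template. A hypothetical fragment for the dataverse-glassfish container: every name and value below is illustrative and would need to be checked against what the ndslabs entrypoint and init-glassfish scripts actually read.

```shell
# Write a hypothetical "env" fragment for the dataverse-glassfish
# container spec; the service names and variable names are assumptions.
cat > env-fragment.json <<'EOF'
{
  "env": [
    { "name": "POSTGRES_HOST",     "value": "dataverse-postgresql-service" },
    { "name": "POSTGRES_PORT",     "value": "5432" },
    { "name": "POSTGRES_USER",     "value": "dvnapp" },
    { "name": "POSTGRES_PASSWORD", "value": "secret" },
    { "name": "SOLR_HOST",         "value": "dataverse-solr-service" }
  ]
}
EOF
grep -c '"name"' env-fragment.json
```

The values point at Service names rather than localhost, since in OpenShift each component runs in its own pod and is reached through its Service DNS name.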

pdurbin commented 6 years ago

@danmcp thanks, I made some progress, I think, by adding the centos/postgresql-94-centos7 image in f41d753, but you're welcome to let me know if I'm doing something wrong. I assume I'll still need to mess with the postgres user, password, and database values, but the port must be open now because in the console it got past the postgres check and is now failing on Solr, which I guess I'll work on next:

Using Rserve at localhost:6311
Optional service Rserve not running.
Using Postgres at localhost:5432
Postgres running
Using Solr at localhost:8983
Required service Solr not running. Have you started the required services?
pdurbin commented 6 years ago

As of c20dd39 I've added the ndslabs/dataverse-solr image and now I'm seeing /entrypoint.sh: line 101: cd: //dvinstall: No such file or directory in the console like this:

Using Rserve at localhost:6311
Optional service Rserve not running.
Using Postgres at localhost:5432
Postgres running
Using Solr at localhost:8983
Solr running
/entrypoint.sh: line 101: cd: //dvinstall: No such file or directory

Line 101 is referring to https://github.com/nds-org/ndslabs-dataverse/blob/9ddc9efa54185ffd69e25487159a09c4bb2e56bf/dockerfiles/dataverse/entrypoint.sh#L101 which is trying and failing to cd ~/dvinstall. I assume that the dvinstall directory is supposed to be created by unzipping dvinstall-4.2.3.zip as seen by the following two lines at https://github.com/nds-org/ndslabs-dataverse/blob/9ddc9efa54185ffd69e25487159a09c4bb2e56bf/dockerfiles/dataverse/Dockerfile#L49

    && wget https://github.com/IQSS/dataverse/releases/download/v4.2.3/dvinstall-4.2.3.zip \
    && unzip dvinstall-4.2.3.zip \

I'm not sure how to tell how much of that Dockerfile has been executed. I guess it would be nice to ssh into the host and poke around but I'm not sure how to do that.

pdurbin commented 6 years ago

I chatted with @bodom0015 over at https://gitter.im/nds-org/ndslabs?at=59bff3d9cfeed2eb65247bfb and to summarize the conversation:

@bodom0015 found https://blog.openshift.com/getting-any-docker-image-running-in-your-own-openshift-cluster/ and I ran the following:

oc adm policy add-scc-to-user anyuid -z default --as system:admin

Then I went into the Minishift GUI and clicked "Deploy". After the deployment, I was able to rsh into the pod and confirm that /root/dvinstall is present:

murphy:dataverse pdurbin$ oc rsh dataverse-glassfish-2-x3x40
Defaulting container name to ndslabs-dataverse.
sh-4.2# ls /root/dvinstall
config-dataverse    init-postgres        schema.xml
config-glassfish    install          setup-all.sh
createDDL.sql       jhove.conf           setup-builtin-roles.sh
data            jhoveConfig.xsd      setup-datasetfields.sh
dataverse.war       pgdriver             setup-dvs.sh
glassfish-setup.sh  reference_data.sql       setup-identity-providers.sh
init-dataverse      reference_data_filtered.sql  setup-irods.sh
init-glassfish      restart-glassfish        setup-users.sh
sh-4.2# 

@danmcp what do you suggest I do next? I know you don't want these images running as root but should I try to add a route to see if Dataverse is running at least?

danmcp commented 6 years ago

@pdurbin There isn't any harm in skipping ahead and coming back to the root problem.

pdurbin commented 6 years ago

@danmcp ok. Thanks. My current status is that there seems to be something wrong with the Docker image I'm trying to use. Specifically, when the dataverse.war file is being deployed to Glassfish, it shows remote failure: Error occurred during deployment: Exception while preparing the app : Invalid resource : jdbc/VDCNetDS__pm. Please see server.log for more details. From what I can tell that resource is invalid because the command to create it failed (create-jdbc-resource failed).

I guess I'm trying to figure out where to go from here. So far I've been trying to use the NDS Labs Docker images as-is from Docker Hub. I'm wondering if it's time for me to attempt to create my own Docker images so that I can troubleshoot these problems. I'm not sure how to iterate on my own Docker images in a Minishift environment. Judging from https://docs.openshift.org/latest/minishift/openshift/openshift-docker-registry.html there's a registry inside of Minishift that works like Docker Hub but runs within Minishift. I'm wondering if I can push Docker images I create into that registry and then reference them from the openshift.json file I've been iterating on. I don't even have Docker installed on my Mac, and I gather that I should install it via a .dmg file based on the link I followed in https://docs.openshift.org/latest/minishift/using/docker-daemon.html . Once I have it installed, I guess I can try to use eval $(minishift docker-env) and then run docker commands against my Minishift installation, such as the ones mentioned on this page about interacting with the Minishift registry: https://docs.openshift.org/latest/minishift/openshift/openshift-docker-registry.html

In short, I'm a bit blocked on not knowing how to iterate on Docker images.

Here's the full output that contains the errors I mentioned above:

murphy:dataverse pdurbin$ oc logs dataverse-glassfish-2-x3x40 -c ndslabs-dataverse 
Using Rserve at localhost:6311
Optional service Rserve not running.
Using Postgres at localhost:5432
Postgres running
Using Solr at localhost:8983
Solr running

Initializing Postgres
Using psql version 9.2
Connected to postgres on localhost:5432
/usr/bin/psql -q -h localhost -p 5432 -c "" -d postgres dvnapp >/dev/null 2>&1

Creating Postgres user (role) for the DVN: dvnapp
Creating Postgres database: dvndb
/usr/bin/psql -q -h localhost -p 5432 -c "" -d dvndb dvnapp >/dev/null 2>&1
/usr/bin/psql -q -h localhost -p 5432 -c "select count(*) from dataverse" -d dvndb dvnapp >/dev/null 2>&1
Initializing postgres database
Executing DDL
/usr/bin/psql -q  -h localhost -p 5432 -d dvndb dvnapp -f createDDL.sql
psql:createDDL.sql:17: NOTICE:  identifier "index_foreignmetadatafieldmapping_foreignmetadataformatmapping_id" will be truncated to "index_foreignmetadatafieldmapping_foreignmetadataformatmapping_"
psql:createDDL.sql:179: NOTICE:  identifier "index_datasetfielddefaultvalue_parentdatasetfielddefaultvalue_id" will be truncated to "index_datasetfielddefaultvalue_parentdatasetfielddefaultvalue_i"
psql:createDDL.sql:192: NOTICE:  identifier "index_datasetfield_controlledvocabularyvalue_controlledvocabularyvalues_id" will be truncated to "index_datasetfield_controlledvocabularyvalue_controlledvocabula"
Executed DDL
Loading reference data
/usr/bin/psql -q  -h localhost -p 5432 -d dvndb dvnapp -f reference_data_filtered.sql
Loaded reference data
Granting privileges
Grant succeeded

Installing the Glassfish PostgresQL driver

Initializing Glassfish
/usr/local/glassfish4/bin ~/dvinstall
Waiting for domain1 to start .....
Successfully started the domain : domain1
domain  Location: /usr/local/glassfish4/glassfish/domains/domain1
Log File: /usr/local/glassfish4/glassfish/domains/domain1/logs/server.log
Admin Port: 4848
Command start-domain executed successfully.
remote failure: Invalid property syntax, missing property value: password=
Invalid property syntax, missing property value: password=
Usage: create-jdbc-connection-pool [--datasourceclassname=datasourceclassname] [--restype=restype] [--steadypoolsize=8] [--maxpoolsize=32] [--maxwait=60000] [--poolresize=2] [--idletimeout=300] [--initsql=initsql] [--isolationlevel=isolationlevel] [--isisolationguaranteed=true] [--isconnectvalidatereq=false] [--validationmethod=table] [--validationtable=validationtable] [--failconnection=false] [--allownoncomponentcallers=false] [--nontransactionalconnections=false] [--validateatmostonceperiod=0] [--leaktimeout=0] [--leakreclaim=false] [--creationretryattempts=0] [--creationretryinterval=10] [--sqltracelisteners=sqltracelisteners] [--statementtimeout=-1] [--statementleaktimeout=0] [--statementleakreclaim=false] [--lazyconnectionenlistment=false] [--lazyconnectionassociation=false] [--associatewiththread=false] [--driverclassname=driverclassname] [--matchconnections=false] [--maxconnectionusagecount=0] [--ping=false] [--pooling=true] [--statementcachesize=0] [--validationclassname=validationclassname] [--wrapjdbcobjects=true] [--description=description] [--property=property] jdbc_connection_pool_id 
Command create-jdbc-connection-pool failed.
remote failure: Attribute value (pool-name = dvnDbPool) is not found in list of jdbc connection pools.
Command create-jdbc-resource failed.
configs.config.server-config.ejb-container.ejb-timer-service.timer-datasource=jdbc/VDCNetDS
Command set executed successfully.
Created 1 option(s)
Command create-jvm-options executed successfully.
Created 1 option(s)
Command create-jvm-options executed successfully.
Created 1 option(s)
Command create-jvm-options executed successfully.
Created 1 option(s)
Command create-jvm-options executed successfully.
Created 1 option(s)
Command create-jvm-options executed successfully.
Created 1 option(s)
Command create-jvm-options executed successfully.
Mail Resource mail/notifyMailSession created.
Command create-javamail-resource executed successfully.
Deploying dataverse.war
remote failure: Error occurred during deployment: Exception while preparing the app : Invalid resource : jdbc/VDCNetDS__pm. Please see server.log for more details.
Command deploy failed.
~/dvinstall

Initializing Dataverse
Waiting for Dataverse
murphy:dataverse pdurbin$ 
craig-willis commented 6 years ago

@pdurbin I'll try to find you online tomorrow. As @bodom0015 mentioned, we haven't tried running this under OpenShift and there are certainly going to be assumptions from the NDS Labs Workbench system. As you've noticed, the image does run as root -- but it doesn't need to. I expect an image that works with OpenShift will work under our system as well.

In the above comment, the error I'd look at is:

remote failure: Invalid property syntax, missing property value: password=
Invalid property syntax, missing property value: password=
Usage: create-jdbc-connection-pool ...
remote failure: Attribute value (pool-name = dvnDbPool) is not found in list of jdbc connection pools.
Command create-jdbc-resource failed.

This indicates that the JDBC connection pool wasn't created. Looking at the init-glassfish script, it would seem that the environment variable POSTGRES_PASSWORD isn't being set.

I'm wondering if it's time for me to attempt to create my own Docker images so that I can troubleshoot these problems

The NDS Labs images were put together as part of a proof-of-concept. I'd love to see an official Dataverse image (or set of images) and would be happy to contribute as needed to make this happen. Things are likely further complicated by my early effort to split out the Dataverse installation process into what could be baked into the Docker image versus configuration required when someone runs the image.

danmcp commented 6 years ago

@pdurbin Installing docker on your mac would make sense if you want to iterate on images. I would probably just push the images to dockerhub though rather than to the registry on minishift since minishift isn't a permanent hosting env. But from Craig's comment above, it sounds like you just need to add an env var to the template to get past this error.

pdurbin commented 6 years ago

I'm attempting to add environment variables such as POSTGRES_PASSWORD to my openshift.json file to make the init-glassfish script happy because it wants them on line 13 as of this commit: https://github.com/nds-org/ndslabs-dataverse/blob/9ddc9efa54185ffd69e25487159a09c4bb2e56bf/dockerfiles/dataverse/init-glassfish#L13

However, when I try to add these environment variables, I start getting "Required service Postgres not running" again, which I fixed last week. So I feel like I'm going backwards. I went ahead and pushed my attempt to add the environment variables to my fork so people can see what I tried: https://github.com/pdurbin/dataverse/commit/c0ba516d1d7ee19d2f620e7dccf02835d6b86c58

By the way, here is my workflow for iterating on my openshift.json file. I'd be happy to hear if there's a better way:

vim conf/openshift/openshift.json # let's hope this works

oc delete project project1 # get ready to start over

oc projects # keep running this until project1 is gone

oc new-project project1 && oc new-app conf/openshift/openshift.json

Basically, I keep blowing away "project1" to force my openshift.json file to be reprocessed after I hack on it.

pdurbin commented 6 years ago

It works!! As of e90f771 I can log in to Dataverse 4.2.3 (I'm still using the NDS Labs images) with the "dataverseAdmin" account. This calls for screenshots (different tabs for OpenShift/Minishift vs. Dataverse running inside it):

screen shot 2017-09-19 at 5 54 26 pm

screen shot 2017-09-19 at 5 54 29 pm

Thank you @craig-willis and @bjonnh for all of your help today at http://irclog.iq.harvard.edu/dataverse/2017-09-19 !

Also, I signed up for Docker Hub and created an IQSS organization at https://hub.docker.com/u/iqss/ . I suppose the next step is to start creating Docker images rather than using the ones provided by NDS Labs. In IRC we talked about not using the conventional "latest" tag but rather a tag named after the branch we're on, which is "4040-docker-openshift".

craig-willis commented 6 years ago

@pdurbin A few additional thoughts.

At some point you'll need to deal with persistent volumes for any data. We have the following mounts specified in our Kubernetes specs. We use Kubernetes-specific volume and volumeMount specifications, so these will likely be different for OpenShift.

We never found a decent solution to reuse an official Solr image and add the custom index configuration via Docker. In the ndslabs/dataverse-solr image, the Dataverse schema is built into the image.

I have changes that support the upgrade to 4.7, but these have not been merged (https://github.com/nds-org/ndslabs-dataverse/pull/11).

You may recall from https://github.com/nds-org/ndslabs-dataverse/issues/8 that I go through the process of generating the Postgres schema instead of relying on EclipseLink to create the database schema during WAR deployment. If you create your own Docker image, I expect that you'll run into the same issue.

Let me know if you decide to go ahead with your own image or want me to make changes to the ndslabs images to support OpenShift deployment. In the latter case, feel free to open issues on https://github.com/nds-org/ndslabs-dataverse/ (for example, the root user problem).

pdurbin commented 6 years ago

@craig-willis well, even though we succeeded yesterday in getting Dataverse running on Minishift, the target is actually OpenShift Online, which requires that images not run as root. So, there's still work to do and I'd be very happy to keep collaborating with you on this.

Last night I did go ahead and push my first Docker image to the IQSS organization I created on Docker Hub. I worked on the Solr image first ( https://hub.docker.com/r/iqss/dataverse-solr/ ) because it seemed the most straightforward. If you look at 0a44410 you'll see that I basically copied and pasted your work, but I adjusted the Dockerfile to grab the latest Solr schema.xml file from the source tree rather than downloading it from the master branch. I had to mess with the docker build command, giving it our conf directory as the context.

Running docker build is rather slow. It has to download the Solr tarball, for example. And it's also slow to upload to Docker Hub. I wonder if there's a way to iterate on these images faster. If anyone has any ideas, please let me know.

I hear you on the local data. I'm not sure what to do there.

I think next I'll work on the Dataverse/Glassfish image so I can actually make changes to it.

craig-willis commented 6 years ago

@pdurbin

The user change is pretty straightforward. In the Dockerfile, we simply add a RUN instruction with the appropriate useradd command for the base OS then use the USER instruction. For example, the following would add a glassfish group and user then set the current user in the image to glassfish.

RUN groupadd glassfish && \
    useradd -s /bin/bash -d /home/glassfish -m -g glassfish glassfish

USER glassfish

I can make this change today to a branch of the ndslabs Dataverse image, if it makes sense.

Great news on the Docker push. One of the benefits of Docker is the image layering and cache. Once you've pulled or built one of the layers, it should be cached locally.

As for the Solr build slowness, I tried to be faithful to the Dataverse install instructions in building these images. We might consider using the official Solr Docker image (https://hub.docker.com/_/solr/) as a base going forward instead of pulling the tar into a CentOS base.

pdurbin commented 6 years ago

@craig-willis sure! If you could get your images working on OpenShift Online by making sure they don't run as root, you'd really help me out. I'm still hacking away on my own Dockerfile for Dataverse/Glassfish. I'm taking out iRODS, by the way. And I'm planning on just running the normal Dataverse installer and letting the deployment of the war file create the database tables, like we usually do.

The thing I'm confused about is if I have to push my giant image to DockerHub before I can try it in Minishift. I'm not on the speediest network connection at the moment. It would be nice if I could push the image into Minishift somehow.

craig-willis commented 6 years ago

@pdurbin No problem on iRODS -- that was put in for the Odum/DFC proof-of-concept. We can always extend your image and add it if we need it in the future.

I ran into problems with Eclipselink, particularly if the Glassfish container restarted. Maybe that's been resolved more recently, but from my experience when Glassfish restarts and redeploys the WAR file, Eclipselink tries to recreate the schema and fails. There are probably other ways to work around this, but I couldn't find an option in the EclipseLink config.

I don't know of a way to "push" to Minishift. For the Docker build, you could try ssh'ing into your Minishift VM. I'm on a Mac with VirtualBox, and minishift ssh takes me into the VM. You can build the image in place and (hopefully) Minishift will use it without pulling.

pdurbin commented 6 years ago

I don't know of a way to "push" to Minishift.

I'm trying to follow https://docs.openshift.org/latest/minishift/openshift/openshift-docker-registry.html to push the Docker image I'm working on ("iqss/dataverse-glassfish:4040-docker-openshift") to Minishift rather than Docker Hub but I'm getting unauthorized: authentication required

murphy:dataverse pdurbin$ docker login -u developer -p $(oc whoami -t) $(minishift openshift registry)
Login Succeeded
murphy:dataverse pdurbin$ 
murphy:dataverse pdurbin$ docker push $(minishift openshift registry)/iqss/dataverse-glassfish:4040-docker-openshift
The push refers to a repository [172.30.1.1:5000/iqss/dataverse-glassfish]
3154d0c6075a: Preparing 
ecfe0944757b: Preparing 
49a3d17cf298: Preparing 
af3fec245b7e: Preparing 
2c21a33d943c: Preparing 
edb0a1950125: Waiting 
fc97fea51367: Waiting 
29e4709e34b0: Waiting 
8d1ee7b04724: Waiting 
d78c341fda71: Waiting 
12bbe51d3106: Waiting 
9c2f1836d493: Waiting 
unauthorized: authentication required
murphy:dataverse pdurbin$ 

From what I can tell, other people are having similar trouble with Minishift:

Any thoughts on this @danmcp ? I do plan to push my "iqss/dataverse-glassfish" image to Docker Hub eventually, like I did for "iqss/dataverse-solr" but while I'm hacking away on it, it would be nice to avoid having to upload it to Docker Hub. I was hoping to push temporarily to the Minishift registry.
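One hedged guess about the unauthorized error: with the integrated registry, a user can generally only push into a project/namespace it has rights to, and no "iqss" project exists by default for the developer account. A sketch of a possible fix (the diagnosis itself is an assumption; names are the ones from this thread):

```shell
# Create (or switch to) a project named "iqss" so the registry
# namespace exists and the developer user can push into it
oc new-project iqss

docker login -u developer -p "$(oc whoami -t)" "$(minishift openshift registry)"
docker tag iqss/dataverse-glassfish:4040-docker-openshift \
  "$(minishift openshift registry)/iqss/dataverse-glassfish:4040-docker-openshift"
docker push "$(minishift openshift registry)/iqss/dataverse-glassfish:4040-docker-openshift"
```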

pdurbin commented 6 years ago

I'm struggling with trying to build an IQSS image for Dataverse/Glassfish. I just pushed some scratch work to https://github.com/pdurbin/dataverse/commit/f872df495019d362b40d46f8bf3c87735e02b97a but would welcome some more eyes on it.

When I run oc status I'm getting The image trigger for dc/dataverse-glassfish will have no effect because is/dataverse-glassfish does not exist.

pdurbin commented 6 years ago

@danmcp I took a break from my problems building my own Dataverse/Glassfish image (described above) and thought I'd see what sort of errors I get if I upload my openshift.json file, the one that at least lets me run an old version of Dataverse (4.2.3, still based on the NDS Labs image) on Minishift if I allow containers to run as root. Basically, I thought I'd see what error OpenShift Online shows me for a container that tries to run as root.

Specifically, I uploaded this version of the openshift.json: https://github.com/IQSS/dataverse/blob/0a444105a2fdb5924e9598f0d2bf5f98b4dff700/conf/openshift/openshift.json

To my surprise, rather than seeing an error about a container running as root, I saw an error about memory:

Error creating: pods "dataverse-glassfish-1-" is forbidden: [maximum cpu usage per Pod is 2, but limit is 3., maximum memory usage per Pod is 1Gi, but limit is 1610612736.]

Here's a screenshot:

screen shot 2017-09-20 at 2 43 32 pm

I looked at the openshift.json file and I don't see anything about memory. I guess I'm not sure what's telling OpenShift Online how much memory to use and I'm wondering if Dataverse will be able to squeeze into the 1 GB limit for the free tier.

Anyway, back to my hacking!

danmcp commented 6 years ago

@pdurbin Will dig into why you got that error, but you're going to need to set the limits for the online version of the template.

abhgupta commented 6 years ago

The platform (specifically, the Starter clusters) defaults to 512Mi RAM and 1 core of CPU for containers that do not specify the resource limits. In case of your deployment (dataverse-glassfish), you have three containers, each defaulting to 512Mi RAM. This comes to a total of 1.5Gi RAM total for the pod (and 3 cores cpu). You will need to explicitly specify container resources in your deployment config.
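To make those limits explicit, each container entry in openshift.json can carry a resources stanza. A minimal sketch following the usual OpenShift quickstart-template convention (the values here are illustrative, not tuned for Dataverse):

```json
{
  "name": "dataverse-glassfish",
  "image": "iqss/dataverse-glassfish",
  "resources": {
    "limits": {
      "memory": "512Mi",
      "cpu": "1"
    }
  }
}
```

With all three containers capped this way, the pod totals 1.5Gi/3 cores at most, and lower values would be needed to fit the 1Gi Starter quota.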

danmcp commented 6 years ago

Thanks! @abhgupta

@pdurbin The error was strange to me because it was saying this was the case for one pod. I was expecting you to have 3 pods each with 1 container. Any reason you set it up this way? Is it temporary?

pdurbin commented 6 years ago

@abhgupta ah, thanks. When the time comes, I'll try to figure out how to explicitly set container resources in my deployment config.
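One possible shortcut, assuming the deployment config is named dataverse-glassfish and that this version of oc supports the subcommand, is setting the limits from the CLI instead of editing the template:

```shell
# Set explicit limits and requests on the containers in the
# deployment config (values are illustrative)
oc set resources dc/dataverse-glassfish \
  --limits=memory=512Mi,cpu=1 \
  --requests=memory=256Mi,cpu=250m
```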

@danmcp I'm happy to defer to you for all things OpenShift. I'm just trying to get anything to work. As I mentioned at https://github.com/IQSS/dataverse/issues/4040#issuecomment-330686158 I have the old-ish (Dataverse 4.2.3) NDS Labs Dataverse/Glassfish image working if I run it as root on Minishift. It looks like it has one pod and three containers. Here's a screenshot:

screen shot 2017-09-20 at 7 53 00 pm

The config above was created with https://github.com/IQSS/dataverse/blob/0a444105a2fdb5924e9598f0d2bf5f98b4dff700/conf/openshift/openshift.json , the same one I tried uploading to OpenShift Online earlier. I agree that the error is confusing.

I've been trying to switch to a Dataverse/Glassfish Docker container of my own invention. I'm doing this because the NDS Labs image is for Dataverse 4.2.3, which is somewhat old, and I'd like to be able to create a working Dataverse/Glassfish image for any arbitrary branch that's in flight. I have a dream of deploying the branches behind pull requests somewhere so I can run my API test suite against them before they get merged. Also, as we know, the NDS Labs images currently run as root, so I'm hoping to address that once I can get my own Dataverse/Glassfish images to work any way I can. I hope this makes sense.

I've been hacking on my Dockerfile and "entrypoint" script in my fork (latest commit at https://github.com/pdurbin/dataverse/commit/ff219644396617df3512eccf4269be5d61627174 ) and once I get a little more traction, I'm planning to copy over the working config to https://github.com/IQSS/dataverse/tree/4040-docker-openshift in this repo, switching the config from the NDS Labs Dataverse/Glassfish image to my (IQSS) image. Again, I hope this makes sense.

It sounds like @craig-willis might independently work on trying to get his NDS Labs images to run as non-root, which will give me a leg up when I get to that stage.

I feel like I should probably make a chart to show where I am and where I'm trying to go but I hope the words above help.

I still find it odd that I have to push to DockerHub each time I iterate a tiny bit on my "entrypoint" script but in practice it only takes a few minutes to push and have the image be ready for download from Docker Hub. Then I switch over to Minishift and pull it down. It feels like there should be a more integrated experience using the Minishift registry, but as I reported above, I failed to get it working. Oh well.

pdurbin commented 6 years ago

@craig-willis (or anyone) do you know why I'm getting Remote server does not listen for requests on [localhost:4848]. Is the server up? when running asadmin commands? I added the output of the "entrypoint" script I was hacking on as a comment on this commit: https://github.com/pdurbin/dataverse/commit/ff219644396617df3512eccf4269be5d61627174 (scroll down).

danmcp commented 6 years ago

@pdurbin Using one pod or three pods doesn't have anything to do with the image you are using; it's just how the JSON is structured. The reason for using three pods in the JSON instead of one is that Glassfish, PostgreSQL, and Solr are not designed to scale together.

pdurbin commented 6 years ago

@danmcp ok, I'm definitely open to structuring the JSON a different way.

@craig-willis I just noticed that I copied rm -rf /usr/local/glassfish4/glassfish/domains/domain1 from https://github.com/nds-org/ndslabs-dataverse/blob/4.2.3/dockerfiles/dataverse/Dockerfile#L33 and I'm going to try to take it out. I'm guessing it's the reason I'm seeing "Corrupt domain. The config directory does not exist. I was looking for it here: /usr/local/glassfish4/glassfish/domains/domain1/config" in the log I posted to https://github.com/pdurbin/dataverse/commit/ff219644396617df3512eccf4269be5d61627174 . I'm rebuilding my Docker image and waiting, waiting as it slowly uploads to Docker Hub. 😄

pdurbin commented 6 years ago

Yeah, that was it. I'm not sure why domain1 was being blown away but I commented out that line at https://github.com/pdurbin/dataverse/commit/7b4390ec3695559c6ed3b7025d8d5ac04c5ecccc and will clean up what I have and make a commit on the main 4040-docker-openshift branch in this repo when I'm done. It's nice to see the latest code from Dataverse running in Docker!

@craig-willis any news on the changes you posted earlier for making this container run as non-root? I'm copying and pasting below the commands you wrote in an earlier comment so I have them handy:

RUN groupadd glassfish && \
    useradd -s /bin/bash -d /home/glassfish -m -g glassfish glassfish

USER glassfish

Now that I have my own Dataverse/Glassfish Docker image at least somewhat working, it'll be easier for me to iterate on it, especially once I'm back at my desk and on Internet2 for faster Docker Hub uploading and downloading. 😄

pdurbin commented 6 years ago

Ok, in 7c81b4e I switched over from the NDS Labs Dataverse/Glassfish image to the one I built. It's sort of a mess but I need to go help fix up an open pull request so I'm switching off of this issue and the 4040-docker-openshift branch for a bit. If anyone wants to pick up any tasks for this issue, here's what's on my mind:

danmcp commented 6 years ago

@pdurbin Here is an example:

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_examples/files/examples/v3.7/quickstart-templates/cakephp-mysql.json#L291

This one uses a templated value from here:

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_examples/files/examples/v3.7/quickstart-templates/cakephp-mysql.json#L453

pdurbin commented 6 years ago

@danmcp thanks. Good news, bad news on setting a memory limit. Mostly bad news. I committed a change to my fork at https://github.com/pdurbin/dataverse/commit/43ba1fcf4b6413e31dde9119b483520c1a8b898a and tried it out on Minishift and OpenShift Online. The results:

screen shot 2017-09-21 at 2 33 51 pm
danmcp commented 6 years ago

@pdurbin Too bad. But I would suggest the backup deliverable of deploying to Minishift is reasonable if the memory is a permanent limit.

craig-willis commented 6 years ago

@pdurbin I'm building a test image with the non-root user now. I'll let you know how it goes.

Regarding resource limits, Labs Workbench has the same requirement. We originally set the limit to 1GB for the Dataverse/Glassfish application.

I can't recall why I did the rm on the domain1 directory after extracting Glassfish.

The mkdir command is me sloppily trying to copy my persistence.xml (which turns off EclipseLink schema generation) into the war before deploying. If you're not going to use that approach, you can just remove everything related to persistence.xml. There may be something about OpenShift not wanting you to copy into /tmp (but I don't know why).

pdurbin commented 6 years ago

@danmcp thankfully, @craig-willis seems to think Dataverse might fit in 1 GB. Phew. We'll get a lot more value from this effort if people wanting to kick the tires on Dataverse can simply upload a JSON file to OpenShift Online rather than doing all the setup for Minishift. Vagrant is easier.

@craig-willis I already disabled all the persistence.xml stuff, so the mkdir error must be coming from something else. Do you remember what VOLUME /usr/local/glassfish4/glassfish/domains/domain1/file is for? I hadn't thought that copying to /tmp might be a problem with OpenShift Online, but I really don't know. Once you get your non-root user version working, are you planning on trying it on OpenShift Online? If not, can you at least let me know the tag of the image on Docker Hub so I can try it myself?

danmcp commented 6 years ago

@pdurbin Have you done the work yet to tune the Glassfish heap based on the memory limit?

pdurbin commented 6 years ago

@danmcp I can't remember if @landreev mentioned it when we met but long ago he added something to the Dataverse install script to adjust the Glassfish heap based on the amount of memory available. Here's how it works: https://github.com/IQSS/dataverse/blob/v4.7.1/scripts/installer/install#L907

My current frustration is that I'm trying to iterate on conf/docker/dataverse-glassfish/entrypoint.sh to get my Dataverse/Glassfish container to run as non-root but when I push updated images to Docker Hub the changes are not reflected in Minishift. I can even see the new sha256 values of my updated images in Minishift and they match the output of docker push so I don't know what's going on. It was working earlier today. Very frustrating.

danmcp commented 6 years ago

@pdurbin I don't believe that's going to work in the container. You're going to need to use:

CONTAINER_MEMORY_IN_BYTES=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)

or the downward API.
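A minimal sketch of that idea, assuming a cgroup v1 mount inside the container and a simple halve-the-limit heuristic (the 50% ratio and the helper's name are illustrative, not from the Dataverse install script):

```shell
#!/bin/sh
# heap_flag: given a memory limit in bytes, emit a -Xmx flag set to
# half the limit, expressed in megabytes. Illustrative helper only.
heap_flag() {
  mb=$(( $1 / 1024 / 1024 / 2 ))
  echo "-Xmx${mb}m"
}

# Inside a container, the limit would come from the cgroup:
#   CONTAINER_MEMORY_IN_BYTES=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
heap_flag $((1024 * 1024 * 1024))   # 1 GiB limit -> -Xmx512m
```

The resulting flag could then be passed to Glassfish via asadmin's JVM options at container startup instead of reading host memory.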

craig-willis commented 6 years ago

@pdurbin I've pushed a test image to my Dockerhub under craigwillis/dataverse:4.7 based on a branch in my fork (this also upgrades to 4.7) https://github.com/craig-willis/ndslabs-dataverse/tree/upgrade-4.7/dockerfiles/dataverse. Changes are in https://github.com/craig-willis/ndslabs-dataverse/commit/e9382a78380806d181448f53c54a6bb7b36c9c94.

In short, I add a glassfish group and user, chown the glassfish4 directory, and added the USER instruction to set the effective user in running instances. I've tested this only in Labs Workbench.
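Those three changes would look roughly like this in a Dockerfile (a sketch, not the exact commit; the /usr/local/glassfish4 path is assumed from the NDS Labs image):

```dockerfile
# Create a dedicated user and group, then give them ownership of the
# Glassfish install so it can run unprivileged
RUN groupadd glassfish && \
    useradd -s /bin/bash -d /home/glassfish -m -g glassfish glassfish && \
    chown -R glassfish:glassfish /usr/local/glassfish4

# All subsequent instructions, and the running container, use this user
USER glassfish
```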

I believe we've commented out the dynamic heap setting and just run with defaults. I'm looking at a running Dataverse container on the Labs Workbench system and it's currently at 2.6GB (this doesn't include Postgres or Solr). So my earlier comment about 1G is likely wrong.

And just to plug the Labs Workbench system... Labs Workbench is a product of the National Data Service (NDS) consortium and is really intended for the NDS and RDA community to "kick the tires" on a variety of research data management related services. It's not meant for production hosting, but it was intended to let people spin up instances of services like Dataverse quickly for evaluation. Of course, I can certainly see the value in supporting OpenShift as well.

pdurbin commented 6 years ago

@craig-willis I just took a quick peek at https://github.com/craig-willis/ndslabs-dataverse/commit/e9382a78380806d181448f53c54a6bb7b36c9c94 and I'll try something similar with running Glassfish as non-root in my image (once I fix my Minishift environment so that changes I push to Docker Hub are pulled down again). Did you happen to test if you're able to run the container as non-root and have it still work? I don't know if NDS Labs Workbench or Kubernetes has the concept of running images as non-root or not. It sounds like OpenShift is the thing that's making us reevaluate whether images are run as root.

Also, 2.6 GB is too fat to run on the free tier of OpenShift Online so if you (or anyone reading this) are able to get Dataverse to squeeze into some skinny jeans, please let me know! Please note the comment by @danmcp about checking /sys/fs/cgroup/memory/memory.limit_in_bytes.

Honestly, we should plug Labs Workbench somewhere in our documentation. I showed the home page to @CCMumma during the community meeting and she seemed quite interested in it. Maybe I could even roll it into this branch I'm working on since I'm touching the part where I talk about kicking the tires on Dataverse. I'll talk to @dlmurphy about it.