CERNStudyGroup / cernstudygroup.github.io

https://cernstudygroup.github.io

Turn your computer into lxplus with a single command #20

Open ibab opened 8 years ago

ibab commented 8 years ago

Date: TBD
Time: TBD
Location: TBD
Vidyo for remote participants (just click on the link to proceed): CERN_OpenScience
Session format: Work-Along


Lesson guide: @ibab, @lukasheinrich
Lesson materials: Coming…
Slides: Coming…


Join the chat at https://gitter.im/CERNStudyGroup/cernstudygroup.github.io
If you haven't done so yet, introduce yourself (#26) and list your project(s) (#24).


Original post:

Using docker, we can easily get access to tools available on cvmfs by running:

docker run -t -i --rm --privileged hepsw/cvmfs-lhcb bash

(Example for lhcb) This allows you to easily use and work on your collaboration's software on your laptop, workstation or server.

This could also allow you to make workflows that depend on custom physics software reproducible. (Just include a Dockerfile or script that calls docker in your repo)
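A minimal sketch of such a script, assuming the analysis lives in the current directory and should be visible inside the container at /work. The function name run_in_container, the /work path and the DRY_RUN switch are illustrative assumptions and not part of the original post; the image and the --privileged flag are taken from the command above.

```shell
#!/bin/sh
# Image from the post above; override with IMAGE=... to try another one.
IMAGE="${IMAGE:-hepsw/cvmfs-lhcb}"

# Run "$@" inside the container with the current directory mounted at /work.
# /work and DRY_RUN are illustrative assumptions, not part of any tool.
run_in_container() {
    set -- docker run --rm --privileged -v "$PWD:/work" -w /work "$IMAGE" "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"      # only show the command that would be run
    else
        "$@"
    fi
}
```

With this in the repo, something like run_in_container ./make_plots.sh (a hypothetical script name) would run the step inside the container, and setting DRY_RUN=1 first only prints the docker command.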

Would someone be interested in learning more about this in one of the study groups?

Edit: Changed sbinet to hepsw

lukasheinrich commented 8 years ago

Hi,

I've been very interested in getting an lxplus-like environment on top of cern/slc6-base. My start was this:

https://github.com/lukasheinrich/hepsw-docker/tree/master/lxpluslike

image at: https://hub.docker.com/r/lukasheinrich/lxplus-like/

basically I tried to snapshot what yum has installed on lxplus and tried to install the same. The same steps could either be done on top of hepsw/cvmfs-lhcb (I think @sbinet retired the binet/* images) or on top of plain slc6-base with the option to mount cvmfs at runtime via

docker run -v /cvmfs:/cvmfs ...

So I would love to see this explored more. Maybe some people have good contacts to LXPLUS people? (@pherterich ? )

ibab commented 8 years ago

Yeah, maybe we can come up with some default images people might want to use and write some instructions on how they can be used to do interesting things.

docker run -v /cvmfs:/cvmfs

Wouldn't this require you to set up cvmfs on the host?

lukasheinrich commented 8 years ago

yes this will require a cvmfs setup on the host. But the advantage is that you don't need to run the container in privileged mode. Also, the docker image itself doesn't need to know anything about cvmfs which keeps the images more lightweight.
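A rough sketch of this host-mount variant, assuming cvmfs is already configured on the host. The helper name run_with_host_cvmfs, the DRY_RUN switch and the read-only :ro suffix are my own additions for illustration; depending on how the host mounts repositories, other mount options may be needed.

```shell
# Host directory that provides cvmfs; /cvmfs is the usual location.
CVMFS_DIR="${CVMFS_DIR:-/cvmfs}"

# Start a container that sees the host's cvmfs, without --privileged.
# Arguments are passed on to docker run (image name, command, ...).
run_with_host_cvmfs() {
    # Refuse to start if the host has nothing to bind-mount.
    if [ ! -d "$CVMFS_DIR" ]; then
        echo "no cvmfs found at $CVMFS_DIR; set it up on the host first" >&2
        return 1
    fi
    set -- docker run --rm -v "$CVMFS_DIR:/cvmfs:ro" "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"      # only show the command that would be run
    else
        "$@"
    fi
}
```

For example, run_with_host_cvmfs -t -i cern/slc6-base bash would give a shell in the plain base image with /cvmfs available.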

lukasheinrich commented 8 years ago

btw we use exactly this /cvmfs mounting + custom lightweight docker images for RECAST. On top of the /cvmfs mount people can also request to have /afs mounted (I'm working on getting /eos to work). Finally, some workflow steps need GRID authentication, so for those we mount a path /recast_auth which includes a script /recast_auth/getmyproxy.sh. This then uses the host's host certificate to get a proxy which has previously been placed on CERN's MyProxy server.

adavidzh commented 8 years ago

I have been fiddling with docker-machine and docker exactly for this purpose: "lxlaptop". Up to now I was getting stuck in finding a reasonable Dockerfile or image to start from.

lukasheinrich commented 8 years ago

I think it's a good idea to start from cern/slc6-base and install libraries on top to make it more similar to lxplus, like in the example I posted.

ibab commented 8 years ago

:+1: on starting with cern/slc6-base.

@lukasheinrich: Do you mean this RECAST? http://recast.perimeterinstitute.ca/ Looks interesting!

adavidzh commented 8 years ago

I just found https://github.com/hepsw/docks Thanks for getting me on the right track :)

lukasheinrich commented 8 years ago

@ibab yes that's the right website but it's the old version and we (@cranmer, me, and a couple of other people) have been making good progress on docker-based workflows for analysis (i.e. you can have separate docker images for each workflow step). If you have LHCb Analysis Code, it would be great to try this out!

@adavidzh I also have a repository similar to @sbinet's, which I would like to merge into the hepsw organization that @sbinet set up, but I haven't gotten to it yet. Check this out:

https://github.com/lukasheinrich/hepsw-docker

ibab commented 8 years ago

@lukasheinrich: Sounds good! Some people at LHCb are experimenting with automated analysis workflows (like producing all plots starting from some initial dataset). We could try to plug our code into your system.

lukasheinrich commented 8 years ago

that would be great. Can you put me in touch with them?

ibab commented 8 years ago

Do you have access to the lhcb-collaborative-working@cern.ch egroup? That's the mailing list we use for that (and other stuff).

lukasheinrich commented 8 years ago

no, it seems closed to LHCb members only. By the way, I have been in touch with @anaderi on including LHCb into RECAST. Maybe we can work from there? I would also be happy to present a short overview of what we already have in our infrastructure to the group (in case you have regular meetings)

seneubert commented 8 years ago

@lukasheinrich maybe you can discuss this at one of the next collaborative-working meetings?

lukasheinrich commented 8 years ago

that would be great. I'll include you in an email thread that we have had for a while, maybe we can discuss this further there.

adavidzh commented 8 years ago

Looks to me that LHCb is having a lot of good fun. When I mention Docker, all I get is shrugs around here.

lukasheinrich commented 8 years ago

hey @adavidzh, somewhat similar to ATLAS, but we're getting there. We are in touch with a couple of CMS people as well (Ken Bloom, Mike Hildreth..) so maybe we can get something going. It seems like I will be giving a talk next Monday for the LHCb group, maybe you can get one-time access as well?

PS: i'll include you in the same email thread as the others

betatim commented 8 years ago

Getting back to docker for analysis: we should have a lesson on this! Showing people how easy it is to get going and have an lxlaptop. One thing I am undecided on is whether you should make use of cvmfs or not. I think it depends a lot on what you want to use the container for. For quick play or day-to-day use it is probably fine, but for reproducible analysis relying on cvmfs is not that useful, as it seems cvmfs is not immutable (which is what you need). Is there a tool to create a container that takes stuff from cvmfs during build time, and then somehow freezes it?

adavidzh commented 8 years ago

Well, in CMS whatever is in cvmfs can also be installed standalone (I got a working Dockerfile to do that last night, but it's still coming in at 16 GB).

sbinet commented 8 years ago

for CVMFS-free images, I have provided these 2 as a proof of concept for LHCb:

so, the basic one with just Gaudi installed (from RPMs) and a more involved one, DaVinci which is the analysis flavoured framework of LHCb.

this is detailed in a CHEP paper: https://inspirehep.net/record/1413180?ln=en

just to give an idea on the sizes of these images:

hepsw/lhcb-gaudi   v26r1   3.911 GB
hepsw/lhcb-davinci v36r5   7.790 GB

and for the CVMFS-based ones:

hepsw/cvmfs-base 20150331 629.4 MB
hepsw/cvmfs-lhcb 20150331 629.4 MB

(hepsw/docks has also CVMFS base images for Alice, Atlas and CMS)

AFAIK, there is no tool to extract the meaningful set of files under a CVMFS mount point for a given workload as files are read/downloaded on a JIT basis... I mean, no tool, except for a bare find /cvmfs/experiment -print, which would work IFF the software is installed such as /cvmfs/experiment/app/version with everything needed under that directory (otherwise, you'd just curl the whole CVMFS content...)
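As a sketch of that find-based approach, under the same assumption that one release directory is fully self-contained: the helper below (snapshot_release is a made-up name) writes a manifest of the files it sees and copies the tree somewhere a Docker build could pick it up. It does nothing clever about dependencies outside that directory.

```shell
# Copy one self-contained release directory out of cvmfs.
#   $1: source, e.g. /cvmfs/experiment/app/version
#   $2: destination directory, e.g. inside a Docker build context
snapshot_release() {
    snap_src="$1"
    snap_dest="$2"
    mkdir -p "$snap_dest" || return 1
    # Record which files exist under the release directory.
    ( cd "$snap_src" && find . -type f -print | sort ) > "$snap_dest.manifest" || return 1
    # Copy the tree; reading each file makes cvmfs fetch it if needed.
    cp -a "$snap_src/." "$snap_dest/"
}
```

The manifest ends up next to the destination directory (destination path plus .manifest), so it can be kept in version control even if the copied files are not.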

lukasheinrich commented 8 years ago

So my experience in ATLAS is that we can get away with reasonable image sizes, at least for our analysis software, by installing everything within Docker without /cvmfs access. This is easier when building purely on top of ROOT or using a bare-bones Gaudi/Athena-based release. For running actual reconstruction-type jobs, I only tried mounting /cvmfs at runtime. That works well, but @betatim is right that it creates mutability. One reference I found for cvmfs versioning is here:

https://indico.cern.ch/event/444264/session/0/contribution/0/4/attachments/1211574/1787889/JBGG-UseOfCernVM.pdf

see slide 7. It seems like CVMFS might have something like a commit number, but I'm not sure how that works.

On a Mac, mounting /cvmfs is a bit trickier. The docker-machine runs with user permissions, while the /cvmfs mountpoint is root-owned, which does not work well together. To get around this, I create a user-owned mount point in my home directory, manually mount cvmfs on that and then bind it with docker run -v $HOME/cvmfs:/cvmfs, which then works well.
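Roughly, the steps look like the sketch below. The repository name, the helper names step and mount_and_run, and the DRY_RUN switch are placeholders for illustration, and the exact mount invocation may differ depending on the cvmfs client version installed on the Mac.

```shell
REPO="${REPO:-atlas.cern.ch}"            # example repository (assumption)
MOUNT_ROOT="${MOUNT_ROOT:-$HOME/cvmfs}"  # user-owned directory in $HOME

# Run a command, or only print it when DRY_RUN is set.
step() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount one repository under $MOUNT_ROOT and bind it into a container.
# Arguments are passed on to docker run (image name, command, ...).
mount_and_run() {
    step mkdir -p "$MOUNT_ROOT/$REPO" || return 1
    step sudo mount -t cvmfs "$REPO" "$MOUNT_ROOT/$REPO" || return 1
    step docker run --rm -v "$MOUNT_ROOT:/cvmfs" "$@"
}
```

Setting DRY_RUN=1 before calling mount_and_run -t -i cern/slc6-base bash prints the three commands without touching the system, which is a cheap way to check the paths first.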

Cheers, Lukas

adavidzh commented 8 years ago

For CMS, by chaining the RPM installation in a single RUN command, I am now at 12.6 GB. This is starting FROM kreczko/puppet-builder because I actually used kreczko/cmssw-standalone by @kreczko as the skeleton.

Is there a better (leaner, closer to lxplus, etc) base image?

lukasheinrich commented 8 years ago

@adavidzh feel free to try lukasheinrich/lxplus-like

this is on top of slc6-base and has most yum packages installed that LXPLUS has.

Though this is certainly not lean :-/

kreczko commented 8 years ago

@adavidzh indeed, the containers I was playing around with are not optimal, as their size can get up to 19 GB. However, if you need CMSSW on a computer that does not have network all the time, kreczko/cmssw-standalone can be useful.

That said, hepsw/cvmfs-cms is much better for most cases.

adavidzh commented 8 years ago

@lukasheinrich: starting from 11 GB is not promising ;) (the CMS software comes in at ~6 GB). @kreczko: kreczko/cmssw-standalone has "only" the problem of not building in Docker Hub (I suppose because of size).

sbinet commented 8 years ago

isn't http://docker.cern.ch/howtopr supposed to tackle this size-on-the-hub issue?

kreczko commented 8 years ago

I did not know CERN had their own registry. That is good news.

lukasheinrich commented 8 years ago

I guess there is a somewhat irreducible compromise that we have to make. If you want to have something LXPLUS-like, meaning being able to get stuff from cvmfs and expect it to work, you somehow depend on having the system libraries installed that this cvmfs software depends on (think: getting a ROOT release, which depends on libX11 etc. being installed). Of course it would be a bit nicer if we could get meaningful images that only have a subset of those installed to keep the image size down. But then this will only work with the subset of cvmfs software that this image was tailored to.

lukasheinrich commented 8 years ago

@kreczko yes, they are working on this. Also they are preparing a Google Container Engine-like service that you will be able to control with Docker Swarm and Kubernetes. But this is still in an early development phase.

adavidzh commented 8 years ago

@sbinet docker.cern.ch is really cool. Thanks! @lukasheinrich the problem is indeed that in CMS, every single package is taken from the CMS repo, not from SLC. So in a sense, the leaner the starting image, the "better".

lukasheinrich commented 8 years ago

@adavidzh interesting, so CMS maintains its own list of low-level system libraries it needs (stuff I had to deal with, e.g. including various openssl libs etc.)?

lukasheinrich commented 8 years ago

Hi everyone,

I just gave a short presentation to the ATLAS analysis software group, and got some useful input / suggestions.

One thing that has traditionally been hard to do from e.g. a Mac or Windows is to get access to the Grid. One solution is of course to have cvmfs installed, bind this to docker (analogously to the discussion above), and set up the Grid middleware from there.

But to have a nicer encapsulation, I managed to get a minimal image on top of cern/slc6-base that has everything shipped natively.

The example for now is ATLAS (panda is the ATLAS job submission interface), but the first couple of steps in the Dockerfile should be generalizable to other experiments

https://github.com/lukasheinrich/asg-docker/blob/master/preasg-base/Dockerfile

with that I can nicely submit Grid jobs from a Mac (just bind-mount a directory that has your certificate and key in it)
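To illustrate the bind-mount part only: the sketch below assumes the certificate and key sit in $HOME/.globus as usercert.pem and userkey.pem, and that the container expects them under /root/.globus. The image name example/grid-client and the helper run_grid_client are placeholders, not the actual image built from the Dockerfile above.

```shell
GRID_IMAGE="${GRID_IMAGE:-example/grid-client}"  # placeholder image name
CERT_DIR="${CERT_DIR:-$HOME/.globus}"            # holds usercert.pem, userkey.pem

# Start the Grid client container with the certificate directory
# bind-mounted read-only. Extra arguments are run inside the container.
run_grid_client() {
    for pem in usercert.pem userkey.pem; do
        if [ ! -f "$CERT_DIR/$pem" ]; then
            echo "missing $CERT_DIR/$pem" >&2
            return 1
        fi
    done
    set -- docker run --rm -t -i -v "$CERT_DIR:/root/.globus:ro" "$GRID_IMAGE" "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"      # only show the command that would be run
    else
        "$@"
    fi
}
```

The key never gets baked into an image this way; it is only visible while the container runs.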

asciicast

Maybe people from the other expts could try getting their Grid layers to work?

Cheers, Lukas

RaoOfPhysics commented 8 years ago

Sooooo, what's the situation with this lesson?

ibab commented 8 years ago

Do you already have someone for May 14th? I'll be at CERN during that time. Maybe @lukasheinrich or someone else wants to cooperate on the lesson?

RaoOfPhysics commented 8 years ago

Nothing scheduled. Also, I think the Friday is 13 May. :)

ibab commented 8 years ago

Yes, Friday the 13th :fearful:

RaoOfPhysics commented 8 years ago

Sooooo, shall we schedule this? :D

ibab commented 8 years ago

Let's wait a bit to see if someone wants to help. I also need to come up with some good ideas on what to show during the session.

RaoOfPhysics commented 8 years ago

Ok. :)

lukasheinrich commented 8 years ago

hi @ibab,

yes I would definitely be interested in helping out. should we try to skype or something in the next week or so?

Cheers, Lukas

ibab commented 8 years ago

Yes, next week is good!

RaoOfPhysics commented 8 years ago

Hey @ibab, @lukasheinrich: Just confirming that we're on for this Friday. Please let me know! Thanks. :)