JeffersonLab / NPSlib

hcana classes for the Hall C NPS experiments

Containerize NPSlib #21

Closed panta-123 closed 12 months ago

panta-123 commented 1 year ago

We are looking into containerization of the NPS software. My idea is:

  1. Each Hall C experiment should have its own specific containers. This will help us keep things experiment-specific, and each experiment's analysts will use their own container.

  2. We will write a Dockerfile (which builds the container image) for NPSlib. This means we need to install hcana and NPSlib via this Dockerfile. Q. How do we know which release of hcana to use? Is an hcana release specified anywhere in NPSlib? Q. NPSlib doesn't seem to have any releases/tags. We may want to create them so that we capture exactly what goes into the container.

  3. Do we want nps_replay in this container as well? If yes, we need to streamline nps_replay release management too.

These are just my initial thoughts, please comment.

hansenjo commented 1 year ago

Comments:

  1. That's certainly easier in case people want to make changes.
  2. I agree we need versions and tags for hcana and NPSlib. It's trivial to do. The question is, who will be responsible.
  3. It might be best to wait until nps_replay has stabilized. Replay setups tend to be in great flux during the early weeks of data taking and analysis. But the NPS experiment should decide what they want, of course.

panta-123 commented 1 year ago

While doing an initial test I noticed the following. Let's say we make a release 0.1.0 of NPSlib. This will create a tarball of that release. The container build I have in mind is:

  1. First we need to get a release tarball for hcana. -> But the release tarball of hcana doesn't include the Podd submodule. -> Getting Podd was only possible in the hcana GitHub workflow because there we check out the code and can run a submodule update. -> Q. How should we proceed?

  2. Install hcana and set up the environment.

  3. Get the tarball for a release of NPSlib.

  4. Install NPSlib.

We have to get around the issue in step 1 related to Podd. How? Options:

  a. Add a file that specifies which hcana and Podd releases to use.
  b. Add a Podd submodule to NPSlib (but then it has to be maintained in NPSlib as well as in hcana).
  c. Make the hcana tarball include the submodule (but this is not possible with git out of the box).

For me, option a would be great for building the container automatically in a GitHub workflow.

hansenjo commented 1 year ago

I think a file specifying the versions of hcana and Podd to use with a certain NPSlib release will do the trick.

MarkKJones commented 1 year ago

We can talk at the meeting, but having one of these containers for NPSlib, hcana, Podd or a replay directory without all the others is meaningless. For an experiment like NPS you need NPSlib, hcana, Podd and nps_replay in one container, if I get the gist of how this works.

hansenjo commented 1 year ago

If I understood correctly, the plan is very much to put all four components, nps_replay, NPSlib, hcana, and Podd in one single container. This would be the "NPS analysis container". The question is how to automatically determine which version of which component should be included in it.

panta-123 commented 1 year ago

Yes, that's the plan. Note: nps_replay can be omitted for now. A local clone of nps_replay can point to the new container while running. (It depends on the decision we take.) Maybe we make a requirement.txt file with the following content:

hcana-release
Podd-release

Or maybe a JSON file with the repo name and version to use. I have to check how to pass the JSON or txt file to a GitHub Action.
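
As a rough sketch of the JSON variant, the file could map each repository to the tag to build, and a small script could validate it before the image is built. The file layout, the key names and the `load_versions` helper are hypothetical and do not exist in NPSlib; the tag values are examples only.

```python
import json

# Hypothetical contents of a versions file kept in the NPSlib repo.
EXAMPLE = '{"hcana": "hcana-0.99", "Podd": "Release-177"}'

# Components that must be pinned before a container build may start.
REQUIRED = ("hcana", "Podd")


def load_versions(text):
    """Parse the JSON text and check that every required component has a tag."""
    versions = json.loads(text)
    missing = [name for name in REQUIRED if not versions.get(name)]
    if missing:
        raise ValueError("missing version for: " + ", ".join(missing))
    return {name: str(versions[name]) for name in REQUIRED}


if __name__ == "__main__":
    print(load_versions(EXAMPLE))
```

Failing early on a missing entry keeps a half-specified file from silently producing an image with an unpinned component.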

MarkKJones commented 1 year ago

How does this work or coordinate with the modules that are on the CUE machine?



hansenjo commented 1 year ago

The modules in /group/halla/modulefiles? Each modulefile specifies its version dependencies. For example, module load hcana/0.99 loads analyzer/1.7.6, which in turn loads root/6.26.10 and evio/5.3, which in turn loads gcc/12.3.0 and python/3.11.4 which in turn loads group.apps needed for basics like openssl and binutils.
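
The chain of module loads described above can be pictured as a small dependency graph. The sketch below is only an illustration in Python, not how the module system itself is implemented: the `DEPENDS` table and the `load_order` helper are hypothetical, and which module exactly pulls in gcc, python and group.apps is an assumption, since the description leaves that open.

```python
# Dependency edges transcribed from the description above. Which module
# exactly pulls in gcc, python and group.apps is an assumption.
DEPENDS = {
    "hcana/0.99": ["analyzer/1.7.6"],
    "analyzer/1.7.6": ["root/6.26.10", "evio/5.3"],
    "root/6.26.10": ["gcc/12.3.0", "python/3.11.4"],
    "evio/5.3": ["gcc/12.3.0"],
    "gcc/12.3.0": ["group.apps"],
    "python/3.11.4": ["group.apps"],
    "group.apps": [],
}


def load_order(module, depends=DEPENDS, seen=None):
    """Return the modules needed by `module`, dependencies first.

    Assumes the graph has no cycles; a cycle would recurse forever.
    """
    if seen is None:
        seen = []
    for dep in depends.get(module, []):
        load_order(dep, depends, seen)
    if module not in seen:
        seen.append(module)
    return seen


if __name__ == "__main__":
    for name in load_order("hcana/0.99"):
        print(name)
```

A table like this, kept next to the container recipe, would make it easy to see everything a single module load brings in.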

One could consider building a larger container that contains a module system, where users could pick the particular setup they need for an experiment with a simple module load command. Instead of loading from /group/halla (or /adaqfs/apps or /cdaqfs1/apps), the software would be loaded from a location inside the container. This might make the exact setup more transparent.

Like Anil said, one might want to have a container without the replay directory. Users could just pull in (and modify) their replay from their favorite location (their own GitHub repository, for instance). I am sure students will modify the heck out of existing replay scripts and will want to be able to do this. One could still provide an officially sanctioned default replay setup that one could also activate via a module (see the setup I made on cdaq: module show nps_analysis).

panta-123 commented 1 year ago

The first NPSlib Docker build succeeded. It contains hcana (hcana-0.99) + Podd (Release-177) + NPSlib (main branch). Below, an NPS class is loaded from the Docker image:

 $ docker run --name npslib -it -d docker.io/apanta123/npslib:alpha
b18ee9a1c5e8d3eefc886b1c208fa3bd6b5ae825ea228dd0f65195aaafe1a4a4

$ docker exec -it npslib bash
[root@b18ee9a1c5e8 NPSlib]# hcana
DB_DIR set to DBASE
  ************************************************
  *                                              *
  *            W E L C O M E  to  the            *
  *          H A L L C ++  A N A L Y Z E R       *
  *                                              *
  *  hcana release         0.99      08 Nov 2023 *
  *  PODD release         1.7.7      08 Nov 2023 *
  *  ROOT               6.24/08      Sep 29 2022 *
  *                                              *
  *            For information visit             *
  *      http://hallcweb.jlab.org/hcana/docs/    *
  *                                              *
  ************************************************
hcana [0] gSystem->Load("BUILD/src/libNPS")
(int) 0
hcana [1] THaApparatus* NPS = new THcNPSApparatus("NPS","NPS");
hcana [2] 

We need to test this, maybe with nps_replay.

hansenjo commented 1 year ago

Very nice.

I guess we should consider "install"ing the software in some convenient container-local location instead of running from the build directory. Of course, this works fine for starters.

What's the output of hcana --version?

panta-123 commented 1 year ago

# hcana --version
DB_DIR set to DBASE
hcana 0.99 08 Nov 2023
Podd 1.7.7 08 Nov 2023
Built for CentOS-7 using gcc-4.8.5, ROOT 6.24/08

hansenjo commented 1 year ago

I see. This is built from tarballs, right? Would it be difficult to build from a git clone?

panta-123 commented 1 year ago

Yes, it is built from tarballs (of both hcana and Podd).

It would be a little complicated to use git clone in a GitHub Action for the automatic build, since we would need to check out the proper release branch and also take care of GitHub authentication for the hcana and Podd repos. Using tarballs makes things easy and is also good for long-term maintenance. (Of course, others might have a different opinion.)

Update: I am currently writing the code for the GitHub Action.

hansenjo commented 1 year ago

Authentication? Aren't Podd, hcana, and NPSlib world-readable public repos? Anyone should be able to clone them without GitHub credentials.

panta-123 commented 1 year ago

Good point. (Somehow I was not thinking that way.) If we git clone the specific hcana tag that we want, we can then do submodule init and update to get Podd.

Added the following to the Dockerfile:

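# Clone the tagged hcana release into the directory that WORKDIR expects
RUN git clone https://github.com/JeffersonLab/hcana.git --branch ${HCANA_VERSION_TAG} "/hcana-${HCANA_VERSION_TAG}"
WORKDIR "/hcana-${HCANA_VERSION_TAG}"
# Fetch Podd, which hcana includes as a git submodule
RUN git submodule init && git submodule update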

The NPSlib build succeeds; I tried both hcana 0.98 and 0.99.

# hcana --version
DB_DIR set to DBASE
hcana 0.98 git@hcana-0.98-0-gec53c09 31 Aug 2023
Podd 1.7.5 git@Release-175-0-g050ea94 30 Aug 2023
Built for CentOS-7 using gcc-4.8.5, ROOT 6.24/08
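
A version report like the one above could also be checked automatically after a container build. The following is a minimal sketch, assuming the output keeps the layout shown here; `parse_versions` is a hypothetical helper, not part of hcana or NPSlib.

```python
import re

# Sample text copied from the `hcana --version` output shown above.
SAMPLE = """DB_DIR set to DBASE
hcana 0.98 git@hcana-0.98-0-gec53c09 31 Aug 2023
Podd 1.7.5 git@Release-175-0-g050ea94 30 Aug 2023
Built for CentOS-7 using gcc-4.8.5, ROOT 6.24/08
"""


def parse_versions(output):
    """Extract the hcana, Podd and ROOT version strings from the report."""
    found = {}
    for line in output.splitlines():
        # Lines such as "hcana 0.98 ..." or "Podd 1.7.5 ..."
        m = re.match(r"(hcana|Podd)\s+(\S+)", line)
        if m:
            found[m.group(1)] = m.group(2)
        # The build line ends in "ROOT <version>"
        m = re.search(r"ROOT\s+(\S+)", line)
        if m:
            found["ROOT"] = m.group(1)
    return found


if __name__ == "__main__":
    print(parse_versions(SAMPLE))
```

Comparing the parsed values with the versions that were requested for the build would catch a container that picked up the wrong hcana or Podd release.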

The only thing remaining is specifying, from the NPSlib repo, which hcana tag to use (Podd is installed from the hcana submodule). I will give a detailed report tomorrow in the meeting.
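
One possible way to hand the chosen tag to the build is a Docker build argument. The sketch below only assembles the command line and does not run Docker. It assumes the Dockerfile declares HCANA_VERSION_TAG with an ARG instruction, which the thread does not show; the image name `npslib:test` and the `docker_build_command` helper are hypothetical.

```python
import shlex


def docker_build_command(hcana_tag, image="npslib:test", context="."):
    """Assemble a `docker build` invocation that pins the hcana tag.

    Assumes the Dockerfile contains `ARG HCANA_VERSION_TAG`.
    """
    return [
        "docker", "build",
        "--build-arg", f"HCANA_VERSION_TAG={hcana_tag}",
        "-t", image,
        context,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so it can be inspected
    # or pasted into a workflow step.
    print(shlex.join(docker_build_command("hcana-0.99")))
```

Returning the command as a list keeps it safe to pass straight to `subprocess.run` later without going through shell quoting.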

panta-123 commented 1 year ago

The test with nps_replay will be done by Casey. I will follow up once the test is done on ifarm.