R3BRootGroup / R3BRoot

Framework for Simulations and Data Analysis of R3B Experiment
https://github.com/R3BRootGroup/R3BRoot/wiki
GNU General Public License v3.0

Added new repository for lmds and analysis CI tests #862

Closed: jose-luis-rs closed this 1 year ago

jose-luis-rs commented 1 year ago

This PR is to include new tests for checking the unpacker stage of our R3B experiments.



YanzhaoW commented 1 year ago

Hi, @jose-luis-rs

Thanks for the PR. But I have two concerns:

  1. Is it necessary to have test scripts in another repo?

I would rather just keep them here, since managing the versions of two repos is a bit of a headache. As @inkdot7 pointed out in PR #840, if both repos require changes to make a test pass, there is no clear way to build good CI workflows for PRs made in these two repos.

  2. Is it a good idea to store the test data in a GitHub repo?

If I'm not wrong, there is a 100 MB size limit for every file in a GitHub repo. The test data used in this PR isn't very big, but we can't be sure whether we will need somewhat larger test data in the future. Maybe we could store our test data on an online data server (such as Zenodo, with its 50 GB limit) and download it directly in the CI workflow?

jose-luis-rs commented 1 year ago

> 1. Is it necessary to have test scripts in another repo?

No, we can create a specific folder in R3BRoot for this. At the moment it is a basic test to look for problems, and you can see them in the failing tests.

> 2. Is it a good idea to store the test data in a GitHub repo?

You can create a Zenodo record for this, no problem!

YanzhaoW commented 1 year ago

> No, we can create a specific folder in R3BRoot for this. At the moment it is a basic test to look for problems, and you can see them in the failing tests.

Ok, but is the unpacker test detector-specific? If so, it should live in the test folder of the detector directory.

> You can create a Zenodo record for this, no problem!

Ok, I will do this for the NeuLAND tests using my own Zenodo account when I have time (after Budapest). Tests for other detectors can choose their own way.

But how do you create those lmd files? Are they copies of experimental raw data, or generated by ucesb?

jose-luis-rs commented 1 year ago

> Ok, but is the unpacker test detector-specific? If so, it should live in the test folder of the detector directory.

No, my macro is for a specific experiment, in particular S515. It checks the common unpacking and, later, the correlations between different detectors.

> But how do you create those lmd files? Are they copies of experimental raw data, or generated by ucesb?

You can use ucesb or specific unpackers; for my example, I used the upexps unpacker 202104_s515 with the command:

This produces an output file named "file_name.lmd" containing 10000 events.

jose-luis-rs commented 1 year ago


Thanks @YanzhaoW

YanzhaoW commented 1 year ago

Ok, after a little bit of research, I think the standard ways to store raw data (possibly large files) could be:

  1. DVC
  2. Git LFS

Both can use a GSI server as the host for data storage.
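A minimal sketch of what the Git LFS route could look like: a `.gitattributes` entry marks the lmd test data as LFS-managed. The `*.lmd` pattern is an assumption about where such files would live, and the storage endpoint would still have to be pointed at a GSI host via `git config lfs.url`.

```
# .gitattributes (sketch; the *.lmd pattern is an assumption)
*.lmd filter=lfs diff=lfs merge=lfs -text
```

After `git lfs track "*.lmd"` writes this entry, pushes upload the lmd payloads to the configured LFS endpoint instead of into the Git history, so the repo itself stays small.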

klenze commented 1 year ago

I do not think it is worthwhile to put LMD files under version control, as their content never changes after they are recorded.

Putting test lmd files into a directory under https://webdocs.gsi.de/~land/ and using cutting-edge HTTP clients like wget or curl to fetch them inside the CI container should work just fine.
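A sketch of that fetch-in-CI approach, with a checksum check so a corrupted or changed download fails the job early. The URL and file name below are hypothetical placeholders; the actual download line is commented out and replaced by a stand-in file so the verification logic can be shown on its own.

```shell
#!/bin/sh
# Fetch-and-verify sketch for lmd test data in CI.
# The URL and file name are hypothetical placeholders.
set -eu
DATA_URL="https://webdocs.gsi.de/~land/testdata/test_s515.lmd"
# curl -fsSL -o test_s515.lmd "$DATA_URL"        # a real CI job would download here
printf 'stand-in lmd payload' > test_s515.lmd    # placeholder for the downloaded file
sha256sum test_s515.lmd > test_s515.lmd.sha256   # in practice, ship a pinned checksum file
sha256sum -c test_s515.lmd.sha256 && echo "data ok"
```

In a real workflow the `.sha256` file would be committed alongside the test scripts, so the CI only runs the unpacker test against data it has verified.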

jose-luis-rs commented 1 year ago

> I do not think it is worthwhile to put LMD files under version control, as their content never changes after they are recorded.
>
> Putting test lmd files into a directory under https://webdocs.gsi.de/~land/ and using cutting-edge HTTP clients like wget or curl to fetch them inside the CI container should work just fine.

Yes, that could be another solution for the lmd files, but what could we do about upexps?

klenze commented 1 year ago

The master repo of upexps is here. Unfortunately, it requires credentials to read. (Also, changes tend to accumulate in the upexps repositories on the land account.) From my understanding, the reason is that the licensing situation is a bit unclear: different people contributed, and they would all have to be consulted if we were to publish it under the GPL. (I say we just publish the thing without any licence terms. Back in the good old days nobody gave a damn. Nobody is likely to incorporate our spec files into Linux or Apache in any case, and the possibility that someone pulls an SCO on GSI for distributing upexps seems distant at best.)

If we positively cannot distribute the sources of upexps, there is the option of distributing its output files instead of lmds. From my understanding, the normal communication between the unpacker and r3bsource happens through a pipe, so one could simply put a file in the middle.

This still sucks a bit, because these files would have to be regenerated (and retested against R3BRoot) whenever upexps changes.
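The file-in-the-middle idea can be sketched as follows. The program names `upexps_unpacker` and `r3b_macro` are hypothetical stand-ins for the real unpacker and the R3BRoot macro, and a placeholder file stands in for the recorded unpacker output.

```shell
#!/bin/sh
# Sketch: record the unpacker's pipe output once, replay it in CI.
# "upexps_unpacker" and "r3b_macro" are hypothetical stand-ins.
set -eu
# Record step (rerun only when upexps changes):
#   upexps_unpacker run.lmd > unpacked.bin
printf 'unpacked-event-stream' > unpacked.bin   # stand-in for the recorded output
# Replay step in CI (no upexps sources needed):
#   cat unpacked.bin | r3b_macro
cat unpacked.bin | wc -c
```

The record step would run on a machine with access to upexps, while CI only ever sees the recorded file, which sidesteps the licensing question at the cost of the regeneration burden noted above.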

@bl0x @inkdot7 Thoughts?

inkdot7 commented 1 year ago

> (I say we just publish the thing without any licence terms. Back in the good old days nobody gave a damn. Nobody is likely to incorporate our spec files into Linux or Apache in any case, and the possibility that someone pulls an SCO on GSI for distributing upexps seems distant at best.)

Something published without a license is not usable, since it is not known how it may be redistributed; thus it cannot serve as a basis for further work.

klenze commented 1 year ago

@inkdot7: In a world with copyright law, a lack of license terms negates GNU freedom two (redistribution) and freedom three (distribution of modified versions). Technically, running (freedom 0) and modifying (freedom 1) may also be negated, but I see no practical way for the copyright holder to enforce that.

I would much prefer it if we could stick licence terms on upexps, be they GPL, BSD, or public domain. To my knowledge, the reason we don't just do that is copyright concerns.

I see four ways forward: (0) We figure out who all the contributors were and get them to okay distribution under the GPL. (1) We do a clean-room reimplementation of upexps: someone documents the data formats from our spec files (which are not copyrightable), then someone else reimplements those in ucesb. This is of course a waste of time, but we could just stick the GPL on the result. (2) We release the source as-is without a licence, passing the buck to the users. (3) We do not release anything.

Any external user would prefer 0 or 1 over the other options, obviously.

But I would argue that option 2 is strictly more useful than our current choice, option 3. If nothing else, it would enable external users to do the clean-room reimplementation themselves (if they want to use our spec files as the basis for a larger project where they need legal certainty).