E3SM-Project / ChemDyg

Chemistry Diagnostics Package
BSD 3-Clause "New" or "Revised" License

Putting input data on the E3SM input data server #4

Closed tangq closed 1 year ago

tangq commented 1 year ago

This was discussed at the infrastructure group meeting and documented at this meeting notes page.

The suggestions are:

xylar commented 1 year ago

Yes, that all sounds exactly right.

I would just add:

xylar commented 1 year ago

I will also note that the files are available for public download at https://web.lcrc.anl.gov/public/e3sm/diagnostics/observations/ so it is very important that you have permission to distribute the observations and that you include any required license information along with them.

If you have observations you wish to use internally in E3SM but are not willing or able to distribute publicly, you can put them in:

/lcrc/group/e3sm/diagnostics_private/observations

These also get synced with mache, and again you should make them group readable and writable, but they will not be available on the web server.
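The group readable and writable requirement above could be applied with a short Python helper. This is a minimal sketch, not part of mache: the function name `make_group_writable` is hypothetical, and it assumes you own the files and that adding group search permission on directories is also wanted.

```python
import os
import stat


def make_group_writable(root):
    """Add group read/write (plus search on directories) under root."""
    file_bits = stat.S_IRGRP | stat.S_IWGRP
    dir_bits = file_bits | stat.S_IXGRP

    def add_bits(path, bits):
        # Keep the existing permission bits and add the group bits on top.
        os.chmod(path, stat.S_IMODE(os.lstat(path).st_mode) | bits)

    add_bits(root, dir_bits)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Symlinks are skipped so that link targets are not modified.
            if not os.path.islink(path):
                add_bits(path, dir_bits)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                add_bits(path, file_bits)
```

For example, `make_group_writable("/lcrc/group/e3sm/diagnostics_private/observations/Atm/ChemDyg_inputs")` would be one way to call it after uploading files.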

tangq commented 1 year ago

@xylar , can mache handle symbolic links? We'd like to use the links to point to different versions of input files without changing the source code.

xylar commented 1 year ago

@tangq, I think it will copy the file twice rather than copying the symlink. So if I'm right about that, you will only save disk space by symlinking on Chrysalis and Anvil. On other machines, you will have redundant copies. But that's not a huge problem unless you're talking about very large files.

tangq commented 1 year ago

@hsiangheleellnl and I just discussed it. The main benefit of symlinks is that we can use the same input file names in the Python scripts, so we don't need to update the code when updating the input data.

xylar commented 1 year ago

Yep, that's fine. I will also check if symlinks get preserved by my rsync commands between machines in case what I said above is not correct.

tangq commented 1 year ago

@hsiangheleellnl will upload the input files (with time stamps in the file names) to the LCRC data server and create symlinks there.

It sounds like the input data are rsynced by your script to the E3SM machines. We can test that once @hsiangheleellnl has uploaded the files.

xylar commented 1 year ago

@tangq, I looked at the mache code and it seems like I'm using the --links flag for rsync, which should keep symlinks the same on all the machines as they are on LCRC. So make sure they're relative-path links within diagnostics and they should work elsewhere.
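One way to create such relative-path links is sketched below in Python. The helper name `link_relative` and the file names in the usage note are hypothetical, not taken from ChemDyg or mache; the sketch assumes the link and its target live in the same synced directory tree.

```python
import os


def link_relative(target, link_path):
    """Create (or replace) link_path as a relative symlink to target."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Express the target relative to the directory that holds the link,
    # so the link still resolves after the tree is copied elsewhere.
    rel_target = os.path.relpath(os.path.abspath(target), start=link_dir)
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(rel_target, link_path)
    return rel_target
```

For instance, calling `link_relative("ChemDyg_inputs/ozone_20230316.nc", "ChemDyg_inputs/ozone.nc")` (made-up names) would let the scripts keep reading `ozone.nc` while the time-stamped file behind it changes.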

tangq commented 1 year ago

@hsiangheleellnl uploaded the input files to the input data server at /lcrc/group/e3sm/diagnostics_private/observations/Atm/ChemDyg_inputs

hsiangheleellnl commented 1 year ago

@xylar I have a question about how to reset 'diagnostics_base_path'. The current setup points to the path /lcrc/group/e3sm/diagnostics/observations/Atm, but we want to put the input data in 'diagnostics_private'. How can I reset the link?

xylar commented 1 year ago

@hsiangheleellnl, that's a great question that has a bit of a complicated answer.

On LCRC there are 3 diagnostics directories:

/lcrc/group/e3sm/public_html/diagnostics/
/lcrc/group/e3sm/diagnostics_private/
/lcrc/group/e3sm/diagnostics/

The mache sync diags tool (part of E3SM-Unified from the mache package) is used to copy the diagnostics data from the first 2 directories into the 3rd one. This seems strange on LCRC, but it's the equivalent procedure locally to what happens on all the other E3SM supported machines: we combine the public and private data so they're all in one place.

I will go ahead and run the mache sync diags tool on LCRC and your data should end up in the expected place. Are there other machines where you need the data right now as well? If so I'll sync those. Otherwise, I would wait a bit because I have other work that will require syncing that is waiting in the wings.
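The idea of combining the public and private trees into one directory can be illustrated with a small Python sketch. This is not how mache implements the sync (the thread indicates it uses rsync); the function name `combine_diagnostics` is hypothetical, and the sketch only shows the merge-while-keeping-symlinks behavior described above.

```python
import os
import shutil


def combine_diagnostics(sources, destination):
    """Merge each source tree into destination, keeping symlinks as symlinks."""
    for src in sources:
        for dirpath, dirnames, filenames in os.walk(src):
            dest_dir = os.path.normpath(
                os.path.join(destination, os.path.relpath(dirpath, src)))
            os.makedirs(dest_dir, exist_ok=True)
            # os.walk lists symlinks to directories in dirnames without
            # descending into them, so copy those as links as well.
            links = [d for d in dirnames
                     if os.path.islink(os.path.join(dirpath, d))]
            for name in filenames + links:
                src_path = os.path.join(dirpath, name)
                dest_path = os.path.join(dest_dir, name)
                if os.path.lexists(dest_path):
                    os.remove(dest_path)
                # follow_symlinks=False recreates a symlink rather than
                # copying the file it points to.
                shutil.copy2(src_path, dest_path, follow_symlinks=False)
```

On LCRC this would correspond to something like `combine_diagnostics(["/lcrc/group/e3sm/public_html/diagnostics", "/lcrc/group/e3sm/diagnostics_private"], "/lcrc/group/e3sm/diagnostics")`, though in practice the real tool should be used.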

xylar commented 1 year ago

@tangq and @hsiangheleellnl, the diagnostics that you placed in diagnostics_private should now be synced to diagnostics. Let me know if you have any trouble.

tangq commented 1 year ago

Thank you, @xylar , for the elaborate reply. Now I have a better idea of the mache sync diags logic. I can see the data synced to diagnostics on LCRC. Can you run it for compy, where we may run chemistry tests due to the limited chrysalis scratch space?

I noticed that the diagnostics_private directory cannot be accessed from blues. I guess that's intentional.

xylar commented 1 year ago

> I noticed that the diagnostics_private directory cannot be accessed from blues. I guess that's intentional.

I just logged onto blues and I was able to access it just fine:

$ pwd
/lcrc/group/e3sm/diagnostics_private
$ ls -lah
total 34K
drwxrws---    3 ac.xasay-davis E3SM 4.0K Nov 15 04:00 .
drwxrwsr-x+ 200 root           E3SM  16K Mar 23 10:50 ..
drwxrws---    4 ac.xasay-davis E3SM 4.0K Mar 16 12:35 observations

Access is the same from Anvil (Blues) and Chrysalis as far as I know. Could you check again?

xylar commented 1 year ago

> Can you run it for compy, where we may run chemistry tests due to the limited chrysalis scratch space?

Sure, I synced to Compy.

tangq commented 1 year ago

I can access diagnostics_private from Blues now.

rljacob commented 1 year ago

If you need more scratch space on Chrysalis, use /gpfs/fs0/globalscratch