geodesymiami / rsmas_insar

RSMAS InSAR code
https://rsmas-insar.readthedocs.io/
GNU General Public License v3.0

Advice from TACC for Stampede2 #429

Closed: falkamelung closed this issue 3 years ago

falkamelung commented 3 years ago

Here are some comments I got from TACC after being locked out because of accidental system abuse.

Regarding scripts on the logins, the key is that they should NOT run at 100% CPU for extended periods. Something that takes an hour to download will not consume very much CPU time if it is only transferring the data and not processing it.
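To illustrate the "transfer only, no processing" point, here is a minimal sketch of a download helper that just streams bytes to disk and leaves any unpacking or processing for a compute job. The function name `download_only` and its parameters are hypothetical and not part of rsmas_insar; it uses only the Python standard library.

```python
import shutil
import urllib.request


def download_only(url, dest_path, chunk_bytes=1024 * 1024):
    """Stream url to dest_path without decompressing or parsing anything.

    Hypothetical helper: copying fixed-size chunks keeps the work I/O-bound,
    so a long transfer uses little CPU time on a login node.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj reads and writes chunk_bytes at a time; no processing here.
        shutil.copyfileobj(response, out, chunk_bytes)
    return dest_path
```

Anything heavier than the copy itself (unzipping, format conversion, checks over the whole file) would belong in a submitted job rather than in this script.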

Short scripts of 20-30 seconds are okay, but they can't run repeatedly for hours; they should be intermittent and should not process data on the logins. Transfer and job monitoring scripts should be lightweight and should not check constantly (there is no reason to watch or check queues every 10 seconds; once a minute is more than enough, and once every 5-10 minutes is preferred).
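A job monitoring loop that follows this advice could look roughly like the sketch below. It assumes the queue is checked with Slurm's `squeue` (the scheduler on Stampede2); the function names, the 300-second default, and the 60-second floor are my own choices for illustration, not something TACC or rsmas_insar prescribes.

```python
import subprocess
import time

# Assumption: treat "once a minute" from the advice as a hard lower bound.
MIN_POLL_SECONDS = 60


def count_queued_jobs(user):
    """Return how many jobs `user` has in the queue, using one squeue call."""
    result = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i"],  # -h: no header, %i: job id
        capture_output=True,
        text=True,
        check=True,
    )
    return len([line for line in result.stdout.splitlines() if line.strip()])


def wait_for_jobs(user, poll_seconds=300, count_jobs=count_queued_jobs,
                  sleep=time.sleep):
    """Sleep between queue checks until `user` has no jobs left.

    Returns the number of queue checks that were made.
    """
    if poll_seconds < MIN_POLL_SECONDS:
        raise ValueError("poll_seconds must be at least %d" % MIN_POLL_SECONDS)
    polls = 0
    while True:
        polls += 1
        if count_jobs(user) == 0:
            return polls
        # The script is idle here, so it uses essentially no CPU on the login.
        sleep(poll_seconds)
```

The `count_jobs` and `sleep` arguments exist only so the loop can be tried without a scheduler; in normal use `wait_for_jobs("myuser")` would check the queue every five minutes.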

The most important aspect is that you be a nice citizen and not impact other users on the logins. I must remind you that the logins have 100s of user sessions on them at any given time, and if all of them ran scripts like those, the logins would never be usable. So the only fair thing is not to allow anyone to run on the logins, as has been our policy for many years.

Never use the computes for any transfers; they only have GigE and will be much slower than the logins. The preferred option is the data transfer nodes with gridftp/Globus, but that may not be usable depending on where the data comes from. If you can't use gridftp, then the logins are the next best option.