SGCI / sgci-resource-inventory

This contains all the computational resource entities
https://sgci-resource-inventory.readthedocs.io/en/latest/introduction.html
Apache License 2.0

Accounting for paths in fileSystemDefinition where username is not appended #2

Open: ericfranz opened this issue 3 years ago

ericfranz commented 3 years ago

At PSC on Bridges my home directory is /home/efranz. The fileSystemDefinition supports this well: I set "homeDir" : "/home", and the gateway can then append the username to get the user's home directory.
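The append-the-username rule can be sketched as below. This assumes the gateway simply joins the "homeDir" value with the login name. The helper `home_directory` is hypothetical and is not part of the resource inventory schema.

```python
import posixpath

def home_directory(home_dir: str, username: str) -> str:
    """Hypothetical helper: append the username to the base "homeDir" value."""
    # posixpath.join always uses "/" and tolerates a trailing slash on home_dir.
    return posixpath.join(home_dir, username)

# PSC Bridges example from the comment above
print(home_directory("/home", "efranz"))  # /home/efranz
```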

This pattern is not used at OSC, nor is it used for scratch directories at PSC. At both institutions, the path is appended with GROUP/USERNAME, where GROUP corresponds to the charge account for a project or grant. So at PSC my scratch space is /pylon5/sy560jp/efranz for the one project/grant I am a part of. At OSC I belong to 18 groups/grants (PZS0714, PAS1429, PAS1694, etc.), so a scratch directory I have access to at OSC is /fs/ess/scratch/PZS0714/efranz and the corresponding project space is /fs/ess/PZS0714/efranz. (I'm assuming that project space at OSC would correspond to "archiveDir" in the resource inventory.)
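A minimal sketch of the GROUP/USERNAME layout shows why a single static path is not enough: the result depends on a second input, the user's list of charge-account groups, and yields one directory per group. The helper `group_paths` is hypothetical, and the group list is taken from the examples above.

```python
import posixpath

def group_paths(base_dir, groups, username):
    """Hypothetical helper: build one BASE/GROUP/USERNAME path per charge-account group."""
    return {group: posixpath.join(base_dir, group, username) for group in groups}

# OSC scratch and project space for a user who belongs to several groups
groups = ["PZS0714", "PAS1429", "PAS1694"]
scratch = group_paths("/fs/ess/scratch", groups, "efranz")
project = group_paths("/fs/ess", groups, "efranz")
print(scratch["PZS0714"])  # /fs/ess/scratch/PZS0714/efranz
print(project["PZS0714"])  # /fs/ess/PZS0714/efranz
```

The same helper covers the PSC scratch case with base "/pylon5" and the single group sy560jp.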

If the purpose of the fileSystemDefinition is so that gateways can determine the appropriate paths for a particular user, how can we accommodate these use cases?

ericfranz commented 3 years ago

For paths, or any URI where we would like to have a template string in the schema, we could use URI templates as defined by RFC 6570. A system path is a type of URI, and libraries supporting this standard exist in most languages (e.g. https://pypi.org/project/uritemplate/ and https://rubygems.org/gems/addressable).
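To illustrate what expansion would look like, here is a stdlib-only sketch of RFC 6570 Level 1 (simple string) expansion. It is not the `uritemplate` or `addressable` API. It handles only `{var}` expressions (no operators such as `{+var}`), uses a simplified variable-name pattern, percent-encodes everything except unreserved characters, and expands undefined variables to the empty string, as Level 1 specifies. The variable names `group` and `username` are assumptions for the example.

```python
import re
from urllib.parse import quote

# Simplified Level 1 expression such as {username}; operator expressions are not handled.
_EXPRESSION = re.compile(r"\{([A-Za-z0-9_]+)\}")

def expand_level1(template: str, variables: dict) -> str:
    """Expand RFC 6570 Level 1 expressions in a template string."""
    def replace(match):
        value = variables.get(match.group(1))
        if value is None:
            return ""  # undefined variables expand to nothing
        # safe="" leaves only unreserved characters (letters, digits, "-._~") unencoded
        return quote(str(value), safe="")
    return _EXPRESSION.sub(replace, template)

print(expand_level1("/fs/ess/scratch/{group}/{username}",
                    {"group": "PZS0714", "username": "efranz"}))
# /fs/ess/scratch/PZS0714/efranz
```

In practice one of the libraries linked above would cover the full specification. The sketch only shows the mechanics.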

The remaining challenge would be to identify which variables are allowed in these templates. That problem becomes more complex when a project, grant, or other accounting_id is a supplemental group name for a user that matches a certain naming convention. At OSC, for example, projects are groups that start with "P", but a user may belong to many other supplemental groups, as I mention above.
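Deriving an accounting variable from group membership might look like the sketch below. The regular expression is a hypothetical convention inferred only from the example names in this thread (PZS0714, PAS1429, PAS1694): "P", two uppercase letters, four digits. OSC's actual rule may differ. The non-project group names in the example list are also made up.

```python
import re

# Hypothetical convention inferred from the example project names above; not an official OSC rule.
PROJECT_GROUP = re.compile(r"P[A-Z]{2}\d{4}")

def project_groups(supplemental_groups):
    """Keep only the supplemental groups that look like project/grant accounts."""
    return [g for g in supplemental_groups if PROJECT_GROUP.fullmatch(g)]

groups = ["PZS0714", "PAS1429", "appl", "PAS1694", "docker"]
print(project_groups(groups))  # ['PZS0714', 'PAS1429', 'PAS1694']
```

A PSC group such as sy560jp would not match this pattern. That is why a single hard-coded rule could not serve every resource.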

Context matters too. In OnDemand I might expect {username} to mean the HPC account being used, but is that the case in every context? The schema would have to explain the meaning of each variable so that different gateways substitute the correct values when expanding the template into a URI reference.
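One way a schema could pin down variable meanings is to publish a glossary of allowed variables and have tooling reject templates that use anything else. Below is a sketch of that check. The glossary entries and the helper `undeclared_variables` are hypothetical, not part of the resource inventory.

```python
import re

# Hypothetical glossary a schema might publish; names and meanings are illustrative only.
TEMPLATE_VARIABLES = {
    "username": "Login name of the HPC account the gateway acts as",
    "group": "Unix group corresponding to the project or grant charged for usage",
}

_EXPRESSION = re.compile(r"\{([A-Za-z0-9_]+)\}")

def undeclared_variables(template):
    """Return, sorted, the template variables that the glossary does not define."""
    return sorted(set(_EXPRESSION.findall(template)) - set(TEMPLATE_VARIABLES))

print(undeclared_variables("/fs/ess/scratch/{group}/{username}"))  # []
print(undeclared_variables("/fs/ess/{project}/{user}"))  # ['project', 'user']
```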

Finally, if these paths are recognized as URIs (where the assumed prefix is file://, so that "homeDir" : "/home" is understood to be equivalent to "homeDir" : "file:///home"), then these paths should follow URI rules for escaping certain characters. For example, a space in a file path would be written in the string as %20 instead of as a literal space by whoever maintains the JSON file.
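The escaping rule can be illustrated with the standard library. This sketch assumes absolute POSIX paths and an empty authority (file:///...). The helper names are hypothetical.

```python
from urllib.parse import quote, unquote, urlparse

def path_to_uri(path: str) -> str:
    """Hypothetical helper: turn an absolute POSIX path into a file:// URI."""
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path: {path}")
    return "file://" + quote(path)  # quote keeps "/" and encodes a space as %20

def uri_to_path(uri: str) -> str:
    """Hypothetical helper: recover the filesystem path from a file:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri}")
    return unquote(parsed.path)

print(path_to_uri("/home"))                      # file:///home
print(path_to_uri("/data/my project"))           # file:///data/my%20project
print(uri_to_path("file:///data/my%20project"))  # /data/my project
```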

Is the end goal of the fileSystemDefinition for any gateway that uses this resource definition to know where to read and write user data, ideally in a generic way, so that multiple gateways can benefit from a single definition that also defines the paths for multiple users?

ericfranz commented 3 years ago

These paths are not helpful for identifying the scratch, project, and other directories for a given user, or a given user and job, on a particular resource. We discussed possibly specifying environment variables instead, which may be relevant for a batch job.

We will wait till we have clarified requirements and use cases.

joestubbs commented 3 years ago

One idea considered was to specify the environment variables that could be used to determine the paths. These are commonly variables such as $HOME, $SCRATCH, $WORK, etc., but not always.
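The environment-variable approach could be sketched as a mapping from directory fields to variable names, resolved on the resource itself (for example inside a batch job). Apart from homeDir, the field names are hypothetical, and the variable names vary by site. Unset variables are skipped rather than guessed. The example environment is illustrative.

```python
import os

# Hypothetical mapping: only "homeDir" appears in the thread; the other field names are made up.
DIRECTORY_VARIABLES = {
    "homeDir": "HOME",
    "scratchDir": "SCRATCH",
    "workDir": "WORK",
}

def resolve_directories(env=None):
    """Look up each directory in the environment, skipping variables that are unset."""
    env = os.environ if env is None else env
    return {field: env[var] for field, var in DIRECTORY_VARIABLES.items() if var in env}

example_env = {"HOME": "/home/efranz", "SCRATCH": "/fs/ess/scratch/PZS0714/efranz"}
print(resolve_directories(example_env))
# {'homeDir': '/home/efranz', 'scratchDir': '/fs/ess/scratch/PZS0714/efranz'}
```

Because the values only exist in the user's session on the resource, a gateway would need to read them there rather than from the inventory JSON.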

joestubbs commented 3 years ago

We discussed evaluating the use cases of gateway applications (and application descriptions) before making a final decision on the approach.

joestubbs commented 2 years ago

This is a good issue to revisit for 1.1