glotzerlab / signac

Manage large and heterogeneous data spaces on the file system.
https://signac.io/
BSD 3-Clause "New" or "Revised" License
129 stars 36 forks source link

Requiring workspace to be a subdirectory of the project #390

Closed vyasr closed 2 years ago

vyasr commented 4 years ago

I know we've had discussions in the past on tightening the relationship between a project and its workspace. There have been arguments on both sides for either leaving the relationship very loose, or making it even more tightly coupled by requiring that the workspace be contained in the project folder. I don't remember all the arguments either way, but if we need to change anything then the 2.0 release seems like a reasonable point to start imposing that restriction since I think the code changes would be minimal but it would be a breaking change. Whether we choose to make the change or not, I think we should resolve this issue one way or another prior to the 2.0 release.

@csadorf I believe you had particular opinions on this. I can update the issue if/when I remember what our thoughts were.

vyasr commented 3 years ago

Can the workspace be a symlink? Allowing that is very convenient for the purpose of working with scratch spaces on supercomputers, for instance, but it also means that functions like signac.get_job can't guarantee that they'll work.

bdice commented 3 years ago

Yes, signac.get_job will error if it cannot find a project (or finds an invalid project, e.g. one that is several directories too high and doesn't have its workspace configured to point to the directory containing the job). I think the error handling in signac.get_job is sufficient if we choose to leave the behavior as-is.

To me, the bigger question is whether the workspace should be configurable at all, which is discussed in #197. I think that if the workspace remains configurable (as in v1), a symlink workspace should be valid. That should behave similarly to setting the workspace to the symlink's target. If the workspace is not configurable, then I think disallowing symlinks would be acceptable.

vyasr commented 3 years ago

I don't think my question was clear. I don't think the workspace should be configurable, as discussed on #197 and in the title of this issue. The workspace should always be an immediate subdirectory of the project called workspace in signac 2.0. However, unless you actually verify that it's not a symlink, nothing will stop the user from symlinking the workspace to somewhere else. If we don't actually check that explicitly (via not os.path.islink or similar), the symlink will work almost all the time, until you try something like signac.get_job. My question is whether we just have those cases fail if people do that, or if we try to prevent people from at least by documenting that they shouldn't use symlinks (even if we don't check for it). It sounds like you're fine with just letting it fail in such cases?

bdice commented 3 years ago

@vyasr Got it. I see no reason to explicitly prohibit symlinks, we can document the expectations and then it's the user's fault if signac.get_job fails.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.