ben-denham / labtech

Easily run experiment permutations with multi-processing and caching
https://ben-denham.github.io/labtech/
GNU General Public License v3.0
7 stars 1 forks source link

Add unit tests for storage #18

Closed nathanjmcdougall closed 4 months ago

nathanjmcdougall commented 5 months ago

@ben-denham I have an edge case that I don't know how to test; where someone uses relative paths in a key.

I'm a bit concerned this means that a carefully crafted custom key could end up running rmtree on an arbitrary directory.

Might be best to forbid . characters in keys.

ben-denham commented 4 months ago

@ben-denham I have an edge case that I don't know how to test; where someone uses relative paths in a key.

I'm a bit concerned this means that a carefully crafted custom key could end up running rmtree on an arbitrary directory.

Might be best to forbid . characters in keys.

Good question, I'm pretty confident the check on key_path.parent would prevent any dangerous deletions, but to be even safer I think it would be reasonable to just disallow ., /, \, and os.path.sep in all keys (and update the docs to note that KEY_PREFIX in a custom cache should not include these characters).

nathanjmcdougall commented 4 months ago

@ben-denham Some thoughts:

1) It looks like #15 has introduced an issue with .gitignore is detected as a key.

2) If we forbid os.path.sep from keys entirely (rather than just from the start or end, which is the current behaviour) then it prevents the possibility to sort keys within subfolders of the lab storage folder. I think having the ability to have a subfolder structured lab folder is probably a good thing

3) I think having . in keys forbidden entirely is a bit excessive. Sometimes . is useful, e.g. if you want to reference a floating point number e.g. mykey3.10. The issue is when . is being used to construct a relative path.

4) If we go the path of making keys very "safe' and minimal it might be easiest just to force them to be alphanumeric.

ben-denham commented 4 months ago

Thanks @nathanjmcdougall

@ben-denham Some thoughts:

  1. It looks like Feature: Automatically add .gitignore into a storage directory folder #15 has introduced an issue with .gitignore is detected as a key.

Thanks for catching this, I've fixed it in b3e97df

  1. If we forbid os.path.sep from keys entirely (rather than just from the start or end, which is the current behaviour) then it prevents the possibility to sort keys within subfolders of the lab storage folder. I think having the ability to have a subfolder structured lab folder is probably a good thing

Fair point, although the current key_path.parent check will prevent subfolders currently, so it would need to be changed to check that the storage directory is one of the parents of key_path.

  1. I think having . in keys forbidden entirely is a bit excessive. Sometimes . is useful, e.g. if you want to reference a floating point number e.g. mykey3.10. The issue is when . is being used to construct a relative path.
  2. If we go the path of making keys very "safe' and minimal it might be easiest just to force them to be alphanumeric.

Ok, so to allow dots and slashes, maybe we should "resolve" the path (3 times, each time treating /, \, or os.path.sep as the separator) and ensure that the "resolved" path is identical to the original path - combined with the check on the parent of the path, that will prevent any traversal outside of the directory, and ensure we don't have any key collisions caused by two keys that resolve to the same path.

nathanjmcdougall commented 4 months ago

@ben-denham to get this branch closed, I have just done the simple thing and have forbidden ., /, \, and os.path.sep, as well as os.path.altsep.

I figure that there can be a separate exercise to support unusual key types at a later date, if required.

ben-denham commented 4 months ago

Thanks @nathanjmcdougall , that looks great. Could you please do a rebase on top of the latest main branch, as it looks like you've got some duplicate commits in your branch at the moment: