HERA-Team / librarian

The HERA Librarian.
BSD 2-Clause "Simplified" License
6 stars 11 forks source link

Uploading a directory not containing a '.' misidentifies filetype #69

Open BrianJKoopman opened 4 years ago

BrianJKoopman commented 4 years ago

I've been exploring using the librarian for the Simons Obsevatory and have come across a couple of things in testing that I thought I'd bring up.

If I upload a directory to the librarian, and name it such that it doesn't contain a '.', then the reported type will be a string that contains the staging directory that the librarian made and the full name of my directory. For instance:

$ librarian upload --null-obsid TestUser test_dir/ foo/test_dir

This successfully uploads, with the resulting entry:

Name Created Observation Type Source Size
test_dir   details 2020-02-03 19:38:39 null szAmKD/test_dir TestUser 50 Bytes

I would have expected the type to be reported as a "directory", though see the type check is entirely dependent on the extension. Based on the test cases it seems directories uploaded by HERA might always contain some sort of extension. Is this naming a requirement for other expected behavior?

pkgw commented 4 years ago

I can't think of a way that the detailed structure of a file's basename matters except for determining the file type and a minor bit of HERA-specific MJD sniffing functionality. I think it can be useful in a system like the Librarian to use extensions even for directory names, but we definitely don't want to make that a requirement and the current behavior is clearly not good.

I'm not sure if I'd recommend defaulting to a filetype of "directory" if there's no extension — it would be probably be a rarer situation but people might want to add single files whose basenames don't include extensions either. In these cases I'd lean towards just setting the filetype to the empty string.

BrianJKoopman commented 4 years ago

I'm not sure if I'd recommend defaulting to a filetype of "directory" if there's no extension — it would be probably be a rarer situation but people might want to add single files whose basenames don't include extensions either. In these cases I'd lean towards just setting the filetype to the empty string.

Yeah, agreed that "directory" as the default for extensionless files doesn't sound good. The solution I came up with in the moment (https://github.com/BrianJKoopman/librarian/commit/d9ca347075e9b409c915ef85519b67efb071233a) would check extensionless files for whether they are directory before marking it as such. Another check would be needed to properly mark extensionless files (that aren't directories) with an "unknown" or empty type. I'm happy to add that with another test and PR if you'd like.