datalad / datalad-fuse

DataLad extension to provide FUSE file system access

Add option for the absent annexed files timestamp #88

Open yarikoptic opened 1 year ago

yarikoptic commented 1 year ago

Inspired by looking at https://github.com/datalad/datalad-fuse/pull/86/files#diff-9ecef4f4bd763d8738d14a47bafe14a54cb42520c194bdc09e41c913506c9061R163. At the moment, get_commit_datetime(path) returns the date of the last commit in the repository that contains path. That lookup was a compromise between a more specific date (from the last commit that touched the path) and some arbitrary date. But in some cases we want one of those two instead, trading specificity against performance. I think we had better expose those 3 modes as the possible values of a --path-timestamp option.

I think we could even postpone implementing path-commit: it feels like a nice feature, but there is no real need for it at the moment. The startup option, however, might let us shave off some cycles while working on https://github.com/dandi/dandisets-healthstatus/. Since it should not be the default behavior, it should be an option.

jwodder commented 1 year ago

@yarikoptic What exactly would the benefit of this be? How would using the fusefs creation time instead of the current time shave off any cycles?

yarikoptic commented 1 year ago

Indeed, fusefs-startup probably would not shave off much. I am not sure why I thought shaving off a single git command invocation would matter. Let's forget about that one (I will cross it out). But having path-commit (which would actually be slower) would be useful: it could enable build systems (like make, scons) and workflow engines (like snakemake) to use modification times when deciding whether to rebuild, etc. The implementation should be 'smart-ish', though, and avoid actually running git log -1 for every file.

I thought of suggesting the --full-diff flag, but I don't think it would give us the needed effect. So we would really need to run and analyze git log --stat (or something similar that lists the files) until we get to the file needed, and then continue the analysis whenever a newly requested file has not gotten its mtime yet.
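
A rough sketch of that lazy scan, to make the idea concrete. It is not the datalad-fuse implementation: the class CommitTimeIndex and the helper git_log_lines are hypothetical names, and it assumes git log --name-only with a %x01%ct header line per commit is an acceptable stand-in for git log --stat. Since git log lists commits newest first, the first time a path shows up gives the timestamp of the last commit that touched it.

```python
import subprocess
from typing import Dict, Iterable, Iterator, Optional

MARK = "\x01"  # prefix of each commit header line; git quotes control chars in paths


def git_log_lines(repo: str) -> Iterator[str]:
    """Stream `git log` output line by line (hypothetical helper)."""
    proc = subprocess.Popen(
        ["git", "-C", repo, "log", "--format=%x01%ct", "--name-only"],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    with proc.stdout:
        yield from proc.stdout
    proc.wait()


class CommitTimeIndex:
    """Lazily map paths to the committer time of the newest commit touching them."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines: Iterator[str] = iter(lines)
        self._mtimes: Dict[str, int] = {}
        self._current: Optional[int] = None  # timestamp of the commit being read

    def get(self, path: str) -> Optional[int]:
        # Answer from the cache if an earlier scan already passed this path.
        if path in self._mtimes:
            return self._mtimes[path]
        # Otherwise resume reading the log where the previous call stopped.
        for raw in self._lines:
            line = raw.rstrip("\n")
            if not line:
                continue
            if line.startswith(MARK):
                self._current = int(line[1:])
                continue
            if self._current is None:
                continue
            # Keep only the first (newest) timestamp seen for each path.
            self._mtimes.setdefault(line, self._current)
            if line == path:
                return self._current
        return None  # log exhausted: path was never committed
```

Usage would be along the lines of index = CommitTimeIndex(git_log_lines("/path/to/repo")) followed by index.get("some/file"). Each call reads only as much of the log as it needs, so a file touched by a recent commit is cheap to look up, while a miss costs one full pass over the history.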

yarikoptic commented 1 year ago

Can you work it out from this description, @jwodder?

yarikoptic commented 1 year ago

ping