Closed ncw closed 8 months ago
I don't love method 2, in that you need to typecast to get it to work. I get it, sometimes you need to do such things, but within a squashfs, I would think you shouldn't need to do so.
Method 1 is pretty good; the amount of information you are carrying around is tiny. I might be worried if we were doing this on real-time systems, but who runs go on microcontrollers?
Method 3 is most like os.Readlink()
, and so remains consistent with the go way of doing things (even if sometimes that go way is less efficient).
an efficient way of finding a file other than iterating all the directories
And what happens when you try to get the FileInfo
for method 1 or directory entry for method 2? You still have to walk the directories. The issue is that now you have to do it twice: once to get to the point that you know it is a symlink, and a second time to read the symlink. That part does bother me. I suspect go did it that way because not everything has symlinks, so they wanted it out of band.
I think method 1 is your best bet. You already are there, you walked the tree, and the amount of additional information is tiny. I could be convinced otherwise, though.
I don't love method 2, in that you need to typecast to get it to work. I get it, sometimes you need to do such things, but within a squashfs, I would think you shouldn't need to do so.
I'll just note that you already need a typecast to turn the result of the Sys()
call into the FileStat
.
Method 1 is pretty good; the amount of information you are carrying around is tiny. I might be worried if we were doing this on real-time systems, but who runs go on microcontrollers?
Method 3 is most like
os.Readlink()
, and so remains consistent with the go way of doing things (even if sometimes that go way is less efficient).
Just an idea...The FileState
could just contain a pointer to the directoryEntry
...
In fact the FileStat
could be an alias of directoryEntry
(you can even make it backwards compatible with type aliases) and then we'd just add new methods to directoryEntry
and they would appear on the FileStat
also.
So
type FileStat = directoryEntry
Then move the Uid
Gid
and Xattrs
methods onto dirEntry
. This then makes adding Readlink
quite natural and implements methods 1 and 2!
an efficient way of finding a file other than iterating all the directories
And what happens when you try to get the
FileInfo
for method 1 or directory entry for method 2? You still have to walk the directories. The issue is that now you have to do it twice: once to get to the point that you know it is a symlink, and a second time to read the symlink. That part does bother me. I suspect go did it that way because not everything has symlinks, so they wanted it out of band.
I think they were just following the unix syscalls which is readlink
.
This is quite annoying though in general. From having implemented this many times in rclone I think the best interface is something like this:
// Node represents either a directory (*Dir) or a file (*File)
type Node interface {
os.FileInfo
IsFile() bool
Inode() uint64
SetModTime(modTime time.Time) error
Sync() error
Remove() error
RemoveAll() error
DirEntry() fs.DirEntry
VFS() *VFS
Open(flags int) (Handle, error)
Truncate(size int64) error
Path() string
SetSys(interface{})
}
(With the addition of ReadLink!)
From having implemented this many times in rclone
Oh rclone is really nice. I didn't know that was you. I have some other projects that are in a similar space, but not really competitive. We should have a real-world discussion. Send me a Twitter DM @avideitcher or just @ me on one and we can then connect via email. If you are interested, of course.
Do you remember the seminal "rsync for backup" paper, by Mike Rubel, all those years ago? So much can be done with the basic structures at hand.
In fact the FileStat could be an alias of directoryEntry (you can even make it backwards compatible with type aliases) and then we'd just add new methods to directoryEntry and they would appear on the FileStat also.
Ooh, this is interesting. As long as it implements the right interface, why not? I rather like that. Does it work?
Since we already are talking about these, this was part of the work I did for a company early this year.
Thanks for merging this. I'll send the next part shortly which implements Open
- this helps a lot in the performance stakes.
Oh rclone is really nice. I didn't know that was you. I have some other projects that are in a similar space, but not really competitive. We should have a real-world discussion. Send me a Twitter DM @avideitcher or just @ me on one and we can then connect via email. If you are interested, of course.
You can find me at nick@craig-wood.com also @njcw on twitter (I have followed you!). Yes I agree we have lots of interests in common and a discussion would be great!
Do you remember the seminal "rsync for backup" paper, by Mike Rubel, all those years ago? So much can be done with the basic structures at hand.
I don't think I saw that, but I'm a big fan of rsync! The text UI of rclone is modeled around rsync and the philosophy is very similar. The only bit rclone doesn't implement is the very clever delta encoding transfer, but there aren't many cloud providers which do that!
Since we already are talking about these, this was part of the work I did for a company early this year.
Oh nice!
I'm currently working on an archive
backend for rclone which enables reading of archives like .zip and .squashfs - that is why I've been so interested in squashfs.
You can see the code so far here
v1.66.0-beta.7581.1cee1e101.archive-backend on branch archive-backend
To tell rclone you want to look at an archive you do rclone ls :archive:/path/to/archive.sqfs
To tell rclone you want to look at an archive you do rclone ls :archive:/path/to/archive.sqfs
Sometimes I think 50% of the things I get involved with in one way or another are, "how do we encode something that is kind of like a URL in a single string?" 😆
Will email you.
I'd like to be able to read the target of symlinks.
I can think of three methods to do this. My favourite is method 2 I think.
I'm happy to make a PR for whichever method we decide is best.
method 1
One way of doing this would be to add a
Readlink() (string, error)
method toFileStat
. This would return an error for non symlinks.https://github.com/diskfs/go-diskfs/blob/3fa702b44f0b367810bee40d609b71ebcabd7713/filesystem/squashfs/directoryentry.go#L8-L13
This could be implemented by adding a
target
field and setting it herehttps://github.com/diskfs/go-diskfs/blob/3fa702b44f0b367810bee40d609b71ebcabd7713/filesystem/squashfs/squashfs.go#L449-L452
method 2
An alternative would be to add a new method to
directoryEntry
(which is passed to the caller as anos.FileInfo
), sayReadlink() (string, error)
which you'd have to type assert to have access to. This would have the advantage that we don't need to put stuff in theFileStat
the user isn't going to use.https://github.com/diskfs/go-diskfs/blob/3fa702b44f0b367810bee40d609b71ebcabd7713/filesystem/squashfs/directoryentry.go#L49-L59
If you like this concept then we could potentially implement an
OpenFile
method too which would save a lot of directory traversals when reading files when it is very likely you've just listed the directory and have aFileInfo
to hand.method 3
A third method would be to implement
FileSystem.Readlink
to mimicos.Readlink
. I like this method the least though because squashfs doesn't have an efficient way of finding a file other than iterating all the directories.