Open didactic-drunk opened 4 years ago
What are your use cases?
Detecting hard links needs [ino, dev, nlink] and ideally birth_time, with ctime as a fallback, but I also need ctime when displaying file info.
Also getting or setting flags like APPEND.
We specifically didn't expose ino and dev because we have same_file? which compares ino and dev, but doesn't expose these platform-specific details.
Exposing the number of links would be fine, given a use case.
Exposing more times is interesting: many filesystems don't support the birth time, and there are discrepancies between Windows and Linux on ctime and atime, iirc. Go supports only modification time for this same reason.
Stat flags are already exposed.
The blocks and block size are useless on modern-day filesystems.
st_size is also missing, to get the disk size.
File::Info#same_file? is nice, but maybe hard to find? There is the higher-level File.same?(a, b), but I had to dig through the docs to find it, and the documentation is lacking details. Just "compares the device and inode on UNIX to detect hard links" could be a nice addition.
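For reference, a minimal sketch of the two checks mentioned above, assuming a filesystem that supports hard links. The temporary file names are only for illustration.

```crystal
# Create a file and a hard link to it in the temp directory.
original = File.tempname("same_file_demo", ".txt")
link = "#{original}.link"
File.write(original, "hello")
File.link(original, link)

begin
  # High-level check: do both paths name the same underlying file?
  puts File.same?(original, link)

  # Lower-level check on two File::Info values obtained separately.
  puts File.info(original).same_file?(File.info(link))
ensure
  File.delete(link)
  File.delete(original)
end
```

Both calls answer "are these two the same file?" without revealing the device or inode numbers behind the answer.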
I recently had a use case for atime: a disk cache with an automated cleanup of files not accessed for N days, mounted with relatime on Linux for the 24h granularity.
ctime is useful to detect that metadata changed, for example the file owner or permissions changed, but there were no writes so mtime didn't change; a backup application could use it.
Note that Go has a non-portable direct access to the stat struct through FileInfo Sys(). See https://stackoverflow.com/a/55303743/199791 for example. It's a nice solution. The portable interface only supports high-level portable data, but Sys() allows applications to have very specific use cases for hardly portable data (e.g. atime on a relatime mounted device on a Linux target) that can be legitimate, but too specific to bother with platform-agnostic methods.
A platform-specific API should exist for these kinds of non-portable, low-level operations, like Rust's std::os::unix and Go's unix package.
It may be kind of already present with src/crystal/system/unix and src/crystal/system/win32, but these files aren't meant to be used as-is.
Stat flags are already exposed.
No they aren't. Crystal split mode into permissions and suid/sgid/sticky, calling the latter flags.
I need what BSD systems refer to as flags or Linux as attributes (not to be confused with extended attributes). On BSD it's part of the stat structure in st_flags. On Linux it's available through one of the stat interfaces, but I don't remember how exactly.
We specifically didn't expose ino and dev because we have same_file? which compares ino and dev, but doesn't expose these platform-specific details.
My laptop has 3 million files. How do I find the hard links? I can't keep 3 million files open. Comparing every file against every file is unacceptably slow, so that won't work either. I also can't compare incrementally or between program runs by saving state. So no, this doesn't work for my use cases at all. I've mentioned what I need for porting one Ruby program.
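To make the cost concrete, here is a rough sketch of what grouping hard links looks like when only same_file? is available. hard_link_groups is a hypothetical helper, not part of Crystal, and this is just one way to write it.

```crystal
# Groups paths that refer to the same underlying file, using only
# File::Info#same_file?. Each new path is compared against one
# representative per group, so the cost grows with paths * groups.
def hard_link_groups(paths : Array(String)) : Array(Array(String))
  groups = Array(Tuple(File::Info, Array(String))).new

  paths.each do |path|
    info = File.info(path, follow_symlinks: false)
    match = groups.find { |group| group[0].same_file?(info) }
    if match
      match[1] << path
    else
      groups << {info, [path]}
    end
  end

  groups.map { |group| group[1] }
end

# Print every group of paths (given as arguments) that share a file.
hard_link_groups(ARGV).each do |group|
  puts group.join(" ") if group.size > 1
end
```

With a visible [dev, ino] pair the same grouping would be a single hash lookup per file, and the key could be saved between runs. Neither is possible with an opaque comparison.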
What is the solution? Either I make my own shard which monkey patches or duplicates the stat structure already available in src/crystal/system/unix and src/crystal/system/win32, or Crystal exposes things like atime/ctime/birth_time which may vary in their use between platforms.
mtime can vary. I've encountered NFS systems that return epoch mtime/ctime/atime for every file. Every reported time in the structure, including mtime, is not just platform specific but file system specific. Especially so when using network file systems.
I can handle the differences and expect to. On systems supporting birth time I use birth time (which includes Windows). Otherwise I fall back to POSIX ctime and nlink. Both solutions need dev and ino.
Additional use cases that I need dev and ino for:
[dev, ino] with [birth_time, ctime, nlink] and between program runs.
Additional use cases:
chflags/chattr.
Most of the flags could be handled by an enum. The ones I need are mostly portable, like immutable and append. The Linux version flag is an outlier with additional data. I have no need for it.
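As a rough illustration of the enum idea, a flags enum could look like the sketch below. FileFlags and its members are hypothetical names chosen here. Nothing like this exists in Crystal's standard library, and mapping the members to chflags/chattr bits is left out.

```crystal
# Hypothetical flag set covering only the two portable flags named above.
@[Flags]
enum FileFlags
  Immutable
  Append
end

flags = FileFlags::Immutable | FileFlags::Append

# @[Flags] enums get a question method per member and includes?.
puts flags.immutable?
puts flags.includes?(FileFlags::Append)
puts FileFlags::None.append?
```

A platform that lacks a given flag could simply never set it, so callers would not need a separate code path.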
So how do I make this work in Crystal, considering that everything except birth_time is available in Ruby and has been working for ~12 years?
Note that Go has a non-portable direct access to the stat struct through FileInfo Sys().
Yeah, this is the solution I prefer too - just expose a stat struct with a info.platform_specific.foo.
No they aren't. Crystal split mode into permissions and suid/sgid/sticky, calling the latter flags.
My bad. Looks like this needs statx though, just like birth_time. I don't think we should bind statx, or use statx by default on Linux. A statx binding should be a shard, since the difference is only visible on the platform-specific members.
My laptop has 3 million files. How do I find the hard links?
Thanks for the use case! I agree, indexing hard links is impossible with same_file?. We should expose the platform-specific members.
Yeah, this is the solution I prefer too - just expose a stat struct with a info.platform_specific.foo.
So make info.stat public and document that it's platform specific? That would solve almost every use case.
What about File::Info#raw? It would return @stat on UNIX and @file_attributes on Windows.
I prefer platform_specific - but I'd rather ensure everyone agrees with this approach (:+1: / :-1: this comment) before bikeshedding on that.
No platform specific data, please. A Crystal program should compile and run exactly the same on all platforms. Or, said another way, the API should be exactly the same for all platforms. But it's fine if a method raises on one platform but works on another, given that it's documented to only work on certain platforms.
No platform specific data, please. A Crystal program should compile and run exactly the same on all platforms. Or, said another way, the API should be exactly the same for all platforms. But it's fine if a method raises on one platform but works on another, given that it's documented to only work on certain platforms.
Instead of handling it at compile time:
if stat.responds_to?(:birth_time)
  ...
else
  # ctime handler
end
I have to use exception handling at runtime:
begin
  stat.birth_time
rescue
  # ctime handler
end
@asterite How much overhead does that add when traversing file systems with > 10 million files? 100 million? They won't use spinning rust so seek times are not as much of a concern.
But it's not one exception handler. It's several. One for flags, another for ACLs, another for resource forks, another for extended attributes, plus anything I missed.
I'd much rather use feature checks than exception handling. To me the code is clearer. I know it's a platform feature check and only runs on specific platforms. With the exception handler, am I handling an OS error or an unsupported platform? It's even less clear when trying to understand someone else's code. What did they intend?
How much overhead does that add when traversing file systems with > 10 million files? 100 million?
Exception handling doesn't add overhead unless an exception is raised (as far as I know).
Additionally, the compiler knows which methods raise. If the methods you want to use are available on an OS, they won't raise there, so the compiler (or LLVM) will skip the entire exception handler. So zero overhead, really.
I'd much rather use feature checks than exception handling.
The problem is that if someone forgets to check for a feature flag in a library, nobody can use that library in some OS, even if the library never calls that code (for example if they call it conditionally at runtime). This was discussed in the past.
I'd much rather use feature checks than exception handling.
The problem is that if someone forgets to check for a feature flag in a library, nobody can use that library in some OS, even if the library never calls that code (for example if they call it conditionally at runtime). This was discussed in the past.
@asterite If they forget a rescue, nobody can use that library on the different OS. How is that different?
If anything it's worse. A clear compile time error is turned into a possible runtime error. Do the specs test that part of the code? If not, the program appears to compile and function correctly but raises unhandled exceptions when run in the real world.
In both compile time feature checking and exception handling:
if responds_to? vs rescue
The difference is when the error occurs: compile time or runtime. Compile time is more robust.
Additionally, the compiler knows which methods raise. If the methods you want to use are available on an OS, they won't raise there, so the compiler (or LLVM) will skip the entire exception handler. So zero overhead, really.
@asterite No OS has all the features I'm checking. That means 2-3 exceptions on average, not zero overhead.
When the methods raise on specific platforms, you can use conditional macro branches:
{% if flag?(:win32) %}
  stat.birth_time
{% else %}
  stat.ctime
{% end %}
This ensures only the non-raising methods are invoked on a platform.
Some of these features however are not even platform-specific but depend on the file system. In that case, the API should provide nilable getters to avoid exception overhead.
if birth_time = stat.birth_time?
  birth_time
elsif ctime = stat.ctime?
  ctime
end
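On the provider side, nilable getters like these might look roughly as follows. SketchInfo and its fields are hypothetical stand-ins for illustration only. Crystal's File::Info does not define birth_time? or ctime?.

```crystal
# Hypothetical stat-like value where each time may be unavailable (nil),
# depending on the platform or the file system.
struct SketchInfo
  def initialize(@birth_time : Time?, @ctime : Time?)
  end

  def birth_time? : Time?
    @birth_time
  end

  def ctime? : Time?
    @ctime
  end

  # Prefer the birth time, fall back to ctime, or nil if neither is known.
  def creation_estimate : Time?
    birth_time? || ctime?
  end
end

info = SketchInfo.new(nil, Time.unix(1_000))
if time = info.creation_estimate
  puts time
else
  puts "no creation time available"
end
```

The caller is forced by the compiler to handle nil, and no exception is raised when a file system simply lacks the data.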
When the methods raise on specific platforms, you can use conditional macro branches:
{% if flag?(:win32) %} stat.birth_time {% else %} stat.ctime {% end %}
This ensures only the non-raising methods are invoked on a platform.
But that's worse than a feature flag!
What's the point of raising an exception? Without rescues the code won't run on unsupported platforms. Exactly the opposite of what @asterite claims the exceptions are for.
You also duplicated code. Crystal already has to know which platforms are supported or not in order to raise. Instead of if responds_to?, every developer needs to figure out which platforms are supported or not and use if flag?(:platform) for every feature tested.
It would be interesting to know how this is solved in Go, Java and other languages.
@asterite as said above, Go has a platform agnostic interface (FileInfo) with the portable info that works everywhere the same, but also exposes the raw, system specific, info (FileInfo Sys()).
This is IMO an acceptable solution. We have a platform agnostic API that works everywhere for 99% of use cases, but allow the 1% remaining use cases to be implemented.
Also, I prefer a program that won't compile on some platforms (I must deal with it) over a program that silently compiles but will raise NotImplementedError exceptions at runtime (useless).
It could just be that some methods don't exist for some targets, but I think the Go way to handle FileInfo is better, and it makes the distinction between the portable and non-portable API.
Rather than do it the java/gofy way, I'd rather ask: "what would @asterite do?" @asterite had this great idea to take Ruby and add nil checking. Maybe he had the answer.
So I crossed the vast ocean, climbed the highest mountain and slept with the ugliest of mountain goats (it was cold... and lonely).
He wasn't there.
So I used the internet to ask the great sage: "Is a method that may or may not exist like the nil problem you originally solved? Could you treat it exactly like the compile time nil checking already used?"
stat.ino => Compile error "Not available on all platforms. Check with .responds_to?"
if stat.responds_to?(:birth_time)
  stat.birth_time
else
  stat.ctime
end
Side note, if you use lambskin they think it's another goat.
And in a booming voice the great sage @asterite squeaked:
My child.
You have come far in your journey and learned much. You have served our cause with the truest faith. Therefore I name you blessed and beloved.
The nilable approach suggested by @straight-shoota seems to be a good option: you ask it, but you have to check whether it's really supported. But maybe it depends on the API.
That's not quite what I had in mind but I'll take it.
It'd be nice to have a clear split between platform ifs and runtime nil checks.
# baz may return nil
if foo = bar.baz
  ...
end
# baz may not exist depending on platform and OS version.
if foo = bar(.)(.)baz
  ...
end
The operator above is only an example. Feel free to change it. I prefer to check if the bar has (.)(.)'s.
I'd like to propose a standard annotation or other method to indicate platform variant behavior.
Advantages:
This should work regardless of whether using nil, responds_to?, or other methods.
It looks like atime and birthtime are available on all supported platforms already?
EDIT: birth time is available via statx on Linux. Not sure if WebAssembly exposes the same thing. I couldn't find any references to file creation time in DragonFly BSD's libc.
On Windows ctime is accessible in the Win32 API via FILE_BASIC_INFO using GetFileInformationByHandleEx. Ruby is definitely incorrect here as ftCreationTime is the birth time. Python is correct.
Crystal::System::FileInfo is missing public accessors. I'd like to create a PR to expose some or all of the information below.
Most of the data is cross platform with some exceptions. The list is not exhaustive.
Suggested names:
Currently I need ctime, birth_time, ino, dev, nlink, flags. I assume others may want the full stat structure when more non-web applications are written.