crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Missing File::Info data. #8357

Open didactic-drunk opened 4 years ago

didactic-drunk commented 4 years ago

Crystal::System::FileInfo is missing public accessors. I'd like to create a PR to expose some or all of the information below.

Most of the data is cross platform with some exceptions. The list is not exhaustive.

| Name | Platforms | Notes |
|------|-----------|-------|
| atime | POSIX, Windows | |
| ctime | POSIX, Windows | |
| birth_time | Dragonfly, FreeBSD, Linux, MacOS, Windows | |
| ino | POSIX, Windows* | |
| dev | POSIX, ? | Is the volume serial number on Windows equivalent? |
| nlink | POSIX, Windows | |
| blocks | POSIX | Not technically required in the spec but implemented almost universally. |
| blksize | POSIX | Not technically required in the spec but implemented almost universally. |
| flags | Linux, *BSD, MacOS, ? | |

Suggested names:

Currently I need ctime, birth_time, ino, dev, nlink, flags. I assume others may want the full stat structure when more non-web applications are written.

ysbaddaden commented 4 years ago

What are your use cases?

didactic-drunk commented 4 years ago

Detecting hard links needs [ino, dev, nlink], and ideally birth_time. ctime works as a fallback, but I also need ctime when displaying file info.

Also getting or setting flags like APPEND.

RX14 commented 4 years ago

We specifically didn't expose ino and dev because we have same_file? which compares ino and dev, but doesn't expose these platform-specific details.

Exposing the number of links would be fine - with a usecase.

Exposing more times is interesting: many filesystems don't support the birth time, and there are discrepancies between Windows and Linux on ctime and atime, iirc. Go supports only modification time for this same reason.

Stat flags are already exposed.

The blocks and block size are useless on modern-day filesystems.

j8r commented 4 years ago

There is also st_size missing, to get the disk size.

ysbaddaden commented 4 years ago

File::Info#same_file? is nice, but maybe hard to find? There is the higher level File.same?(a, b) but I had to dig the docs to find it, and the documentation is lacking details. Just "compares the device and inode on UNIX to detect hard links" could be a nice addition.
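For readers who have not come across these methods, here is a minimal sketch of `File.same?` and `File::Info#same_file?` applied to a hard link. The temporary file names and the `same_file_demo` helper are made up for illustration, and the sketch assumes the temporary directory sits on a filesystem that supports hard links.

```crystal
# Returns {original vs hard link, original vs unrelated copy, File::Info comparison}.
# The helper name and file names are hypothetical.
def same_file_demo : Tuple(Bool, Bool, Bool)
  base = File.join(Dir.tempdir, "same-file-demo-#{Process.pid}")
  original, hardlink, unrelated = "#{base}-a", "#{base}-b", "#{base}-c"
  begin
    File.write(original, "hello")
    File.write(unrelated, "hello")  # same content, but a different file
    File.link(original, hardlink)   # second name for the same file
    {
      File.same?(original, hardlink),
      File.same?(original, unrelated),
      File.info(original).same_file?(File.info(hardlink)),
    }
  ensure
    # Clean up whatever was created, even if one of the calls above raised.
    [original, hardlink, unrelated].each do |path|
      File.delete(path) if File.exists?(path)
    end
  end
end

puts same_file_demo
```

Both calls answer a yes/no question about two specific files; neither returns a value that can be stored or used as a lookup key.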

I recently had a use case for atime: a disk cache with an automated cleanup of files not accessed for N days (mounted with relatime on Linux for the 24h granularity).

ctime is useful to detect that metadata changed, for example when the file owner or permissions changed but there were no writes, so mtime didn't change; a backup application could use it.

Note that Go has a non-portable direct access to the stat struct through FileInfo Sys(). See https://stackoverflow.com/a/55303743/199791 for example. It's a nice solution. It only supports high level portable data, but allows applications to have very specific use cases for hardly portable data (e.g. atime on a relatime mounted device on a Linux target) that can be legitimate, but too specific to bother with platform agnostic methods.

j8r commented 4 years ago

A platform-specific API should exist for this kind of non-portable, low-level operation, like Rust's std::os::unix and Go's unix package. It is arguably already present in src/crystal/system/unix and src/crystal/system/win32, but these files aren't meant to be used as-is.

didactic-drunk commented 4 years ago

> Stat flags are already exposed.

No, they aren't. Crystal split mode into permissions and suid/sgid/sticky, calling the latter flags.

I need what BSD systems refer to as flags or Linux as attributes (not to be confused with extended attributes). On BSD it's part of the stat structure in st_flags. On Linux it's available through one of the stat interfaces, but I don't remember how exactly.

https://en.wikipedia.org/wiki/Chattr

didactic-drunk commented 4 years ago

> We specifically didn't expose ino and dev because we have same_file? which compares ino and dev, but doesn't expose these platform-specific details.

My laptop has 3 million files. How do I find the hard links? I can't keep 3 million files open. Comparing every file against every file is unacceptably slow, so that won't work either. I also can't compare incrementally or between program runs by saving state. So no, this doesn't work for my use cases at all. I've mentioned what I need for porting one Ruby program.
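To make the scaling argument concrete, here is a rough sketch of a single-pass hard-link index. `StatEntry` and `hard_link_groups` are hypothetical names; the record stands in for the dev, ino and nlink fields that `File::Info` does not expose, and the sample entries are invented.

```crystal
# Hypothetical stand-in for the stat fields requested in this issue.
record StatEntry, path : String, dev : UInt64, ino : UInt64, nlink : UInt64

# Groups paths by {dev, ino} in one pass and keeps groups with several paths.
def hard_link_groups(entries : Array(StatEntry)) : Array(Array(String))
  index = Hash(Tuple(UInt64, UInt64), Array(String)).new
  entries.each do |entry|
    next if entry.nlink < 2 # a single link means no other name exists
    key = {entry.dev, entry.ino}
    (index[key] ||= [] of String) << entry.path
  end
  index.values.select { |paths| paths.size > 1 }
end

entries = [
  StatEntry.new("/a/one", 1_u64, 100_u64, 2_u64),
  StatEntry.new("/b/one-link", 1_u64, 100_u64, 2_u64),
  StatEntry.new("/c/other", 1_u64, 101_u64, 1_u64),
  StatEntry.new("/mnt/x", 2_u64, 100_u64, 2_u64), # same inode number, other device
]

puts hard_link_groups(entries).inspect
```

With a storable key the work is one hash lookup per file, and the index could be saved between runs. A pairwise `same_file?` comparison has no equivalent of the key.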

What is the solution? Either I make my own shard which monkey patches or duplicates the stat structure already available in src/crystal/system/unix and src/crystal/system/win32, or Crystal exposes things like atime/ctime/birth_time, which may vary in their use between platforms.

mtime can vary. I've encountered NFS systems that return epoch mtime/ctime/atime for every file. Every reported time in the structure including mtime is not just platform specific but file system specific. Especially so when using network file systems.

I can handle the differences and expect to. On systems supporting birth time I use birth time (which includes Windows). Otherwise fall back to POSIX ctime and nlinks. Both solutions need dev and ino.

Additional use cases that I need dev and ino for:

Additional use cases:

Most of the flags could be handled by an enum. The ones I need are mostly portable like immutable and append. Linux version flag is an outlier with additional data. I have no need for it.
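As a sketch of the enum idea, Crystal's `@[Flags]` annotation could model the portable subset. The `FileFlags` name and its members are hypothetical, and the values Crystal assigns here are not the OS bit values from `st_flags` or the Linux attribute interface; a real binding would need a per-platform mapping.

```crystal
# Hypothetical enum; member values are auto-assigned powers of two,
# not the platform's actual flag bits.
@[Flags]
enum FileFlags
  Immutable
  Append
  NoDump
end

flags = FileFlags::Immutable | FileFlags::Append

puts flags.immutable?                    # predicate generated for each member
puts flags.includes?(FileFlags::NoDump)  # membership test for a flags enum
puts flags
```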

So how do I make this work in Crystal, considering that everything except birth_time has been available in Ruby and working for ~12 years?

RX14 commented 4 years ago

> Note that Go has a non-portable direct access to the stat struct through FileInfo Sys().

Yeah, this is the solution I prefer too - just expose a stat struct with a info.platform_specific.foo.

> No they aren't. Crystal split mode in to permissions and suid/sugid/sticky calling it flags.

My bad. Looks like this needs statx though, just like birth_time. I don't think we should bind statx, or use statx by default on Linux. A statx binding should be a shard, since the difference is only visible on the platform-specific members.

> My laptop has 3 million files. How do I find the hard links?

Thanks for the usecase! I agree, indexing hardlinks is impossible with same_file?. We should expose the platform-specific members.

didactic-drunk commented 4 years ago

> Yeah, this is the solution I prefer too - just expose a stat struct with a info.platform_specific.foo.

So make info.stat public and document that it's platform specific? That would solve almost every use case.

ysbaddaden commented 4 years ago

What about File::Info#raw? It would return @stat on UNIX and @file_attributes on Windows.

RX14 commented 4 years ago

I prefer platform_specific - but I'd rather ensure everyone agrees with this approach (:+1: / :-1: this comment) before bikeshedding on that.

asterite commented 4 years ago

No platform specific data, please. A Crystal program should compile and run exactly the same on all platforms. Or, said another way, the API should be exactly the same for all platforms. But it's fine if a method raises on one platform but works on another, given that it's documented to only work on certain platforms.

didactic-drunk commented 4 years ago

> No platform specific data, please. A Crystal program should compile and run exactly the same on all platforms. Or, said another way, the API should be exactly the same for all platforms. But it's fine if a method raises on one platform but works on another, given that it's documented to only work on certain platforms.

Instead of handling it at compile time:

if stat.responds_to?(:birth_time)
  ...
else
  # ctime handler
end

I have to use exception handling at runtime:

begin
  stat.birth_time
rescue
  # ctime handler
end

@asterite How much overhead does that add when traversing file systems with > 10 million files? 100 million? They won't use spinning rust so seek times are not as much of a concern.

But it's not one exception handler. It's several: one for flags, another for ACLs, another for resource forks, another for extended attributes, plus anything I missed.

I'd much rather use feature checks than exception handling. To me the code is clearer. I know it's a platform feature check and only runs on specific platforms. With the exception handler, am I handling an OS error or an unsupported platform? It's even less clear when trying to understand someone else's code. What did they intend?
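A small sketch of the compile-time check argued for here, using two hypothetical structs to stand in for what a stat object might look like on different platforms. Neither struct nor the `creation_hint` helper is part of Crystal's standard library.

```crystal
# Hypothetical stat shape on a platform that reports a birth time.
struct StatWithBirthTime
  getter birth_time : Time
  getter ctime : Time

  def initialize(@birth_time : Time, @ctime : Time)
  end
end

# Hypothetical stat shape on a platform that only reports ctime.
struct StatWithoutBirthTime
  getter ctime : Time

  def initialize(@ctime : Time)
  end
end

# responds_to? is resolved per concrete argument type, so the branch that
# does not apply is never reached for that type and nothing is rescued.
def creation_hint(stat) : Time
  if stat.responds_to?(:birth_time)
    stat.birth_time
  else
    stat.ctime
  end
end

created = Time.utc(2019, 10, 1)
changed = Time.utc(2019, 10, 20)

puts creation_hint(StatWithBirthTime.new(created, changed))
puts creation_hint(StatWithoutBirthTime.new(changed))
```

Calling `stat.birth_time` outside such a check on the second struct is a compile error, which is the failure mode preferred in this comment over a runtime exception.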

asterite commented 4 years ago

> How much overhead does that add when traversing file systems with > 10 million files? 100 million?

Exception handling doesn't add overhead unless an exception is raised (as far as I know).

Additionally, the compiler knows which methods raise. Methods that are available on your OS won't raise, so the compiler (or LLVM) will skip the entire exception handler for them. So zero overhead, really.

> I'd much rather use feature checks than exception handling.

The problem is that if someone forgets to check for a feature flag in a library, nobody can use that library in some OS, even if the library never calls that code (for example if they call it conditionally at runtime). This was discussed in the past.

didactic-drunk commented 4 years ago

> I'd much rather use feature checks than exception handling.
>
> The problem is that if someone forgets to check for a feature flag in a library, nobody can use that library in some OS, even if the library never calls that code (for example if they call it conditionally at runtime). This was discussed in the past.

@asterite If they forget a rescue nobody can use that library on the different OS. How is that different?

If anything it's worse. A clear compile-time error is turned into a possible runtime error. Do the specs test that part of the code? If not, the program appears to compile and function correctly but raises unhandled exceptions when run in the real world.

In both compile time feature checking and exception:

The difference is when the error occurs: at compile time or at runtime. Compile time is more robust.

didactic-drunk commented 4 years ago

> Additionally, if a method doesn't raise (the compiler knows which methods raise), and methods you want to use in an OS and they are available won't raise, then the compiler will skip the entire exception handler (or LLVM will do this). So zero overhead, really.

@asterite No OS has all the features I'm checking. That means 2-3 exceptions on average, not zero overhead.

straight-shoota commented 4 years ago

When the methods raise on specific platforms, you can use conditional macro branches:

{% if flag?(:win32) %}
  stat.birth_time
{% else %}
  stat.ctime
{% end %}

This ensures only the non-raising methods are invoked on a platform.

Some of these features however are not even platform-specific but depend on the file system. In that case, the API should provide nilable getters to avoid exception overhead.

if birth_time = stat.birth_time?
  birth_time
elsif ctime = stat.ctime?
  ctime
end
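
A self-contained sketch of the nilable-getter idea, for a value whose availability depends on the file system and so is only known at runtime. `DemoStat` and `best_creation_time` are hypothetical; `File::Info` has no `birth_time?` method.

```crystal
# Hypothetical stat-like struct: birth_time? is nil when the file system
# does not record a creation time.
struct DemoStat
  def initialize(@ctime : Time, @birth_time : Time? = nil)
  end

  def birth_time? : Time?
    @birth_time
  end

  def ctime : Time
    @ctime
  end
end

# Falls back to ctime with an ordinary nil check; nothing is raised or rescued.
def best_creation_time(stat : DemoStat) : Time
  stat.birth_time? || stat.ctime
end

changed = Time.utc(2019, 10, 20)
created = Time.utc(2019, 10, 1)

puts best_creation_time(DemoStat.new(changed, created))
puts best_creation_time(DemoStat.new(changed))
```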

didactic-drunk commented 4 years ago

> When the methods raise on specific platforms, you can use conditional macro branches:
>
> {% if flag?(:win32) %}
>   stat.birth_time
> {% else %}
>   stat.ctime
> {% end %}
>
> This ensures only the non-raising methods are invoked on a platform.

But that's worse than a feature flag!

What's the point of raising an exception? Without rescues the code won't run on unsupported platforms. That is exactly the opposite of what @asterite claims the exceptions are for.

You also duplicated code. Crystal already has to know which platforms are supported in order to raise. Instead of using if responds_to?, every developer needs to figure out which platforms are supported and use if flag?(:platform) for every feature tested.

asterite commented 4 years ago

It would be interesting to know how this is solved in Go, Java and other languages.

ysbaddaden commented 4 years ago

@asterite as said above, Go has a platform agnostic interface (FileInfo) with the portable info that works everywhere the same, but also exposes the raw, system specific, info (FileInfo Sys()).

This is IMO an acceptable solution. We have a platform agnostic API that works everywhere for 99% of use cases, but allow the 1% remaining use cases to be implemented.

Also, I prefer a program that won't compile on some platforms (I must deal with it) over a program that silently compiles but will raise NotImplementedError exceptions at runtime (useless).

It could just be that some methods don't exist for some targets, but I think the Go way to handle FileInfo is better, and makes the distinction between (not) portable API.
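One way to picture the Go-style split in Crystal is sketched below: a portable record that also carries a raw, target-dependent member. All names here (`UnixRaw`, `WindowsRaw`, `PlatformRaw`, `DemoInfo`) and the Windows field names are hypothetical, and the sample values are invented; this is not the layout of `File::Info`.

```crystal
# Hypothetical raw, platform-dependent data.
record UnixRaw, dev : UInt64, ino : UInt64, nlink : UInt64
record WindowsRaw, volume_serial_number : UInt32, file_index : UInt64, number_of_links : UInt32

# The raw type is picked per compilation target.
{% if flag?(:win32) %}
  alias PlatformRaw = WindowsRaw
{% else %}
  alias PlatformRaw = UnixRaw
{% end %}

# Portable members work everywhere; platform_specific is explicitly not portable.
record DemoInfo, size : Int64, modification_time : Time, platform_specific : PlatformRaw

{% unless flag?(:win32) %}
  info = DemoInfo.new(12_i64, Time.utc(2019, 10, 20), UnixRaw.new(1_u64, 100_u64, 2_u64))
  puts info.size                   # portable member
  puts info.platform_specific.ino  # compiles only where the raw type has ino
{% end %}
```

Code that reaches into `platform_specific` fails to compile on a target whose raw type lacks the member, while code that sticks to the portable members is unaffected.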

didactic-drunk commented 4 years ago

Rather than do it the java/gofy way, I'd rather ask: "what would @asterite do?" @asterite had this great idea to take ruby and add nil checking. Maybe he had the answer.

So I crossed the vast ocean, climbed the highest mountain and slept with the ugliest of mountain goats (it was cold... and lonely).

He wasn't there.

So I used the internet to ask the great sage: "Is a method that may or may not exist like the nil problem you originally solved? Could you treat it exactly like the compile time nil checking already used?"

stat.ino # => Compile error: "Not available on all platforms. Check with .responds_to?"

if stat.responds_to?(:birth_time)
  stat.birth_time
else
  stat.ctime
end

Side note, if you use lambskin they think it's another goat.

And in a booming voice the great sage @asterite squeaked:

asterite commented 4 years ago

My child.

You have come far in your journey and learned much. You have served our cause with the truest faith. Therefore I name you blessed and beloved.

The nilable approach suggested by @straight-shoota seems to be a good option: you ask it, but you have to check whether it's really supported. But maybe it depends on the API.

didactic-drunk commented 4 years ago

That's not quite what I had in mind but I'll take it.

It'd be nice to have a clear split between platform ifs and runtime nil checks.

# baz may return nil
if foo = bar.baz
  ...
end

# baz may not exist, depending on platform and OS version.
if foo = bar(.)(.)baz
  ...
end

The operator above is only an example. Feel free to change it. I prefer to check if the bar has (.)(.)'s.

didactic-drunk commented 4 years ago

I'd like to propose a standard annotation or other method to indicate platform variant behavior.

Advantages:

This should work regardless of whether using nil, responds_to?, or other methods.

HertzDevil commented 1 year ago

It looks like atime and birthtime are available on all supported platforms already?

EDIT: birth time is available via statx on Linux. Not sure if WebAssembly exposes the same thing. I couldn't find any references to file creation time in DragonFly BSD's libc.

On Windows ctime is accessible in the Win32 API via FILE_BASIC_INFO using GetFileInformationByHandleEx. Ruby is definitely incorrect here as ftCreationTime is the birth time. Python is correct.