Netatalk / netatalk

Netatalk is a Free and Open Source AFP fileserver. A *NIX or BSD system running Netatalk is capable of serving many Macintosh clients simultaneously as an AppleShare file server.
https://netatalk.io
GNU General Public License v2.0

Support remapping the \r in Icon\r to something that can be named in .hidden files #1380

Open ssokolow opened 1 month ago

ssokolow commented 1 month ago

CLARIFYING EDIT: Every time I write \r, I mean a literal 0x0D byte (A.K.A. ^M in Vim), not the string \r.

Is your feature request related to a problem? Please describe.

Currently, my local Linux file manager's view of /srv/retro is cluttered up with Icon empty document entries that I can't get rid of because either the local equivalent to Samba's hide files = ... option (which is putting a .hidden file inside the folder with one filename per line) or Dolphin's parsing of it wasn't designed with filenames like Icon\r in mind. (Neither Icon? nor Icon\r nor Icon<literal CR> work and I don't want to try Icon*.)

Describe the solution you'd like

Given that the trailing \r is probably going to be eaten by every "just do what I mean" implementation of line-splitting under the sun, including ones built into some programming languages, I think it would be an endless slog to identify and fix every implementer of .hidden that doesn't special-case Icon\r, so I think it needs to be fixed on the Netatalk side.

The only option I can think of which wouldn't break Samba interop (In fact, it would fix it for those who want that. More on that later.) would be to add a config file option that allows remapping \r to something else.

(I'd go with generic support for remapping a list of Unicode code points to another list of Unicode code points since that feels like it'd be the best balance of concerns.)
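To make the idea concrete, here is a minimal Python sketch of what a "list of code points to list of code points" remap could look like. It is only an illustration, not Netatalk code: the `REMAP` table, the function names, and the choice of U+F00D as the replacement are all assumptions made up for this example.

```python
# Hypothetical remap table: source code point -> replacement code point.
# U+F00D is only an illustrative private-use target, not a Netatalk setting.
REMAP = {0x0D: 0xF00D}
REVERSE = {dst: src for src, dst in REMAP.items()}


def to_disk(client_name: str) -> str:
    """Translate a client-supplied name into the name stored on disk."""
    return client_name.translate(REMAP)


def from_disk(disk_name: str) -> str:
    """Translate an on-disk name back into what the client expects."""
    return disk_name.translate(REVERSE)


stored = to_disk("Icon\r")
print(repr(stored))                    # the CR is gone from the on-disk name
print(from_disk(stored) == "Icon\r")   # the client still sees Icon<0x0D>
```

The point of a two-way table is that the client never notices the change, while the on-disk name no longer ends in a byte that line-splitting code treats as a delimiter.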

Now about how it would fix Samba interop...

I noticed that I was able to set up my /srv/retro with per-platform icons like this:

At first, I just assumed that OSX had changed how it stored the icon resources (inside .DS_Store?) when using the SMB client... until I noticed that I had two un-hidden files in the folder... Icon\r and Icon.

I didn't see any option to turn it off in the smb.conf docs and I honestly like being able to give different icons to PPC-era and Intel-era Mac OS, but either OSX or Samba apparently does remap \r to a Unicode private-use character when storing Icon\r over SMB.

The important thing is that I can name files containing that character in .hidden, so, aside from inspiring this solution, we're back to "Netatalk's representation of Icon\r is the only thing I can't hide using .hidden".

(At the moment, my solution has been to just sudo chattr +i **/Icon$'\r' **/._Icon$'\r' to make sure I don't accidentally delete them from the Linux side of my usually-read only = yes /srv/retro share. ...there's also some interaction between the different heterogeneous clients that I haven't tracked down the source of yet that occasionally causes the Icon\r files to become un-invisibled, but praise-be-to-ad set -f V, I just have a cleanup.sh script which recursively hides all the Icon\r and Icon files and the share is normally read-only anyway.)
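For anyone wanting to see which entries are affected before hiding or locking them, a rough Python sketch along these lines would list every name ending in a carriage return. The function name `find_cr_names` is invented for this example, and it only lists paths; it does not call `ad` or `chattr`.

```python
import os


def find_cr_names(root: str) -> list[str]:
    """Return paths under root whose final component ends in a carriage return."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.endswith("\r"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # /srv/retro is the share path used in this thread; adjust as needed.
    for path in find_cr_names("/srv/retro"):
        print(repr(path))  # repr() makes the trailing \r visible
```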

Describe alternatives you've considered

  1. You could specifically remap just Icon\r, but it feels inelegant and potentially vulnerable to "You did that. Why can't you do this too?" scope creep.
  2. You could implement a generic filename remapping option but that also feels inelegant and prone to users using it to stick dots on names like TheVolumeSettingsFolder, which I assume would break things like Samba.
  3. You could implement a generic solution for automatically remapping the Invisible attribute to a prepended dot and searching for a dotfile too when the client asks for a file without a prepended dot... but I don't think I need to spell out how the complexity of addressing all the edge cases would spiral.

Additional context

In case you aren't sold on the importance of retaining the ability to keep Netatalk's Icon\r separate from SMB's Icon, have a screenshot of my in-progress icon set for my /srv/retro:

(Yeah. I haven't started on the 10.4 icons yet, since I'm not yet tooled up for a streamlined workflow on perspective transforms, so I'm just using a couple of standard icons to demonstrate the principle, and you'll just have to take my word for it that Dolphin isn't just honoring desktop.ini's icon assignments until I get around to making variants to fit with KDE's Breeze theme.)

preview

NJRoadfan commented 1 month ago

How do you have extended attributes set up on the netatalk side? Is it set ea=auto or ea=samba? The latter is supposed to improve compatibility with Samba's vfs_streams_xattr, but the docs are a bit thin otherwise. The other issue is that Samba has seen continuous development and might have had changes that broke Netatalk 3.x compatibility over the years. Here are additional tips on setting up Samba with Netatalk 3.x: https://www.samba.org/samba/docs/current/man-html/vfs_fruit.8.html

ssokolow commented 1 month ago

[DEFAULT]
; Make extended attributes compatible
ea = samba

...but if that's supposed to collapse Icon\r and Icon together, maybe I should change it so things don't break when I upgrade and it gets fixed.

EDIT: ...on the other hand, I remember the manpage saying all it's supposed to do is append a null byte to each xattr for compatibility with Samba's implementation.

ssokolow commented 1 month ago

...and as a reminder, this isn't a feature request to make OSX AFP and OSX SMB share the same icons. It's a feature request to make Netatalk reproduce mac filenames less faithfully for better compatibility with the .hidden feature of Linux file managers.

ssokolow commented 1 month ago

The other issue is that Samba has seen continuous development and might have had changes that broke Netatalk 3.x compatibility over the years. Here are additional tips on setting up Samba with Netatalk 3.x: https://www.samba.org/samba/docs/current/man-html/vfs_fruit.8.html

OK, I found an answer on that front. The thing I need to stay far away from if I want to ensure I can have separate custom folder icons for AFP and SMB is fruit:encoding = native.

Still, even if I wanted fruit:encoding = native, all that would do is make it so that OSX over SMB requires the requested feature too.

NJRoadfan commented 1 month ago

OK, I think I know what is happening. Somewhere Netatalk is converting a literal 0x0D to \r when storing the file. The code likely assumes that 0x0D is not a valid character in a filename (generally a smart thing to do) and converts it to an escaped equivalent.

ssokolow commented 1 month ago

No, it's writing a literal 0x0D to disk... Dolphin just parses .hidden in a way that means a file ending in a literal 0x0D will never match its line in that file.

If it were writing the string \r to disk then I wouldn't need the requested feature because a literal Icon\r in .hidden and a literal Icon\r in a filename would match each other.

ssokolow commented 1 month ago

The problem is that the modern way to hide non-dotfiles in file managers on XDG desktops (i.e. Linux, *BSD) is a newline delimited list and "universal" newline splitting algorithms interpret the trailing carriage return as a delimiter instead of part of the filename.

Think trying to represent a filename containing a comma in a variant of CSV that doesn't support quoting or escaping.

If it were literally any byte other than 0x0d or 0x0a, there would be no problem.

Depending on how such a splitting algorithm is implemented, it may even only take issue with terminal carriage returns. (eg. I've seen ones where "universal" just means "DOS or UNIX" and they just implemented something like lines = [x.rstrip('\r') for x in raw.split('\n')])
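Here is that one-liner wrapped into a small, runnable Python sketch, to show why the entry can never match. The function name `parse_hidden` is made up for the example, and this is not the parser any particular file manager uses.

```python
def parse_hidden(raw: str) -> list[str]:
    """Split .hidden contents the 'DOS or UNIX' way described above."""
    lines = [x.rstrip('\r') for x in raw.split('\n')]
    return [x for x in lines if x]  # drop empty entries


# A .hidden file that tries to list both "Icon" and "Icon<0x0D>".
raw = "Desktop.ini\nIcon\nIcon\r\n"
entries = parse_hidden(raw)

print(entries)              # ['Desktop.ini', 'Icon', 'Icon']
print("Icon\r" in entries)  # False: the on-disk name is never matched
```

The trailing CR is stripped as if it were half of a CRLF, so the file manager ends up comparing `Icon` against a file actually named `Icon\r`.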

NJRoadfan commented 1 month ago

MacOS X natively stores the icon files on HFS+ drives using 0x0D, so it's no surprise that Netatalk replicates that. What is writing an Icon file with a literal \r on the end then? Samba? I can't see that being the default though since the backslash is a big no-no on Windows systems.

ssokolow commented 1 month ago

Nothing is writing an Icon with a literal \r on the end.

The feature request is for support for translating Icon<0x0D> (A.K.A. Icon^M) into something else (like Samba does by default) because file managers parsing the .hidden file interpret a trailing CR followed by the intended LF delimiter as a stray DOS-style CRLF line ending and normalize it away. (Which then means an Icon != Icon<0x0D> result and no hiding of the Icon<0x0D> file.)

Trailing carriage return bytes are literally unrepresentable in a "Newline-delimited list, using DOS or UNIX line endings" file unless the parser is smart enough to follow Vim's stateful approach where only the first line's delimiter is heuristically detected and all following lines are assumed to use the same delimiter type.
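A rough Python sketch of that stateful approach follows: guess the delimiter from the first line only, then split on exactly that delimiter. The function names are invented for this example and it is not a description of how Vim itself is implemented.

```python
def detect_delimiter(raw: str) -> str:
    """Guess the line delimiter from the first line only."""
    first_lf = raw.find("\n")
    if first_lf == -1:
        # No LF anywhere: either classic Mac (CR-only) or a single line.
        return "\r" if "\r" in raw else "\n"
    if first_lf > 0 and raw[first_lf - 1] == "\r":
        return "\r\n"
    return "\n"


def parse_hidden_stateful(raw: str) -> list[str]:
    """Split on the detected delimiter and nothing else."""
    delimiter = detect_delimiter(raw)
    return [entry for entry in raw.split(delimiter) if entry]


print(parse_hidden_stateful("Desktop.ini\nIcon\nIcon\r\n"))
# ['Desktop.ini', 'Icon', 'Icon\r']

print(parse_hidden_stateful("Icon\r\n"))
# ['Icon']
```

The second call shows the remaining gap: when `Icon<0x0D>` is the first (or only) entry, the file is indistinguishable from a CRLF-delimited one, so even this approach cannot represent it.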

ssokolow commented 1 month ago

...and yes, I checked. Dolphin's parser is both universal enough to recognize classic mac-style CR-only line endings and stateless enough to allow you to mix different kinds in the same file, so it'll parse this .hidden file the same way whether using DOS, UNIX, or Mac line endings...

Desktop.ini⏎
DESKTOP.INI⏎
Icon⏎
Icon<0x0D>⏎

...like this:

Desktop.ini⏎
DESKTOP.INI⏎
Icon⏎
Icon⏎
⏎

NJRoadfan commented 1 month ago

OK, so you manually created/named a file named Icon\r and Samba converts that to Icon<0x0D> when reading and delivering the directory entry to a SMB client?

Netatalk 2.x used to support encoding illegal characters using CAP encoding (Icon:0d in your case), but that appears to have gone away with 3.x. The code for this still appears in the tree though. I don't know if it would even work in this case since Icon<0x0D> is legal on *NIX platforms.
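For readers unfamiliar with the `Icon:0d` form, here is a small Python sketch of a CAP-style escape, where a character is stored as a colon followed by two hex digits. This is an illustration only and not the Netatalk 2.x implementation: the function names and the choice of which characters to escape (control characters and `/`) are assumptions.

```python
import re

# Matches one ":xx" escape with two lowercase hex digits.
_ESCAPE = re.compile(r":([0-9a-f]{2})")


def cap_encode(name: str) -> str:
    """Escape control characters and '/' as ':xx' (hypothetical helper)."""
    return "".join(
        f":{ord(ch):02x}" if ord(ch) < 0x20 or ch == "/" else ch
        for ch in name
    )


def cap_decode(name: str) -> str:
    """Reverse cap_encode by turning ':xx' escapes back into characters."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


print(cap_encode("Icon\r"))                 # Icon:0d
print(cap_decode("Icon:0d") == "Icon\r")    # True
```

The sketch does not try to handle names that already contain a literal colon followed by hex digits; a real implementation would have to decide how to treat those.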

ssokolow commented 1 month ago

I'm manually creating a file named Icon<0x0D>... by using Command-C and Command-V between two Get Info dialogs in Mac OS 9.2.2 and Mac OS 10.4 to set a custom icon on a folder.

Finder stores custom folder icons inside the resource fork of an invisible file named Icon<0x0D> inside the folder in question... but Mac OS hides it using the HFS Invisible flag, while Linux/BSD file managers are completely ignorant of the user.org.netatalk.Metadata xattr that Netatalk uses to store that.

This feels like I'm being faulted for "allowing" Windows Explorer to splash Thumbs.db files all over the place.

EDIT: The difference is, I can add a line containing Thumbs.db to a text file named .hidden inside the folder and the file manager will then treat it as if it were named .Thumbs.db instead... but .hidden is newline-delimited and the file manager's parser interprets Icon<0x0D> as Icon followed by a delimiter because <0x0D> is a valid Macintosh line delimiter and <0x0D> followed by a standard UNIX line delimiter is how you write a standard DOS/Windows line delimiter.

ssokolow commented 1 month ago

To make it clear, Netatalk is faithfully reproducing the filename used by Finder's behind-the-scenes mechanism for custom folder icons... and that's the problem.

What I'm asking for is a solution that involves patching Netatalk once instead of tracking down and writing a PR for every single Linux file manager that implements support for the .hidden file, both now and in the future, to smarten up its line-splitting algorithm and extend/add the regression test so that these dozens of file managers recognize that a CRLF in an otherwise LF-delimited .hidden file is probably a Netatalk-created file rather than a stray DOS/Windows line delimiter.

That latter approach would be about as viable as getting every Linux and BSD file manager in existence to add support for parsing the HFS Invisible flag out of user.org.netatalk.Metadata and honoring it... and it still wouldn't fix it for cases where your .hidden file contains exactly one entry and it's Icon<0x0D> because that'd be indistinguishable from a one-entry CRLF-delimited file.

EDIT: And why did Apple use that name to begin with? It's as if Microsoft chose to name desktop.ini as desktop.ini<CR><LF> or if XDG decided the standard name for .directory (The Linux/BSD analogue to desktop.ini) was .directory<LF>.

NJRoadfan commented 1 month ago

Just a shot in the dark, the wildcard 'Icon'$'\r' doesn't work? Just noticed that when doing an ls of a directory. Using CAP notation does NOT work for this use case unfortunately.

ssokolow commented 1 month ago

$'\r' is bash/zsh shell syntax. If .hidden doesn't support Icon? (which belongs to the simplest common subset of shell globs), then it's not going to support something that's effectively an embedded subset of shell scripting itself.

(Icon? and Icon* are available to every programming language, essentially for free, via the glob and fnmatch functions in the C standard library, which can be assumed to be present by any portable POSIX application because every Unixoid platform except Linux has ABI-unstable kernel syscalls and treats its libc as the ABI stability boundary.)
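As a quick illustration of what those two patterns would match if a `.hidden` parser did support them, here is a sketch using Python's `fnmatch` module (Python's own implementation of shell-style matching, used here as a stand-in for the libc functions). The list of names is made up for the example.

```python
import fnmatch

# Hypothetical directory contents, including the Finder icon file.
names = ["Icon", "Icon\r", "Icons", "Iconography.txt"]

for pattern in ("Icon?", "Icon*"):
    matched = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    print(pattern, matched)

# Icon? ['Icon\r', 'Icons']
# Icon* ['Icon', 'Icon\r', 'Icons', 'Iconography.txt']
```

`Icon?` would catch `Icon<0x0D>` with only a little collateral, whereas `Icon*` also hides unrelated files, which is why it is not an appealing workaround.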

ssokolow commented 1 month ago

Yeah. I just checked. Dolphin is literally just delegating to the behaviour imposed by Qt's QIODevice::Text flag when you call QTextStream::readLine(), which is documented as:

When reading, the end-of-line terminators are translated to '\n'. When writing, the end-of-line terminators are translated to the local encoding, for example '\r\n' for Win32.

EDIT: And there's no spec for .hidden. It's literally just "Some GNOME person banged this together for Nautilus and people using other file managers are asking for us to have it too. Let's just copy the 'one file per line' high-level description and call it a de facto standard."

ssokolow commented 1 month ago

PCManFM (the GTK-based version) also interprets 0x0D as a line terminator but invokes the "newline inside filename" rendering behaviour... making Finder's custom folder icons even uglier on Linux. I believe that's how all GTK-based things which don't implement .hidden yet will show it.

Screenshot_20240809_110810

(Ignore the two Blank DVD+R Disc entries in the sidebar. It's my workaround for a decade-long, across-multiple-systems bug I don't know how to narrow down enough to report where something in Linux's stack causes "On second thought, veto that tray open" commands to pile up in optical drives if you leave the tray empty.)

EDIT: Oh, and no. That's not an uncharacteristically old version of it. I'm running Kubuntu 22.04 LTS, so that would have been current at the time of feature freeze.

ssokolow commented 1 month ago

...and, as I suspected for something that doesn't have a clearly defined spec (.hidden), there's no consistency on "Why would anyone sane ever do that?" edge-cases like Finder's decision to put a Macintosh newline at the end of a filename.

I can't think of any other general-purpose file-view implementations that I have installed. (eg. Geeqie only lists directories and files with known-supported image extensions and wxWidgets just delegates to GTK.)

In summary, for any folder where someone uses Finder to set a custom icon, so long as Netatalk doesn't support translating the <0x0D> to something else, Linux users will see the following results:

Basically, as-is, the only way for Macintosh-over-Netatalk and Linux/BSD users to coexist comfortably on the same Linux/BSD filesystem is to either forbid the Macintosh users from setting custom icons (eg. via an inotify hook to delete them as soon as they get set) or to force the Linux/BSD users to access it via Samba so veto files can be used to hide them from non-Netatalk clients... or to write a FUSE proxy filesystem which does something similar to the hidden virtual .zfs folder where snapshots live on ZFS filesystems, where Icon<0x0D> doesn't get returned by opendir/readdir but fopening it will succeed.

Other non-dotfile entries like TheVolumeSettingsFolder only show up in the root of the Netatalk share, so they can be ignored in the few applications that don't honor .hidden, same as the lost+found folder used by fsck, but Icon<0x0D> shows up in every single folder with a custom icon and, without an equivalent to FAT/NTFS Hidden attributes or HFS/HFS+ Invisible flags, there's no way to reliably hide something with such an edge-case filename.

NJRoadfan commented 1 month ago

The behavior of using CAP style encoding (storing the filename as Icon:0d on the Linux side) can likely be patched in. I would not lean to that being the default behavior for creation of this file, as it would break Samba inter-op. The changes would have to be made in charcnv.c. This may cause trouble in other areas of the CNID code though as only / is currently treated as a special case in this matter.

EDIT: Scratch that. Netatalk 3.x doesn't support CAP encoding of filenames anymore as it was removed with this commit: https://github.com/Netatalk/netatalk/commit/f03f4b3ee3b8c423f1b48e3fd5a226db95ce428f

ssokolow commented 1 month ago

Bear in mind that Samba interop is already broken by default because Samba will do its own analogue to CAP style encoding by default.

(Users must opt into preferring interop with Netatalk at the cost of breaking interop with Linux GUI file managers by adding vfs_fruit and setting fruit:encoding = native... and I'd been using Samba for 20 years without ever discovering that vfs_fruit existed.)

NJRoadfan commented 1 month ago

To be fair, vfs_fruit didn't exist 20 years ago. I think Apple added their protocol extensions after they switched from Samba to their in-house SMB implementation, which was around the time they started deprecating AFP.

ssokolow commented 1 month ago

Did the "Samba remaps characters like <0x0D> unless you add a VFS filter to override it" behaviour come later?

...because FAT, exFAT, the Win32 personality of NTFS, and DOS, Win16, and Win32 APIs were forbidding all characters in the 0x1..=0x1f (Rust syntax) range within filenames for the entire lifespan of AFP and it wouldn't make sense for them to start doing a remapping that's clearly for the benefit of being able to manipulate the files through Windows Explorer only after vfs_fruit added an option to turn it off.

NJRoadfan commented 1 month ago

I don't know for sure as I haven't followed Samba development. Those characters were likely prohibited at the SMB protocol level and Apple did filename mangling on the client side when working with shares that do not support the extensions. FWIW, Windows 11 doesn't seem to care. MacOS seemingly writes the Icon files as Icon<0x0D>. Windows stores the image data in ADSes. Explorer shows the file as Icon, so it appears to do the same private area translation of invalid characters as Samba does. The file attributes are set to 'hidden'.

ssokolow commented 1 month ago

So why is it Samba's responsibility to go out of its way to provide a non-default vfs_fruit option that writes them in a form that breaks the only mechanism for hiding non-dotfiles in Linux/BSD GUI file managers instead of Netatalk's responsibility to translate them into a form compatible with Samba as well as any program which goes the default route of using the line-splitting/newline-trimming logic in programming languages like Python or Rust and libraries like Qt?

(Even if, personally, I'd prefer the option to translate it differently so I can keep that accidental "different icons for AFP and SMB" feature while also having working .hidden.)

...I suppose I could try spending an afternoon writing an applefix FUSE proxy filesystem and adjust my afp.conf to point to /srv/.retro_applefix instead of /srv/retro... but I shouldn't need to.

NJRoadfan commented 1 month ago

At one point, Samba and Netatalk were under the same management team. Being able to export the same share via multiple protocols is desirable, so some coordination on how filenames and metadata were stored on the host file system was needed. Being able to store a filename as close to the original requested name as possible is always the desired outcome. Since Netatalk was always UNIX based, it tends to be very flexible with filename storage. It didn't have to bend over backwards in this case, so it didn't.

ssokolow commented 1 month ago

And yet, by Finder's weird naming choice and Netatalk adopting a "No hidden/invisible filesystem attribute? Not my problem." design, the emergent result is a worst-case for UNIX/Unixoid platforms that aren't accessing it through something like Samba's veto files.

If Samba bends over backwards for Windows that far, isn't it only fair that Netatalk make a small nod to address a bug that emerges from an odd Macintosh design choice (Icon<0x0D>) slamming into UNIX↔Windows interop ("Universal newline parsing") now that Unixy platforms are finally gaining a way to hide files without renaming them?

It really does feel unreasonable that I need either a network filesystem or a FUSE filesystem to both have custom folder icons in /srv/retro on Macintosh and not have them cluttering up my file manager on Linux, just because Netatalk adopts a posture I'd characterize as "rude" in the Jargon File, Definition 3 sense.

  1. Anything that manipulates a shared resource without regard for its other users in such a way as to cause a (non-fatal) problem. Examples: programs that change tty modes without resetting them on exit, or windowing programs that keep forcing themselves to the top of the window stack.

rdmark commented 1 month ago

@ssokolow I appreciate all the know-how and research you're sharing in this thread. If I could ask you one favor: Please keep a positive and constructive tone in your messages. The arguably poor design decisions in Netatalk are at least two decades old, and if I read @NJRoadfan's intentions correctly, he is describing the current state rather than defending it.

I'm personally not ruling out changing Netatalk's filename mangling behavior. Obviously, a change in such a core part of the application will require careful coding and thorough testing. The absolute best way to get traction here would be for you to fork Netatalk, do the requisite code changes, and file a PR back to the project so that we can proceed with code review & testing. We seriously consider all code contributions that adhere to the coding guidelines.

Cheers!

ssokolow commented 1 month ago

Sorry. I guess my frustration led me to slip on evaluating my phrasing.

As for forking and PRs, unfortunately, I don't trust myself to write in a memory-unsafe language for anything long-running, exposed to the network, or more complex than a little MS-DOS (or, when I can make time to resume learning, Classic Macintosh) utility and Netatalk is all three... especially when I'm currently struggling with the effects of bad sleep habits and am more dependent than ever on the Rust compiler to catch my mistakes.

"Careful coding and thorough testing" is the last thing I trust myself to do at this point in time.

rdmark commented 1 month ago

No worries; thanks for being open to constructive criticism. :)

Doing any kind of substantive change to this C codebase is absolutely terrifying for all of us, with 0% unit test coverage and complex code paths all over the place. But we have at least SonarCloud static analysis and cross-platform CI builds (and human code reviews) to protect us against some of the more obvious bugs. If you ever change your mind, we'll be awaiting your contribution eagerly.

BTW, I have only cursory understanding of Rust, but I wonder how memory safety would be achieved for a multi-process / multi-threaded application like Netatalk? How can the compiler anticipate all potential states?

ssokolow commented 1 month ago

BTW, I have only cursory understanding of Rust, but I wonder how memory safety would be achieved for a multi-process / multi-threaded application like Netatalk? How can the compiler anticipate all potential states?

It's basically the same sort of situation as asking how a type system like C's can anticipate everything. You make certain unlikely-to-be-correct programs (eg. storing integers in two registers and then performing FADD on them without first translating them from integer form to floating-point form) more difficult in exchange for making testing the correctness of some property of the vast majority of correct programs tractable.

In Rust's case, it's mostly a superset of what's considered good practice in C++ these days, but built into the design of the language and standard library APIs so that you don't have the off-putting degree of annotation clutter and drudgework that would be involved in retrofitting a C codebase with something like splint.

There are things where you can't express them in "safe Rust"... but that's why the unsafe keyword exists to grant localized access to things like dereferencing raw pointers so you can build manually-audited, correct-by-construction abstractions (this is how things like Vec<T> in the standard library and safe FFI bindings are built)

For example...

Common to all architectures:

Multi-threading:

EDIT: For multi-process, there's a limit to how much any one language can do beyond the process-internal things, but there is the typestate pattern, which Rust's de facto standard HTTP implementation is using to good effect, Rust's proven itself good for parsing and serialization/deserialization tasks (See Serde for Rust's de facto standard framework for that, as well as building blocks that make things more comfortable like bytemuck and byteorder), and it just generally helps if you can trust you need to spend less of your energy scrutinizing other aspects of the code because the compiler is watching your back.

Granted, it's not a panacea, but there's a reason a lot of people have described it as "makes programming fun again". (eg. leaking memory is safe and there's even an API for it (just call mem::forget on something with a heap allocation), because that's no more dangerous than what you can do with a list() in Python or an Array in JavaScript and it's a very difficult thing to prove at compile time. If you're using shared memory instead of message passing and you interact with multiple locking primitives at once, Rust won't magically remove the need to know and apply a solution to the dining philosophers problem. The public/private boundary is the module, so don't assume that auditing just the lines in your unsafe blocks is enough, async/await comes across as surprisingly skill-demanding compared to the rest of the language, etc.)

...and there is one place where unsafe is more dangerous than C for someone with C intuition and that's that you can't just perform operations on raw pointers without knowing which ones will create a temporary reference (&/&mut) because you're still subject to the rules for never aliasing references. (Basically, Rust makes liberal use of the LLVM IR construct that C's restrict translates to. Miri will tell you if you got it wrong. When in doubt, use std::ptr functions.)

As someone who avoids unsafe if at all possible (#![forbid(unsafe_code)] at the top of the source file that defines the root of the crate (library)), the main flaw of Rust I run into is that they did a bit too good a job of making costs explicit, so Rust has a tendency to lure you into premature optimization.

rdmark commented 1 month ago

Thank you for the Rust crash course. I read through your points and while some of it goes over my head I can see how the structures and safeguards allow for safer coding with the tradeoff of some added complexity in the language itself, and less "freedom" as it were. The reason Rust is on my radar recently, is because DARPA's TRACTOR project was in the news, which promises accurate and safe translation of C to "idiomatic Rust". I'll believe it when I see it, but it might be an interesting experiment to do on the Netatalk codebase in the future.

Anyhow, let me put a pin in the Rust discussion in this thread. Something for another day and another venue!

VorpalBlade commented 1 month ago

The reason Rust is on my radar recently, is because DARPA's TRACTOR project was in the news, which promises accurate and safe translation of C to "idiomatic Rust".

As a long-time professional C++ developer, and for the past two years a Rust programmer and Rust evangelist at my day job, I would suggest not getting overhyped about that DARPA thing.

Some people on the Rust users forum did some digging and it is at this stage not even a "we want you to send in a research proposal on this for us to grant you money", it is "we think this is a good idea, and we want to hold a meeting to discuss this and what future research proposal frameworks and guidelines might look like in this area, if it is at all possible". You have to click through a few steps to find this out (special notice -> find the PDF link).

I think it will be incredibly difficult to do this automatically (if it is possible at all). I don't buy LLMs managing this. They can often manage short pieces of code (tens to maybe a hundred lines), and may or may not introduce bugs because they are LLMs. But maybe, just maybe there is a chance. You can after all already do this, producing incredibly non-idiomatic code, using c2rust, but the result will be full of unsafe blocks and raw pointer manipulation; it is meant to be a first step in a manual translation process.