Interlisp / medley

The main repository for the Medley Interlisp project; the wiki and issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources).
https://Interlisp.org
MIT License

file names and case handling #657

Closed: masinter closed this issue 1 year ago

masinter commented 2 years ago

Originally posted by @masinter in https://github.com/Interlisp/medley/issues/651#issuecomment-1023748346: "…when you add a feature in sources (core) and lispusers that aren't in the loadup depend on that feature?"

I'm also worried about introducing bugs that only happen with case-sensitive file systems. macOS and Windows are case-insensitive by default, but Linux and WSL are case-sensitive.

I've noticed that IL:DIRECTORY returns all-uppercase filenames, but INFILEP returns case-preserved names. There were problems matching lowercase font filenames; I suspect FONTSAVAILABLE had the problem.

Common Lisp uses a separate datatype (pathname) instead of Interlisp's strings, although there seems to be some coercion between the two.

TEDIT won't accept a string as a file name; it treats the string as the text to edit.

These are probably three separate issues.

rmkaplan commented 2 years ago

I don’t think there were/are clear specifications for all of these interfaces, particularly with respect to case and directory brackets, and which coercions are and are not allowed. For example, it seems that <> and / can flip back and forth, and maybe that’s harmless, but case doesn’t always flip the same way. A minor step towards consistency would be to add a bit to the file device that says whether its names are case-sensitive, which would allow at least some generic code to operate uniformly.
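A minimal Python sketch of the per-device bit suggested above, assuming a hypothetical `FileDevice` record (not the real Medley FDEV datatype) with a `case_sensitive` field. Generic code compares names through one normalizing function, so a case-insensitive device would treat an all-uppercase name from IL:DIRECTORY and a case-preserved name from INFILEP as the same file. Treating `<`, `>` and `/` as interchangeable separators is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileDevice:
    """Hypothetical stand-in for an FDEV; only the fields this sketch needs."""

    name: str
    case_sensitive: bool  # the proposed per-device bit

    def name_key(self, filename: str) -> str:
        """Canonical key: two names with equal keys denote the same file."""
        # Assumption: "<" and ">" directory brackets are equivalent to "/".
        normalized = filename.replace("<", "/").replace(">", "/")
        # Fold case only when the device says its names are case-insensitive.
        return normalized if self.case_sensitive else normalized.casefold()

    def same_file(self, a: str, b: str) -> bool:
        return self.name_key(a) == self.name_key(b)


# Example devices; whether {UNIX} is really case-sensitive depends on the host.
dsk = FileDevice("DSK", case_sensitive=False)
unix = FileDevice("UNIX", case_sensitive=True)

print(dsk.same_file("<fonts>TIMESROMAN.DISPLAYFONT", "/fonts/timesroman.displayfont"))   # True
print(unix.same_file("<fonts>TIMESROMAN.DISPLAYFONT", "/fonts/timesroman.displayfont"))  # False
```

Keeping the decision in one place could spare callers such as font lookup from doing their own case handling; the limitation is that one bit per device cannot describe a path whose directories differ in case sensitivity.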

I made a wistful comment earlier about how some finite-state transducer technology might be useful for file-name specification, parsing, and transformation. I have code for the full suite of FST algorithms, but it is far too much to include in the system. However, I also have code for compiling transducers down into very small arrays that can be interpreted by very small subroutines, the subroutines that we were running in watch and calculator chips in the spell-checking days. So it could be set up as an offline process for compiling a 2-tape regular expression for each filedevice. That would produce little byte-arrays that could be loaded in like bitmaps and incorporated as part of the FDEV.
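To make the "small arrays interpreted by small subroutines" idea concrete, here is a toy deterministic finite-state transducer in Python whose character classes and transitions are packed into `bytearray`s and run by a short interpreter loop. The table is hand-written rather than compiled from a 2-tape regular expression, and its behavior (turn `<`, `>` and `/` into `/`, collapse repeated separators, and fold case as a hypothetical case-insensitive device might) is an assumption chosen for illustration.

```python
# Input character classes.
OTHER, SEP = 0, 1
# Output actions.
DROP, EMIT_SLASH, EMIT_FOLDED = 0, 1, 2

# Character-class array for ASCII; anything not listed is OTHER.
CLASSES = bytearray(128)
for sep in "<>/":
    CLASSES[ord(sep)] = SEP

# Transition table: for (state, cls), byte 2*(2*state + cls) is the next
# state and the following byte is the output action.
# State 0: inside a name component.  State 1: a separator was just emitted.
TABLE = bytearray([
    0, EMIT_FOLDED,  # state 0, OTHER: emit folded char, stay in 0
    1, EMIT_SLASH,   # state 0, SEP:   emit "/", go to 1
    0, EMIT_FOLDED,  # state 1, OTHER: emit folded char, back to 0
    1, DROP,         # state 1, SEP:   collapse repeated separators
])


def char_class(ch):
    code = ord(ch)
    return CLASSES[code] if code < 128 else OTHER


def run(table, text, state=0):
    """Interpret a packed transducer table over `text`, returning its output."""
    out = []
    for ch in text:
        i = 2 * (2 * state + char_class(ch))
        state, action = table[i], table[i + 1]
        if action == EMIT_SLASH:
            out.append("/")
        elif action == EMIT_FOLDED:
            out.append(ch.lower())
    return "".join(out)


print(run(TABLE, "<Lisp>>Fonts>TimesRoman.DISPLAYFONT"))  # /lisp/fonts/timesroman.displayfont
```

In a real compiled transducer the case folding would also live in the tables rather than in `ch.lower()`; it stays in the interpreter here only to keep the arrays short.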

Just a thought.

(In reply to Larry Masinter's comment of Jan 27, 2022, 3:55 PM.)

masinter commented 2 years ago

Unfortunately, in the Linux model each file system chooses whether it is case-sensitive, and file-system boundaries are not particularly evident. For example, in WSL I have /home/larry/winhome as a symbolic link to /mnt/c/Users/larry/. My Linux file system (which contains /home/larry/winhome) is case-sensitive, but the file system mounted at /mnt/c/ is case-insensitive.

Git is case-sensitive. Medley's {DSK} emulates a case-insensitive, versioned file system even when the underlying file system is case-sensitive and unversioned, but the emulation is incomplete.

/mnt/c/Users/Larry/home/ilisp is case-sensitive for the "/mnt/c" part and case-insensitive for the "Users/Larry/home/ilisp" part.
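Since mount boundaries are not evident from the path, one way to learn what a given directory does is to probe it empirically. A minimal Python sketch, assuming write access to the directory: create a temporary file with a lowercase name and check whether its uppercase spelling resolves. This is a generic technique, not something Medley or Maiko is known to do.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if file names in `directory` are case-sensitive.

    Requires permission to create (and then delete) a file in `directory`.
    """
    # mkstemp builds names from lowercase letters, digits and "_", so the
    # uppercase spelling of the probe name always differs from the original.
    fd, path = tempfile.mkstemp(prefix="case-probe-", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        # On a case-insensitive file system the uppercase name finds the probe.
        return not os.path.exists(os.path.join(head, tail.upper()))
    finally:
        os.remove(path)


if __name__ == "__main__":
    d = tempfile.gettempdir()
    print(d, is_case_sensitive(d))
```

Under the setup described above, probing /home/larry and /mnt/c/Users/larry would presumably give different answers. Because the answer belongs to the directory rather than to the path string, a path that crosses a mount point or a symbolic link such as winhome may need one probe per directory.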

rmkaplan commented 2 years ago

Do those show up in Medley as different filedevices/hosts? (Or, with the new package, as pseudohosts?)

(In reply to Larry Masinter's comment of Jan 27, 2022, 5:47 PM.)

nbriggs commented 2 years ago

In Larry's case, with WSL, at each directory along the path to a file you could find a different choice for whether the contents are case-sensitive or not. Yuck!

rmkaplan commented 2 years ago

How is that specified?

(In reply to Nick Briggs's comment of Jan 27, 2022, 5:53 PM.)

nbriggs commented 2 years ago

It's a property of the underlying file system, chosen when it was created, so when you mount file systems at different places in the hierarchy you get whatever behavior each mounted piece has. macOS has the same possibility; the file system types are documented here: https://support.apple.com/lv-lv/guide/disk-utility/dsku19ed921c/mac

rmkaplan commented 2 years ago

At this level it seems that file systems map onto our notion of host/device. Does Maiko provide an interface that makes the properties of a particular file system accessible to Medley?

In the Windows case, if each directory on a path can have different naming conventions, that doesn’t fit our usual setup. I don’t know whether the pseudohost prefix scheme can be extended to handle that.

(In reply to Nick Briggs's comment of Jan 27, 2022, 6:07 PM.)

nbriggs commented 2 years ago

They roughly map onto host/device, though taking a sysout that already has a {DSK} FDEV instantiated and starting it up on a different type of file system would be tricky, if the {DSK} FDEV weren't insulated from all that by the underlying system code.

Maiko does not expose those features of the underlying system to Medley -- it fakes up a case-insensitive versioned file system for the {DSK} FDEV and pretty much passes through whatever is underneath for the {UNIX} FDEV.

Windows isn't the only system with this issue. A trivial case on macOS: you have /Volumes/KINGSTON mounted (say it's a USB thumb drive formatted for MS-DOS). /Volumes/KINGSTON is actually case-insensitive, but /Volumes/KINGSTON/XXX.TXT is a different file from /Volumes/KINGSTON/xxx.txt because it's a FAT (or ExFAT) file system on the thumb drive that you mounted.
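The two behaviors described above can be sketched in Python, assuming a flat directory and ignoring versions entirely. Maiko itself is written in C and its real {DSK} code is not reproduced here; `dsk_lookup` and `unix_lookup` are hypothetical names. The {DSK}-style lookup scans the directory and matches case-insensitively, which shows one place the emulation cannot be complete: on a case-sensitive file system two entries may differ only in case, and the emulation has to pick one.

```python
import os
import tempfile


def dsk_lookup(directory, name):
    """Emulated case-insensitive lookup, in the spirit of {DSK}."""
    wanted = name.casefold()
    matches = sorted(e for e in os.listdir(directory) if e.casefold() == wanted)
    if not matches:
        return None
    # If several entries differ only in case (possible on a case-sensitive
    # file system), this sketch arbitrarily returns the first in sorted order.
    return os.path.join(directory, matches[0])


def unix_lookup(directory, name):
    """Pass-through lookup, in the spirit of {UNIX}: the host file system decides."""
    path = os.path.join(directory, name)
    return path if os.path.exists(path) else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "XXX.TXT"), "w").close()
        print(dsk_lookup(d, "xxx.txt"))   # finds .../XXX.TXT on any host
        print(unix_lookup(d, "xxx.txt"))  # depends on the host file system
```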

rmkaplan commented 2 years ago

Is there an eventfn that reconfigures, say, the DSK filedevice when it comes up on a new platform?

(In reply to Nick Briggs's comment of Jan 27, 2022, 10:42 PM.)

masinter commented 2 years ago

This was discussed in issue #265.

rmkaplan commented 2 years ago

"TEDIT won't take a string as a file name -- it treats it as the string to edit."

I can't find any documentation of this "feature"; maybe it seemed like a nice idea in the days before strings were accepted as filenames. But all it does is initialize TEdit with the string: editing doesn't change the string itself, and you have to write the result to a file.

I don't see this feature exploited anywhere in the system, so I propose removing it. I could provide another entry point, (TEDITSTRING string), defined as (PRIN3 string (TEXTSTREAM (TEDIT))), to be used instead if we ever run into calls that pass a string.

masinter commented 1 year ago

Closing; see #265.