daokoder / dao-modules

Dao Standard Modules
http://daovm.net

os.fs MAX_PATH issues - discussion #67

Closed dumblob closed 9 years ago

dumblob commented 9 years ago

I've come across the limitation which MAX_PATH imposes. On Windows it's about 260 bytes (in wcs it's half, which is quite common nowadays); on GNU/Linux it's usually 4096 (PATH_MAX), or sometimes half or a quarter of that. Our Dao built-in fallback is 512.

The API in os.fs is high-level enough that I'm certain we should support sizes bigger than 130 on Windows. And this bigger size should be the same for all our supported systems. While looking for a solution, I stumbled upon a really sad discussion on https://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx#maxpath , a short explanation on http://stackoverflow.com/questions/833291/is-there-an-equivalent-to-winapis-max-path-under-linux-unix and also a "solution" at https://msdn.microsoft.com/en-us/library/930f87yf.aspx .

I'd like to achieve two goals:

  1. being able to use Dao with nearly any existing file and directory tree on a given system (imagine the huge Java fully qualified paths having hundreds of characters :( in the real filesystem path)
  2. being able to develop a new program using os.fs on one platform (e.g. a UNIX-like one) and be absolutely sure that one can just take the source code and run it on e.g. Windows without the need to test path lengths on Windows (i.e. make the Windows limits high enough and at the same time lower the limits on other systems Dao supports to make them compatible)

dumblob commented 9 years ago

And there is also this beast FILENAME_MAX, defined in the C standard as a macro with an absolutely variable size. PATH_MAX, on the other hand, is POSIX.
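As a rough illustration of why hardcoding one of these macros is fragile, the limits can also be queried at run time. The sketch below is Python rather than Dao, and `reported_path_limits` is a hypothetical helper name, not part of os.fs. It only assumes the standard `os.pathconf` call, which exists on POSIX systems and is absent on Windows.

```python
import os


def reported_path_limits(directory="."):
    """Ask the OS for its path and file-name limits for `directory`.

    Returns a dict; a value is None when the platform cannot report it
    (for example on Windows, where os.pathconf does not exist).
    """
    limits = {}
    for name in ("PC_PATH_MAX", "PC_NAME_MAX"):
        try:
            limits[name] = os.pathconf(directory, name)
        except (AttributeError, ValueError, OSError):
            # AttributeError: no os.pathconf on this platform
            # ValueError: the name is unknown here
            # OSError: the host or file system does not support the query
            limits[name] = None
    return limits


if __name__ == "__main__":
    print(reported_path_limits("."))
```

The result depends on the file system that holds `directory`, so the same program may see different limits on different mount points.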

Night-walker commented 9 years ago

> The API in os.fs is high-level enough that I'm certain we should support sizes bigger than 130 on Windows.

Actually, the limit is in characters, so it's still 260.

> being able to use Dao with nearly any existing file and directory tree on a given system (imagine the huge Java fully qualified paths having hundreds of characters :( in the real filesystem path)

On Windows, that is only possible with magic long paths, but I'm not sure that the POSIX-like layer used in os.fs actually supports them.

> being able to develop a new program using os.fs on one platform (e.g. a UNIX-like one) and be absolutely sure that one can just take the source code and run it on e.g. Windows without the need to test path lengths on Windows (i.e. make the Windows limits high enough and at the same time lower the limits on other systems Dao supports to make them compatible)

You can't change actual Windows limits, the related API simply won't work.

dumblob commented 9 years ago

> Actually, the limit is in characters, so it's still 260.

Maybe I misunderstood, but 260 still feels like too few. With regard to this, our fallback should not be 512, but rather 260 * 6 = 1560 (6B is the widest UTF-8 character; or maybe 260 * 4 = 1040, because wcs characters are always encoded in 4 bytes or fewer; in this case, though, there might be issues with 5B/6B-wide characters transferred from a UNIX system to Windows because of mapping).

> On Windows, that is only possible with magic long paths, but I'm not sure that the POSIX-like layer used in os.fs actually supports them.

POSIX doesn't support anything like "magic long paths", because there is no need for that. Why not use these "magic long paths" on Windows under the hood?

> You can't change actual Windows limits, the related API simply won't work.

I meant changing the os.fs backend in such a way that we gain support for paths longer than 260 characters on Windows. And, in conjunction with that, ensuring that other systems have the same maximum path length restriction.

Or at least we should provide a means to check whether a particular string can be used as a path with the guarantee that it will work seamlessly on all Dao-supported platforms. In this case, we could leave the default maximum lengths platform-specific (i.e. FILENAME_MAX), but we would need to somehow come up with and hardcode the maximum multiplatform length of a path (measured in characters) to check against at run time.

Night-walker commented 9 years ago

> Maybe I misunderstood, but 260 still feels like too few. With regard to this, our fallback should not be 512, but rather 260 * 6 = 1560 (6B is the widest UTF-8 character; or maybe 260 * 4 = 1040, because wcs characters are always encoded in 4 bytes or fewer; in this case, though, there might be issues with 5B/6B-wide characters transferred from a UNIX system to Windows because of mapping).

The path limit is in characters, so it's still 260. Windows uses 16-bit fixed-width wide characters. Also, widest UTF-8 characters are actually 4 bytes long (longer ones are deprecated).
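The byte arithmetic in this exchange can be checked with a short sketch. It is written in Python rather than Dao, all names in it are hypothetical, and it assumes the 260 limit is counted in 16-bit UTF-16 units. One detail worth keeping in mind: characters outside the Basic Multilingual Plane take two 16-bit units in UTF-16, so the worst UTF-8 expansion per unit comes from 3-byte characters, not 4-byte ones.

```python
MAX_PATH_UNITS = 260  # assumed: Windows MAX_PATH counted in 16-bit units


def utf16_units(text):
    """Number of 16-bit units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text):
    """Number of bytes needed to store `text` as UTF-8."""
    return len(text.encode("utf-8"))


def worst_case_utf8_bytes(units=MAX_PATH_UNITS):
    """Largest UTF-8 size of any text that fits into `units` UTF-16 units.

    A character inside the BMP uses 1 unit and at most 3 bytes;
    a character outside it uses 2 units and 4 bytes (2 bytes per unit).
    """
    return units * 3


if __name__ == "__main__":
    for sample in ("a", "é", "中", "😀"):
        print(repr(sample), utf16_units(sample), utf8_bytes(sample))
    print(worst_case_utf8_bytes())
```

Under these assumptions, a byte buffer sized for the worst case would need 260 * 3 = 780 bytes rather than 1040 or 1560.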

> POSIX doesn't support anything like "magic long paths", because there is no need for that. Why not use these "magic long paths" on Windows under the hood?

I was talking about Windows interpretation of the POSIX layer which I used in os.fs. I'm not sure it supports \\?\ paths.

> I meant changing the os.fs backend in such a way that we gain support for paths longer than 260 characters on Windows.

It would require some changes, possibly extensive.

> Or at least we should provide a means to check whether a particular string can be used as a path with the guarantee that it will work seamlessly on all Dao-supported platforms.

Well, keeping paths below the 260-byte threshold should work.

dumblob commented 9 years ago

> Well, keeping paths below the 260-byte threshold should work.

The reason I brought this up is that 260 bytes is far from enough, not even for me (up until now I haven't had any issues, because I didn't use Windows and thus had much higher limits). According to some surveys I saw, about 50% of developers working in multiplatform languages (i.e. not AppleScript nor C#, but anything else) use POSIX platforms. This percentage has grown significantly over the last 5 years or so (it used to be about 25%, if I remember correctly). According to the discussion I linked above, it's a serious issue as the file hierarchy grows (deepens), and it might be one of the reasons why developers are slowly moving to POSIX platforms.

Night-walker commented 9 years ago

I don't think that \\?\ paths are widely used anyway, so keeping paths below 260 bytes on Windows is likely the only way to ensure that all the tools will work correctly.

dumblob commented 9 years ago

Sure, but this recommendation doesn't (and shouldn't) impose any firm restriction. From the implementation's point of view, we can in theory seamlessly switch from \\?\ paths to 260-long paths back and forth under the hood. I'll try to dig a little bit deeper into mingw and their experience with long paths.

Regarding other applications, basically Microsoft applications all work with \\?\ paths as well. Regarding third-party applications it wildly varies, but the better ones (hopefully correlates with more widely used ones) do support it as well (especially if they're written in .NET, i.e. use the new high-level APIs from Microsoft).

Night-walker commented 9 years ago

> Sure, but this recommendation doesn't (and shouldn't) impose any firm restriction. From the implementation's point of view, we can in theory seamlessly switch from \\?\ paths to 260-long paths back and forth under the hood. I'll try to dig a little bit deeper into mingw and their experience with long paths.

Actually, long paths still won't work for FAT32 and possibly other file systems, so if you want the safest limit, it's again 260.

dumblob commented 9 years ago

That's why I wrote, that under the hood, we can use \\?\ only for longer paths, not all paths.
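A minimal sketch of that idea follows, in Python rather than Dao. The helper `to_api_path` and the constant names are hypothetical and are not part of os.fs. It assumes the classic limit is 260 UTF-16 units including the terminating NUL, and it relies on the documented Windows conventions that an extended-length path must be absolute and that a UNC path takes the `\\?\UNC\` form. It uses Python's `ntpath` module for Windows path rules, so it can be tried on any platform.

```python
import ntpath

MAX_PATH = 260               # assumed classic limit, terminating NUL included
EXTENDED_PREFIX = "\\\\?\\"  # the literal characters \\?\


def utf16_units(text):
    """Length of `text` in 16-bit UTF-16 units."""
    return len(text.encode("utf-16-le")) // 2


def to_api_path(path):
    """Return `path` unchanged if it fits MAX_PATH, else an extended-length form."""
    if path.startswith(EXTENDED_PREFIX) or utf16_units(path) < MAX_PATH:
        return path
    drive, _ = ntpath.splitdrive(path)
    if not drive or not ntpath.isabs(path):
        raise ValueError("an extended-length path must be absolute")
    # With the prefix, Windows no longer normalizes the path itself,
    # so separators and '..' components are resolved here.
    normalized = ntpath.normpath(path)
    if normalized.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_PREFIX + "UNC\\" + normalized[2:]
    return EXTENDED_PREFIX + normalized
```

Short paths pass through untouched, so tools and file systems that cannot handle the prefix are only affected when the path would not have worked anyway.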

Night-walker commented 9 years ago

It is still much simpler to just avoid long paths or associate them with drive letters. Especially taking into account that even Windows Explorer seems to have problems with them, not to mention lots of third-party applications and utilities. Supporting \\?\ is tedious, its utility is questionable, so I am reluctant to spend time on it.

dumblob commented 9 years ago

Well, let's at least add the safe value of 260 (bytes) to the os.fs namespace as a constant, with a note that for portability the path length must always be checked manually and that long names are not supported on Windows for now. You could also link this issue in the source code, as this will become a real issue once somebody tries to wrap/extend/mimic some bigger piece of software (possibly known from the POSIX/UNIX world) that works with files.

Night-walker commented 9 years ago

I added a note in the docs. Didn't add constant because 260 bytes is not a system-imposed limit but just the worst-case guess.

dumblob commented 9 years ago

I would still advocate for addition of a constant, because

  1. the number might change in the future (some service pack or whatever)
  2. it's not a guess, but rather a valid worst-case from all our platforms (considering default values on those systems)

Night-walker commented 9 years ago

> the number might change in the future (some service pack or whatever)

It won't. First, it is bound to FAT32 path limit. Second, such change could not possibly be introduced on all OS releases including unsupported ones. So, even if it changes at some point, all previous releases will likely not be affected, so the constant would only do harm.

> it's not a guess, but rather a valid worst-case from all our platforms (considering default values on those systems)

260 bytes is not an OS-imposed limit, so it has no place as an os module constant.

dumblob commented 9 years ago

> It won't. First, it is bound to FAT32 path limit. Second, such change could not possibly be introduced on all OS releases including unsupported ones. So, even if it changes at some point, all previous releases will likely not be affected, so the constant would only do harm.

Actually, FAT32 is nearly abandoned by everyone (including big companies and clusters). exFAT, as the "new" successor, is and will be developed and maintained. exFAT is not backward compatible with any of the FAT file systems, though, and it's highly probable that there'll be some extension which will support longer file names. So I'm sure this safe number of bytes in a path will change (grow sooner or later).

> 260 bytes is not an OS-imposed limit, so it has no place as an os module constant.

Hm, I think I understand the logic now. This means, though, that we really need to introduce a separate chapter in the dao-modules (as standard library) documentation covering platform compatibility (including specific limitations and recommendations on how to use certain modules and features).

Night-walker commented 9 years ago

> Actually, FAT32 is nearly abandoned by everyone (including big companies and clusters).

I've seen it many times in various places. It doesn't matter, however.

> So I'm sure this safe number of bytes in a path will change (grow sooner or later).

It almost certainly won't, for the same reason Windows still keeps some things compatible with DOS. But even if it does, the change cannot affect all OS releases, so the constant bound to MAX_PATH on one release would be incorrect on another.

> Hm, I think I understand the logic now. This means, though, that we really need to introduce a separate chapter in the dao-modules (as standard library) documentation covering platform compatibility (including specific limitations and recommendations on how to use certain modules and features).

I don't think that would be of much use. Normally, one would rather read the relevant docs or google the platform path limitations than search for some constant somewhere in the library.

dumblob commented 9 years ago

Well, @daokoder, what do you think about this issue of fully ensuring compatibility across Dao-supported platforms?

daokoder commented 9 years ago

> Well, @daokoder, what do you think about this issue of fully ensuring compatibility across Dao-supported platforms?

I haven't followed your discussion closely, so I don't really get your point. Could you summarize it here?

dumblob commented 9 years ago

The point is to have a way to ensure that the path I'm working with will work without any issues on all Dao-supported platforms, i.e. to ensure that a particular path is not too long on any of the supported platforms.

And now the issue is that Windows supports only 260-character paths on FAT16, FAT32 and exFAT, and that older native Windows applications don't support longer names using the special prefixes \\.\ and \\?\ (on NTFS, a path length of 32768 characters has been supported since Windows 2000). 260 is not sufficient, though (as discussed under an MSDN article), for a growing number of use cases, and many developers complain about it very loudly (the depth/length of paths keeps growing).

Dao should support both - the use of maximum path length available on the particular system, but also offer the programmer means to guarantee, that the created program will run the same way on all supported platforms. These means can be proactive (allow only safe path lengths), reactive (raise an exception on a system, where the requested path is over the limit - e.g. more than 260 characters long), somewhere between (check the length on Windows and if it's longer than 260, use the long names prefix \\?\ and raise an exception if the system configuration doesn't support these long names) or something orthogonal to these two (passively provide the programmer with a constant holding the highest safe value allowing manual checking in runtime, or passively providing this value just in a documentation, or simply ignoring this issue completely) or some combination of the mentioned.
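To make the options above concrete, here is a minimal sketch of three of them as selectable policies. It is Python rather than Dao, and every name in it (`Policy`, `PathTooLongError`, `resolve`, `PORTABLE_LIMIT`) is hypothetical; nothing like this exists in os.fs. For brevity it counts characters with `len` and assumes that a path handed to the prefixing branch is already absolute and uses backslashes.

```python
from enum import Enum, auto

PORTABLE_LIMIT = 260  # assumed worst-case limit taken from Windows


class Policy(Enum):
    PROACTIVE = auto()  # reject over-long paths on every platform
    REACTIVE = auto()   # reject only where the limit actually applies
    PREFIX = auto()     # on Windows, fall back to the \\?\ long-path form


class PathTooLongError(ValueError):
    pass


def resolve(path, policy, is_windows):
    """Return the path to hand to the OS, or raise PathTooLongError."""
    if len(path) < PORTABLE_LIMIT:
        return path
    if policy is Policy.PROACTIVE:
        raise PathTooLongError("path exceeds the portable limit")
    if not is_windows:
        return path  # other platforms accept the longer path as-is
    if policy is Policy.REACTIVE:
        raise PathTooLongError("path exceeds the Windows limit")
    return "\\\\?\\" + path  # Policy.PREFIX
```

The proactive policy gives the same behavior on every platform at the cost of refusing paths that the local system could handle; the other two keep the local maximum available and differ only in what happens on Windows.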

I myself would welcome the if-longer-than-260 solution, but it's quite tricky to implement it, because of the mess in the Windows API. Also @Night-walker had a good point, that we shouldn't put the 260 constant into os, because it has nothing to do with it.

Currently 260 is mentioned only in the documentation, as the recommended safe value for Windows.

Hope I summarized it and didn't forget any significant points.

daokoder commented 9 years ago

> Also @Night-walker had a good point, that we shouldn't put the 260 constant into os, because it has nothing to do with it.

I think @Night-walker is right about that, we should not allow the limitation of Windows to affect other systems. So we shouldn't provide any such constant.

dumblob commented 9 years ago

Ok, let's close this then. We can discuss it again once some hard-to-solve issue occurs.