GemTalk / FileSystemGs

Port of FileSystem implementation to GemStone/S from Pharo
Other
0 stars 1 forks source link

Proposed FileMode class structure to replace #write and #read psuedo-modes #14

Closed regkrock closed 3 years ago

regkrock commented 5 years ago

Added the proof of concept of FileMode with subclasses to be passed instead of #write and #read. The classes which have been introduced are:

            FileAppendMode
                FileAppendAndReadMode
                    FileAppendAndReadBinaryMode
                FileAppendBinaryMode
            FileReadMode
                FileReadBinaryMode
            FileReadWriteMode
                FileReadWriteMode
                    FileReadWriteBinaryMode
                    FileReadWriteTruncatedMode
                        FileReadWriteTruncatedBinaryMode
            FileWriteMode
                FileWriteBinaryMode

These classes return the appropriate mode string, (ex: ‘w+’) and know if the file is writable. Example of change - the text in blue shows the change.

Existing format for opening a stream over a file:

        (filesystem open: aFileReference path writable: #write) ```

Proposed format for opening a stream over a file:
    (filesystem open: aFileReference path writable: self store readWriteTruncatedMode) ```

The #readWriteTruncatedMode method would return an instance of FileReadWriteTruncatedMode

If a form of this proposal were to be adopted, then the appropriate methods should be renamed to have mode. As an example, #open:writeable: would become #open:mode:

martinmcclure commented 5 years ago

After Dale's comments on making sure that this will work for other platforms, I looked a bit at Windows file API. The number of possible options for file opening and creating in Windows is quite extensive, and rather different than those in Unix. Trying to put these into a hierarchy of all possible legal combinations would be large and inflexible. The available options change slowly, but they do change.

So I think it would be better to have #open:options: (and FileReference>>openWithOptions:) where the options argument is an object that composes options from the available options. Each option (and we probably have very few options until we replace GsFile with something more flexible) could be a class that is an immediate subclass of FileOpenOption or something like that. Regards, -Martin

regkrock commented 5 years ago

When I created the hierarchy I had some concerns.

Do you have a reference for the Windows combinations.

Also, should we be including the gzip options as well or is that not being handled via the FileSystem?

Regards,

Reg

martinmcclure commented 5 years ago

If you're curious, the Windows documentation for file opening is https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-createfilew

For Linux, which should be very similar to options for other Unix systems, including MacOS: https://linux.die.net/man/2/open

But I don't think we need to implement any of these right now. Few (none?) are currently supported by GsFile, so we will need to wait until we have a more capable lower layer to build on. What I'd like to see now is a framework that is flexible enough to add file opening attributes later.

Compression such as gzip (and we want to add LZ4 compression before too long, but not in GsFile) should not be an attribute of an open file. Compression is an attribute of what streams you stack on top of the file. All files are binary, but you can interpret them with an encoding stream. All files are uncompressed, but you can read or write them with a compressing stream.

EDIT: It has been pointed out to me that the options O_RDONLY, O_WRONLY, or O_RDWR exist in Unix. So the following statement is false. (Fake news: In Unix, all files are opened read-write) -- (This is true, however -- notice that in Linux Pharo you can open a file with openWritable: false and still write to the file). Regards, -Martin

regkrock commented 5 years ago

Based on the comments above and looking at the links this is an attempt to capture the essence of the longer term needs and current state of affairs I will put forward some ideas:

  1. Create a class which holds more information than a FileMode, lets call it FileSpec

    • open mode - this contains basic information about opening the file
    • file type - Text or Binary file type
    • compression policy - No Compression or GZip compression
    • create file policy - Indicates what to do if the file does not exist
    • encoding to used - default is utf-8
  2. Mode - knows the following:

    • initial write position
    • initial read position
    • truncate - if the file is to be truncated
  3. Knowledge

    • knows if it is readable and/or writable
    • can return the appropriate string to be used with opening a file. Both normal and compressed files
  4. Classes and related hierarchies FileSpec

FileMode +-- FileAppendMode +--FileReadMode +--FiileReadWriteMode +--FileWriteMode

FileType +--TextFileType +--BinaryFileType

CompressionType +--NoCompression +--GZip

  1. Additional ideas
    • Be able to specify the stream types. This would allow for future flexibility.

Regards,

Reg

martinmcclure commented 5 years ago

Hi Reg,

I don't think it would be the best design to put all that knowledge in one object. A more flexible design (and one more like other Smalltalks are moving toward) is to have objects that the user assembles together to provide the desired functionality. Each object has a narrow responsibility. In the case of file usage, there's the object that represents an open file (FileHandle) and a stack of streams on top of that, each stream providing a specific value add such as buffering, encoding, or compression.

Dale and I talked to Allen yesterday, and it looks like we may actually have a replacement for GsFile for GemStone 3.5! If not that soon, it seems likely to be pretty soon. This API will have a much more modern design, and be much simpler and low-level. Given that, we don't want to use GsFile's design as a model for the design of GsFileSystem.

After doing a bunch of reading of code and of the Pharo mailing list, I think there should be two paths for opening a file.

1) The default path is to send #readStream, #writeStream, #binaryReadStream, or #binaryWriteStream to a FileLocator or FileReference. This opens the file with default attributes and gives you the requested type of stream.

2) If you want to open the file with non-default options, send a FileLocator or FileReference #openWithAttributes:, where the argument is an object that composes some number of the possible attributes. The answer is a FileHandle. You can then get a stream from a FileHandle with the #readStream, #writeStream, #binaryReadStream, or #binaryWriteStream messages.

The #readStream and #writeStream messages should use default buffering, and platform-dependent default encoding -- UTF-8 for Unix, UTF-16 for Windows. This is also the encoding for filenames on those platforms.

So a FileHandle represents a file opened with some set of attributes. Things that should be attributes of a FileHandle:

Other things, such as Unix O_APPEND and O_TRUNC file status flags, can be added later, once we have replaced GsFile. In the meantime, these are easily simulated in common situations with file actions after opening.


Things that are not part of opening a file, and therefore should not be part of a FileHandle's options or state:

Text vs Binary: This is dependent on what streams you use to read or write the file, not on the FileHandle itself. Also, in the modern world there is no one "text" mode, in either Unix or Windows. There are various possible encodings, which are implemented with encoding/decoding streams.

Compression: This is done by compressing/decompressing streams, not by the file itself. There are also an open-ended number of ways to compress a file (gzip, lz4, lzo...).

For the file position, the FileHandle does not need to keep track of this, since the underlying operating system file knows its position. There do need to be messages that can be sent to a FileHandle to retrieve the position, and to set it. Normally, these messages would only be sent by the stream that is interacting directly with the FileHandle.

With this kind of design, the classes needed would be:

FileHandleAttributes (a composition of FileAttribute classes)

FileAttribute (a single attribute) FileRead FileWrite (and more FileAttribute classes later, but those will do for now)

And whatever Stream classes are needed. For now, the basic Zn streams will do. We'll be adding Xtreams support, which will have a full set of buffering, encoding, compressing, and encrypting streams.


I hope that all this is more clarifying than confusing. I suspect I haven't yet made my thoughts entirely clear. Please continue the discussion as necessary.

Regards, -Martin

regkrock commented 5 years ago

Hi Martin, The following is based on what I garnered from other Smalltalks and the sites you referenced. Let me know what you think.

The Default (Linux) column shows the default numbers for each option. The Windows columns shows the windows value.

Class                                               Posix Name      Default (Linux)       Windows 
========================                           ============    ================       =======
FileOption                                                                 
    FilePositioning                                                        
        FileSeekCurrentPosition              SEEK_CUR                     1
        FileSeekSetPosition                  SEEK_SET                     0
        FileSeekEndPosition                  SEEK_END                     2
    FileLockOption                                                         
        FileWriteLockOption                  F_WRLCK                      2
        FileReadLockOption                   F_RDLCK                      1
        FileMandatoryLockOption              F_MDLCK                      0
    FileOpenMode                                                           
        FileOpenReadWriteMode                O_RDWR                       2
        FileOpenReadOnlyMode                 O_RDONLY                     0
        FileOpenWriteOnlyMode                O_WRONLY                     1
    FileCreationOption                                                     
        FileExclusiveOption                  O_EXCL                     128         262144
        FileDirectoryOption                  O_DIRECTORY      not supported
        FileDirectOption                     O_DIRECT         not supported
        FileDoNotUpdateAccessTimeOption      O_NOATIME        not supported
        FileCloseOnExecOption                O_CLOEXEC        not supported
        FileKeepSymbolicLinksOption          O_NOFOLLOW       not supported
        FileCreateOption                     O_CREAT                     64         131072
        FileNotControllingTerminalOption     O_NOCTTY         not supported
        FileTruncateOption                   O_TRUNC                    512         524288
    FileStatusOption                                                       
        FileAsyncOption                      O_ASYNC          not supported
        FileRsyncOption                      O_RSYNC          not supported
        FileDsyncOption                      O_DSYNC          not supported
        FileNonBlockOption                   O_NONBLOCK                 128
        FileSynchronousOption                O_SYNC           not supported
        FileAppendOption                     O_APPEND                  1024          65536
    FileShareOption                                                        
        FileDenyReadWriteOption              O_DENYRDWR                   2             16
        FileDenyNoneOption                   O_DENYNONE                   0             64
        FileDenyWriteOption                  O_DENYWR                     3             32
        FileDenyReadOption                   O_DENYRD                     1             48
    FileControlOption                                                      
        FileCloseOnExecutionOption           F_DCLOEXEC                   1
    FileTypeOption                                                         
        FileGzipTypeOption                   O_GZIP                       0
        FileTextTypeOption                   O_TEXT                       0
        FileBinaryTypeOption                 O_BINARY                     0              1

I have also created a FileOptions class to manage the various file options. It has the following instance variables:

The FileOption class has also had a number of instance creation convenience methods added:

append, #read, #write, #readWrite, #appendAndRead, #readBinary, #writeWithHighCompression, #writeWithLowCompression, etc.

In GsFile, the following method has been implemented to support opening a file from FileSystemGs.

open: aPathName withOptions: aFileOptions
    "This is the current way of openning a file"
    ^self open: aPathName mode: aFileOptions modeString onClient: aFileOptions isClientFileSystem withCompression: aFileOptions isGzipped.
regkrock commented 5 years ago

Using FileOptions to open a file in Windows and Unix

Below I describe what I have implemented to support for opening files. The 'FileHande' classes described below are not implemented. They are just psuedo-code to highlight how the FIleOptions class works.

1) Windows For windows compatibility withe the Posix file options, the following methods have been added to WIndowsStore on the class side.

Access Rule Value
readOnlyAccessRule 2147483648
readWriteAccessRule self readOnlyAccessRule bitOr: self writeOnlyAccessRule
writeAccessRule 1073741824
Create Rule
createNewRule 1
openAlwaysRule 4
openExistingRule 3
truncateExistingRule 5
Share Rule
denyNoneShareRule 3
denyReadShareRule 2
denyReadWriteShareRule 0
denyWriteShareRule 1

So the low level call for opening a file in Windows (32 bit) would be similar to:

....
(securityAttributes := CfsOSSecurityAttributes new)
        lpSecurityDescriptor: nil;
        bInheritHandle: true;
        nLength: CfsOSSecurityAttributes fixedSize.
....
result := self
    platformFunctionCreateFile: filename
    fdwAccess: aFileOptions accessRule
    fdwShareMode: aFileOptions shareRule
    lpsa: securityAttributes
    fdwCreate: aFileOptions createRule
    fdwAttrsAndFlags: WindowsStore fileAttributeNormal "This is 128"
    hTemplateFile: nil).
result asInteger = Windows invalidHandleValue.
    ifFalse: [
        aFileOptions hasTruncateOption
        ifTrue: [
            result := self
                platformFunctionCreateFile: filename
                fdwAccess: aFileOptions accessRule
                fdwShareMode: aFileOptions shareRule
                lpsa: nil
                fdwCreate: aFileOptions truncateExistingRule
                fdwAttrsAndFlags: WindowsStore fileAttributeNormal "This is 128"
                hTemplateFile: nil].

WindowsFileHandle>>platformFunctionCreateFile: truncFileName fdwAccess: accessRule fdwShareMode: shareRule lpsa: lpsa fdwCreate: createRule fdwAttrsAndFlags: fileAttribute hTemplateFile: hTemplateFile

    <c: pointer 'kernel32.dll':CreateFileA pointer uint32 uint32 pointer uint32 uint32 pointer>

    ^self primitiveFailed

2) Unix/Linux Open The Unix/Linux option is very straight forward as the code below shows. Note: Share is not used in Linux based on the code I was looking at.

...
self 
    platformFunctionOpen64: path nullTerminated 
    oflag: aFileOptions  fileOpenAndOptionsValue
    mode: aFileOptions accessPermissions

UnixFileHandle>>platformFunctionOpen64: path oflag: oflag mode: mode

    <c: int32 Open64 safepointer int32 uint32>
    ^self primitiveFailed
regkrock commented 5 years ago

The following are the Posix errors that I defined:

I am not certain if the WindowsValue makes sense or not.

                                             PosixName              UnixValue       WindowsValue
FilePosixError                                                             
    FileSynchronizedIONotSupportedError      EINTR            not supported
    FileBadFileDescriptorError               EBADF                       10              9
    FileNoEntryError                         ENOENT                       8              2
    FileBusyError                            EBUSY                       12             16
    FilePermissionDeniedError                EPERM                        7              1
    FileIOError                              EIO                          9              5
    FileImproperLinkError                                                14             18
    FileNoSpaceError                         ENOSPC                      18             28
    FileInvalidOptionError                   EINVAL                      16             22
    FileMaxFilesOpenError                    ENFILE                      17             23
    FileIsDirectoryError                     EISDIR                      15             21
    FileReadOnlyFileSystemError              EROFS                       19             30
    FileTooManySymbolicLinksError            ELOOP            not supported
    FileNameToLongError                      ENAMETOOLONG     not supported
    FileAccessError                          EACCES                      11             13
    FileExistsError                          EEXIST                      13             17
    FileDeviceNotSameError                   EXDEV                       14             18
martinmcclure commented 5 years ago

On 1/7/19 7:49 PM, Reg Krock wrote:

Hi Martin, The following is based on what I garnered from other Smalltalks and the sites you referenced. Let me know what you think.

Thanks, Reg.

I have some questions and comments.

It's not clear to me exactly how you intend to use these classes. You mention instances of these classes -- what state do they have? Is one instance of, say, FileCreateOption different from another instance of FileCreateOption? If not, it might be better to use the class as a named singleton and not instantiate these things.

When you say "not supported" in the Linux column, does that mean that you are proposing that we not support these? Some of them seem useful and worth supporting eventually.

Other detailed comments interspersed below. I'm primarily looking at Linux, not MacOS (similar but not identical), or Windows (very different) in these comments.

Regards,

-Martin

The Default (Linux) column shows the default numbers for each option. The Windows columns shows the windows value.

|Class Posix Name Default (Linux) Windows ======================== ============ ================ ======= FileOption FilePositioning FileSeekCurrentPosition SEEK_CUR 1 FileSeekSetPosition SEEK_SET 0 FileSeekEndPosition SEEK_END 2 | These FilePositioning options are not attributes of a file, but of an individual seek operation. Therefore, they should not be included in the FileOption. Probably better to not have the classes at all, but to offer them as options in whatever #seek API we end up with. |FileLockOption FileWriteLockOption F_WRLCK 2 FileReadLockOption F_RDLCK 1 FileMandatoryLockOption F_MDLCK 0 |

|File locking is similar -- it's not an attribute of a file, but an operation one can do on a file through fcntl or flock. Also, I can't find F_MDLCK anywhere except as a proposed (but apparently never implemented) option in Cygwin. There are also other options (F_UNLCK, LOCK_SH, LOCK_EX, LOCK_UN) that would be implemented by options to a locking API. We might want a "lock options" object as an argument for a locking API, but I'd want to see a design for that API to be sure.|

||

|FileOpenMode FileOpenReadWriteMode O_RDWR 2 FileOpenReadOnlyMode O_RDONLY 0 FileOpenWriteOnlyMode O_WRONLY 1|

|For these, I'd rather see separate composable "readable" and "writeable" attributes, since the combination of "neither readable nor writeable" is actually useful in Linux.|

|Also, please avoid use of "mode" in names. These names communicate well even without "mode," and "mode" has some a specific meanings in UNIX. A mode of read-only means that the file's permissions are read-only, not that the file is opened read-only. So it can be confusing to use that word in this context.|

FileCreationOption FileExclusiveOption O_EXCL 128 262144 FileDirectoryOption O_DIRECTORY not supported FileDirectOption O_DIRECT not supported FileDoNotUpdateAccessTimeOption O_NOATIME not supported O_DIRECT and O_NOATIME should be FileStatusOptions, not FileCreationOptions. Linux open(2) defines O_CLOEXEC as a file creation option, but I'm not sure why, since it appears to be fully functional on extant files. FileCloseOnExecOption O_CLOEXEC not supported FileKeepSymbolicLinksOption O_NOFOLLOW not supported FileCreateOption O_CREAT 64 131072 FileNotControllingTerminalOption O_NOCTTY not supported FileTruncateOption O_TRUNC 512 524288 There is also O_TMPFILE to create temporary files, but that one is so special-purpose that it seems like a separate API for temp file creation is justified, rather than putting it in the FileOptions. FileStatusOption FileAsyncOption O_ASYNC not supported FileRsyncOption O_RSYNC not supported O_RSYNC is not supported in Linux. Might be in MacOS, though. FileDsyncOption O_DSYNC not supported FileNonBlockOption O_NONBLOCK 128 FileSynchronousOption O_SYNC not supported FileAppendOption O_APPEND 1024 65536 There's also O_PATH, which might be a good thing to include, or at least think about including. FileShareOption FileDenyReadWriteOption O_DENYRDWR 2 16 FileDenyNoneOption O_DENYNONE 0 64 FileDenyWriteOption O_DENYWR 3 32 FileDenyReadOption O_DENYRD 1 48 Are the FileShareOptions Windows-specific? If so, I'd leave them out for now, though we'd want the design of the API to allow their addition for platforms that support Windows. FileControlOption FileCloseOnExecutionOption F_DCLOEXEC 1 I think you mean FD_CLOEXEC? Do you have an API in mind for doing file control operations? That could use this option, plus the ones that can be specified when a file is opened and then changed after the file is open. FileTypeOption FileGzipTypeOption O_GZIP 0 FileTextTypeOption O_TEXT 0 FileBinaryTypeOption O_BINARY 0 1 The FileTypeOptions are not attributes of a file in either UNIX or Windows, and should not be included in any form.

I have also created a FileOptions class to manage the various file options. It has the following instance variables:

  • mode - this contains an instance of a FileOpenMode subclass.
  • options - this contains supported instance(s) of the subclass of FileCreationOption and/or an instance of FileTruncateOption.
  • fileType - this indicates what the file type is. The Gzip type has been added here for compatibility with the current compression implmenetation.

We do not want compatibility with the current implementation. GsFile will continue to exist as is for existing users. FileSystemGs will exist alongside it, but no compatibility between the two is needed. Compression as a property of a file is one of the things that is wrong with the GsFile design, and should not be part of FileSystemGs.

  • share - this indicates how the file is shared
  • permissionOptions - these have not yet been implemented yet.

I think this might be more instance variables than needed. fileType can go away. mode can be folded into options. I'm not sure why share and permissionOptions aren't just more options either.

The FileOption class has also had a number of instance creation convenience methods added:

append, #read, #write, #readWrite, #appendAndRead, #readBinary,

writeWithHighCompression, #writeWithLowCompression, etc.

I'd rather not have any convenience messages, at least not yet. I expect this API to only be used directly by power users, and the FileSystemGs code itself. Perhaps two convenience methods, corresponding to what would be used by FileReference>>readStream and FileReference>>writeStream. We can always add more later if any other combinations end up being very frequently used, but let's try to keep the API lean.

In GsFile, the following method has been implemented to support opening a file from FileSystemGs.

|open: aPathName withOptions: aFileOptions "This is the current way of openning a file" ^self open: aPathName mode: aFileOptions modeString onClient: aFileOptions isClientFileSystem withCompression: aFileOptions isGzipped.|

|I'd rather not change GsFile in any way. It's the legacy world. For right now we have to build on top of it, but it won't be long before the two file worlds are completely separate, so using aFileOptions with GsFile seems to muddy the waters too much.|