dokan-dev / dokany

User mode file system library for windows with FUSE Wrapper
http://dokan-dev.github.io

Python bindings to dokany. #964

Closed kdschlosser closed 2 years ago

kdschlosser commented 3 years ago

OK, so I have done some reading, and it seems there was once a Python binding using Pyfilesystem, which is no longer maintained. Pyfilesystem2 is the new version, and I was unable to find any information about it supporting dokany. It appears that it has not been done.

I have started writing a binding to dokany, and I am not going to use any Python library other than ctypes (hopefully). I have ported the entire API and it imports without issue. I still have to create a Pythonic API to make its use easier, and I am looking for suggestions and possibly some help with that aspect. I was thinking of an API similar to Flask that uses decorators to register callbacks for paths in the virtual file system.
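A minimal sketch of what such Flask-style registration could look like. Everything here is hypothetical: the `Drive` class, `create_file`, `default_create_file` and `dispatch_create_file` are illustrative names, not part of dokany or of the binding discussed in this thread, and the glob matching is just one possible way to route paths.

```python
import fnmatch


class Drive:
    """Hypothetical callback registry; not an actual dokany API."""

    def __init__(self):
        self._create_file_handlers = []  # list of (pattern, callback)
        self._default_create_file = None

    def create_file(self, pattern):
        """Register a callback for paths matching a glob-style pattern."""
        def decorator(func):
            self._create_file_handlers.append((pattern, func))
            return func
        return decorator

    def default_create_file(self, func):
        """Register the fallback used when no pattern matches."""
        self._default_create_file = func
        return func

    def dispatch_create_file(self, path):
        """Call the first matching handler, else the default one."""
        for pattern, func in self._create_file_handlers:
            if fnmatch.fnmatchcase(path, pattern):
                return func(path)
        if self._default_create_file is None:
            raise LookupError('no create_file handler for ' + path)
        return self._default_create_file(path)


drive = Drive()


@drive.create_file('\\logs\\*.txt')
def open_log(path):
    return 'log handler: ' + path


@drive.default_create_file
def open_anything(path):
    return 'default handler: ' + path
```

Returning `func` from each decorator keeps the decorated function usable on its own, which also allows the same registration calls to be made dynamically at runtime (`drive.create_file(pattern)(func)`).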

I have attached the ported API and if the developers of dokany have an interest in possibly adding it when I am finished we can discuss getting that done.

Any ideas or suggestions are greatly appreciated. Let me know if there is interest in adding a Python binding to dokany. dokan.zip

Liryna commented 3 years ago

Hi!

Indeed, since pyfilesystem2 we no longer have a proper Python API. If you are willing to maintain it, or at least create and set up the repository and a first version, that would be really great. I can give you access to a sub-repository of dokan-dev when you feel ready.

Such contributions are highly valuable so I already thank you for sharing your first draft.

Note: I am not a Python developer, so I will not be able to offer guidance on the design.

kdschlosser commented 3 years ago

Hey, no worries. I am glad to share it with you. I will probably have some questions to ask regarding the use of the API and I am sure you will be able to help out with that aspect of it. It seems like for the most part it is pretty easy to use from what I have seen.

I do have one question. In your documentation (https://dokan-dev.github.io/dokany-doc/html/files.html) the list of files does not match what was installed with the latest version of dokany. The 3 files in the attached zip file are the only ones that were installed other than the FUSE API.

I wrote a program that reads C/C++ header files and converts them to Python. It's not perfect; it gets about 95% of the code converted, and I have to do some cleanup after. It's a HUGE time saver. I have converted over a million lines of the Windows SDK to Python.

There are also these functions that are exported in the DLL but are not in the API.

kdschlosser commented 3 years ago

I have gotten DOKAN_OPTIONS all coded up as well as started on the handler class for a mount. I have a bunch of questions but I will address them one at a time.

What exactly are these fields in the DOKAN_CONTROL structure, and what populates them?

I am having some trouble grasping why there are these 2 functions. I can understand having one of them, but it looks like they ultimately do the same thing. Could you please explain the difference?

kdschlosser commented 3 years ago

I wanted to shoot you a progress report. I have made classes to handle the bitwise flags used in the file attributes, the file system attributes, and the security flags: pretty much anything that uses bitwise operations.
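As a rough illustration of the idea (not the classes from the binding itself), a flag wrapper could be built on the standard library's `enum.IntFlag`. The `FileAttributes` class and the `describe` helper are hypothetical names; the numeric values are the standard Win32 `FILE_ATTRIBUTE_*` constants.

```python
import enum


class FileAttributes(enum.IntFlag):
    """Subset of the Win32 FILE_ATTRIBUTE_* bit flags."""
    READONLY = 0x0001
    HIDDEN = 0x0002
    SYSTEM = 0x0004
    DIRECTORY = 0x0010
    ARCHIVE = 0x0020
    NORMAL = 0x0080


def describe(raw):
    """Turn a raw DWORD of attributes into readable booleans."""
    attrs = FileAttributes(raw)
    return {
        'hidden': FileAttributes.HIDDEN in attrs,
        'read_only': FileAttributes.READONLY in attrs,
        'directory': FileAttributes.DIRECTORY in attrs,
    }
```

Because `IntFlag` members are still integers, a combined value such as `FileAttributes.HIDDEN | FileAttributes.ARCHIVE` can be assigned straight into a ctypes `DWORD` field.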

I am going to be using decorator functions that will allow both static and dynamic registration of callbacks for the different fields in DOKAN_OPERATIONS.

Here is some sample code of how it is going to function.

import dokan
import esp32

options = dokan.Options()

options.is_removable_drive = True
options.is_case_sensitive = False
options.enable_notification_api = True
options.mount_point = 'Z:'
options.mount_drive_for_current_session = True
options.enable_fcb_garbage_collection = True
options.sector_size = 512
options.allocation_unit_size = 4096
options.thread_count = 1

drive = dokan.Drive(options)

open_files = {}

# this is a blanket callback that covers all files except the "special use cases"
@drive.default_create_file
def create_file_callback(request):
    disposition = request.create_disposition
    if disposition.create_always:
        if esp32.exists(request.filename):
            esp32.write_file(request.filename, b'')
            return dokan.ERROR_ALREADY_EXISTS
        else:
            esp32.make_file(
                request.filename,
                request.attributes.hidden,
                request.attributes.read_only,
                request.attributes.archive,
            )
            return dokan.STATUS_OBJECT_NAME_COLLISION

    elif disposition.create_new:
        if esp32.exists(request.filename):
            return dokan.ERROR_FILE_EXISTS
        else:
            esp32.make_file(
                request.filename,
                request.attributes.hidden,
                request.attributes.read_only,
                request.attributes.archive,
            )
            return dokan.STATUS_SUCCESS

    elif disposition.open_always:
        if esp32.exists(request.filename):
            file = esp32.open_file(request.filename)
            open_files[request.filename] = file
            return dokan.ERROR_ALREADY_EXISTS
        else:
            esp32.make_file(request.filename)
            file = esp32.open_file(request.filename)
            open_files[request.filename] = file
            return dokan.STATUS_OBJECT_NAME_COLLISION

    elif disposition.open_existing:
        if request.is_directory:
            if esp32.exists(request.filename):
                if esp32.isdir(request.filename):
                    return dokan.STATUS_SUCCESS
                else:
                    return dokan.STATUS_NOT_A_DIRECTORY
            else:
                return dokan.ERROR_FILE_NOT_FOUND

        else:
            if esp32.exists(request.filename):
                file = esp32.open_file(request.filename)
                open_files[request.filename] = file
                return dokan.STATUS_SUCCESS
            else:
                return dokan.ERROR_FILE_NOT_FOUND

    elif disposition.truncate_existing:
        if esp32.exists(request.filename):
            if request.permissions.generic_write:
                esp32.write_file(request.filename, b'')
                file = esp32.open_file(request.filename)
                open_files[request.filename] = file
                return dokan.STATUS_SUCCESS
            else:
                return dokan.STATUS_ACCESS_DENIED

        else:
            return dokan.ERROR_FILE_NOT_FOUND

for path, is_directory in esp32.tree:

    if 'example_file.txt' in path:
        # we can provide callbacks for special use cases.
        @drive.create_file(path)
        def create_file_callback(request):
            if esp32.exists(request.filename):
                esp32.write_file(request.filename, b'')
                return dokan.ERROR_ALREADY_EXISTS
            else:
                esp32.make_file(
                    request.filename,
                    request.attributes.hidden,
                    request.attributes.read_only,
                    request.attributes.archive,
                )
                return dokan.STATUS_OBJECT_NAME_COLLISION

drive.mount()

while drive.is_mounted:
    while esp32.renamed_files:
        old_path, new_path, is_dir = esp32.renamed_files.pop(0)
        drive.rename(old_path, new_path, is_dir, True)

    while esp32.created_files:
        path, is_dir = esp32.created_files.pop(0)
        drive.create(path, is_dir)

    while esp32.deleted_files:
        path, is_dir = esp32.deleted_files.pop(0)
        drive.delete(path, is_dir)

kdschlosser commented 3 years ago

In my example above I am returning ERROR_ALREADY_EXISTS if the file exists and STATUS_OBJECT_NAME_COLLISION if it doesn't. The Windows API docs say to return ERROR_ALREADY_EXISTS if the file exists and STATUS_SUCCESS (0) if it doesn't. The dokan API docs say to return STATUS_OBJECT_NAME_COLLISION instead of STATUS_SUCCESS when the create disposition is CREATE_ALWAYS, so that is what I have done. It doesn't seem correct, though.
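For comparison, here is a sketch of one possible reading of that rule: the collision status is reported when the file already existed, and plain success when it was newly created. This interpretation is an assumption and should be checked against the dokan documentation for ZwCreateFile. The helper name `create_file_status` is hypothetical; the constants are the standard Win32 disposition and NTSTATUS values.

```python
# Win32 creation dispositions
CREATE_NEW = 1
CREATE_ALWAYS = 2
OPEN_EXISTING = 3
OPEN_ALWAYS = 4
TRUNCATE_EXISTING = 5

# NTSTATUS values
STATUS_SUCCESS = 0x00000000
STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034
STATUS_OBJECT_NAME_COLLISION = 0xC0000035


def create_file_status(disposition, already_existed):
    """Pick a status for a create request (assumed reading of the rule)."""
    if already_existed:
        if disposition == CREATE_NEW:
            # Nothing is created: the name is already taken.
            return STATUS_OBJECT_NAME_COLLISION
        if disposition in (CREATE_ALWAYS, OPEN_ALWAYS):
            # The request succeeds, but the file was not newly created.
            return STATUS_OBJECT_NAME_COLLISION
        return STATUS_SUCCESS
    if disposition in (OPEN_EXISTING, TRUNCATE_EXISTING):
        # These dispositions require an existing file.
        return STATUS_OBJECT_NAME_NOT_FOUND
    return STATUS_SUCCESS
```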

kdschlosser commented 3 years ago

I have a crazy question...

It looks like the dokan driver is written to allow multiple attachments or mounts, but this seems to be handled in only part of the code and not all of it. How do the DokanNotify* functions know which mount point to reference?

kdschlosser commented 3 years ago

OK, so here is an update.

I have made some good progress and I am almost to the point of testing this. I still have to implement the find callbacks and write a setup program.

I still have to code a setup script to install the library and write the script that compiles the documentation. There is still some documentation I have to write, but the majority of it is done.

I created a repository for the binding. https://github.com/kdschlosser/py_dokany

At the moment the library weighs in at 7639 lines including commented lines; actual code is 3862 lines. A large chunk of the commented lines are actually documentation strings, but my statistics generator does not count those lines as code.

kdschlosser commented 3 years ago

OK, I am testing this thing now. It does create the drive like it should, and I have it properly collecting the size of the "drive". There is some really strange behavior in how the CreateFile callback works, though. It is not following what the Windows API states:

An application cannot create a directory by using CreateFile, therefore only the OPEN_EXISTING value is valid for dwCreationDisposition for this use case. To create a directory, the application must call CreateDirectory or CreateDirectoryEx.

To open a directory using CreateFile, specify the FILE_FLAG_BACKUP_SEMANTICS flag as part of dwFlagsAndAttributes.

What is happening is that the directory "\" is being opened, but the disposition that is set is CREATE_ALWAYS. This should not be taking place. There is also nothing in the API documentation for dokany that explains the return values: what they should be and how dokany is going to respond to them. It's extremely vague: it states to return NTSTATUS codes, but the API documentation also states to return error codes, which are not NTSTATUS codes.

When dokany makes the ZwCreateFile callback, I react according to the Windows API documentation, since that is what dokany's docs give as a reference. I return what the Windows API docs say, and then dokany attempts to open a file called "\", which is obviously not a file.

There is also another issue, and that is with the GetFileInformation callback. The fields in BY_HANDLE_FILE_INFORMATION are incorrect, as are the fields in FILETIME.

typedef struct _BY_HANDLE_FILE_INFORMATION {
  DWORD    dwFileAttributes;
  FILETIME ftCreationTime;
  FILETIME ftLastAccessTime;
  FILETIME ftLastWriteTime;
  DWORD    dwVolumeSerialNumber;
  DWORD    nFileSizeHigh;
  DWORD    nFileSizeLow;
  DWORD    nNumberOfLinks;
  DWORD    nFileIndexHigh;
  DWORD    nFileIndexLow;
} BY_HANDLE_FILE_INFORMATION, *PBY_HANDLE_FILE_INFORMATION, *LPBY_HANDLE_FILE_INFORMATION;

typedef struct _FILETIME {
  DWORD dwLowDateTime;
  DWORD dwHighDateTime;
} FILETIME, *PFILETIME, *LPFILETIME;

dokany has the DWORD swapped out for some form of int instead of a ulong.

I am guessing the downloadable release of dokany is compiled using the .NET source, because in that source I see this:

/// <summary>
/// The file attributes. For possible values and their descriptions.
/// </summary>
public uint dwFileAttributes;

That is the wrong data type for dwFileAttributes.

This is going to cause me a decent amount of grief when trying to fill the pointer in the callback. I will have to figure out how to memcpy the entire structure onto the pointer, and also change the structure so its data types match the ones actually in use when the callback is made.
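For reference, a sketch of how the two SDK structures above could be declared with ctypes and filled through a pointer. `DWORD` is a 32-bit unsigned integer in the Windows SDK, so `ctypes.c_uint32` is used here. The helper names `fill_file_info` and `copy_file_info` are hypothetical and not taken from the binding.

```python
import ctypes

DWORD = ctypes.c_uint32  # Win32 DWORD: 32-bit unsigned


class FILETIME(ctypes.Structure):
    _fields_ = [('dwLowDateTime', DWORD), ('dwHighDateTime', DWORD)]


class BY_HANDLE_FILE_INFORMATION(ctypes.Structure):
    _fields_ = [
        ('dwFileAttributes', DWORD),
        ('ftCreationTime', FILETIME),
        ('ftLastAccessTime', FILETIME),
        ('ftLastWriteTime', FILETIME),
        ('dwVolumeSerialNumber', DWORD),
        ('nFileSizeHigh', DWORD),
        ('nFileSizeLow', DWORD),
        ('nNumberOfLinks', DWORD),
        ('nFileIndexHigh', DWORD),
        ('nFileIndexLow', DWORD),
    ]


def fill_file_info(buffer_ptr, attributes, size):
    """Write fields in place through the pointer the caller handed over."""
    info = buffer_ptr.contents  # a view onto the caller's memory, not a copy
    info.dwFileAttributes = attributes
    info.nFileSizeHigh = size >> 32
    info.nFileSizeLow = size & 0xFFFFFFFF


def copy_file_info(buffer_ptr, source):
    """Alternative: copy a locally built structure over the target."""
    ctypes.memmove(buffer_ptr, ctypes.byref(source), ctypes.sizeof(source))
```

Both approaches modify the memory the pointer refers to, so the native caller sees the values after the callback returns.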

Liryna commented 3 years ago

I will try not to miss any of your questions @kdschlosser

I do have one question. In your documentation (https://dokan-dev.github.io/dokany-doc/html/files.html) the list of files does not match what was installed with the latest version of dokany. The 3 files in the attached zip file are the only ones that were installed other than the FUSE API.

Yes, this is normal. Only the files needed to build a FS / wrapper are installed. dokanc.h, dokani.h and list.h are internal to the dokan build. They are just listed there automatically by doxygen.

There are also these functions that are exported in the DLL but are not in the API.

Those should not be used by a FS or wrapper. I added some doc here: https://github.com/dokan-dev/dokany/commit/1d3c263262c46bfc621fc4eaa33c05318aa9210a

what exactly are these fields in the DOKAN_CONTROL structure, what populates them as well.

This struct is used between the library and the driver to exchange the current mount information. The issue is that those fields mostly contain kernel values. So unless the dev knows exactly what they are doing with DokanGetMountPointList, there is no reason to be aware of this struct. The fields are set here https://github.com/dokan-dev/dokany/blob/dcd44377c1fe571c24dbbb82982414ca43564f6a/sys/event.c#L566 during the mount.

I am having some troubles grasping the purpose to there being these 2 functions.

Legacy. We keep them to not break existing FS. DokanUnmount should be removed in the next major release.

In your sample, there is a mix of ERROR* & STATUS*. The API only expects NTSTATUS to be returned. Is that intended? STATUS_OBJECT_NAME_COLLISION is the kernel-translated value of ERROR_FILE_EXISTS / ERROR_ALREADY_EXISTS. Be aware of DokanNtStatusFromWin32. https://github.com/dokan-dev/dokany/blob/dcd44377c1fe571c24dbbb82982414ca43564f6a/dokan/ntstatus.i#L17-L18
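To illustrate the kind of translation involved, here is a small hand-written mapping in Python. It is only a sketch: it is not the table used by DokanNtStatusFromWin32, the function names are hypothetical, and the fallback to STATUS_UNSUCCESSFUL for unknown codes is an arbitrary choice made for this example. The numeric values are the standard Win32 and NTSTATUS constants.

```python
import ctypes

# Win32 error codes
ERROR_SUCCESS = 0
ERROR_FILE_NOT_FOUND = 2
ERROR_PATH_NOT_FOUND = 3
ERROR_ACCESS_DENIED = 5
ERROR_FILE_EXISTS = 80
ERROR_ALREADY_EXISTS = 183

# NTSTATUS values
STATUS_SUCCESS = 0x00000000
STATUS_UNSUCCESSFUL = 0xC0000001
STATUS_ACCESS_DENIED = 0xC0000022
STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034
STATUS_OBJECT_NAME_COLLISION = 0xC0000035
STATUS_OBJECT_PATH_NOT_FOUND = 0xC000003A

_WIN32_TO_NTSTATUS = {
    ERROR_SUCCESS: STATUS_SUCCESS,
    ERROR_FILE_NOT_FOUND: STATUS_OBJECT_NAME_NOT_FOUND,
    ERROR_PATH_NOT_FOUND: STATUS_OBJECT_PATH_NOT_FOUND,
    ERROR_ACCESS_DENIED: STATUS_ACCESS_DENIED,
    ERROR_FILE_EXISTS: STATUS_OBJECT_NAME_COLLISION,
    ERROR_ALREADY_EXISTS: STATUS_OBJECT_NAME_COLLISION,
}


def ntstatus_from_win32(error_code):
    """Illustrative lookup; unknown codes fall back to STATUS_UNSUCCESSFUL."""
    return _WIN32_TO_NTSTATUS.get(error_code, STATUS_UNSUCCESSFUL)


def as_signed_ntstatus(status):
    """NTSTATUS is a signed 32-bit LONG, so wrap before returning it to C."""
    return ctypes.c_int32(status).value
```

The signed conversion matters when a callback's return type is declared as a signed 32-bit integer, since values such as 0xC0000035 do not fit as positive numbers.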

It looks like the dokan driver is written to allow multiple attachments or mounts. but it looks like this only on part of the code and not all of it. How do the DokanNotify* functions know what mount point to reference?

The DokanNotify* functions expect the full path for the notification. The full path contains the mount point, so the driver is able to know how to handle it.
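In a wrapper this could mean joining the mount point and the path inside the volume before calling a notify function. The helper below is a hypothetical sketch of that joining step only; the exact path format the DokanNotify* functions accept should be confirmed in the dokany documentation.

```python
def notify_path(mount_point, relative_path):
    """Join a mount point such as 'Z:' with a path inside the volume."""
    mount = mount_point.rstrip('\\')
    rel = relative_path.lstrip('\\')
    return mount + '\\' + rel
```

For example, `notify_path('Z:', '\\dir\\file.txt')` gives `Z:\dir\file.txt`, which carries the mount point along with the file path.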

I created a repository for the binding. https://github.com/kdschlosser/py_dokany

This is really looking great, and I love that the documentation is kept!

About https://github.com/dokan-dev/dokany/issues/964#issuecomment-770199161

I have it properly collecting the size of the "drive"

FYI, the size can be anything, so it is better to let the dev use the wrapper to set their own value.

The CreateFile of the dokan API is closer to ZwCreateFile than to CreateFile. Therefore it will also be called to open & create directories. CreateDirectory is simply a helper that ends up internally in ZwCreateFile. The calls that do not look logical to you, like opening "\" as a file, come from the system, and we do not filter them.
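Since the same callback receives both file and directory requests, a file system typically has to compare what exists on its side with the ZwCreateFile create options. The sketch below shows one way to do that check for an object that already exists. `check_open_kind` is a hypothetical helper; the constants are the standard values of the create option flags and NTSTATUS codes.

```python
# ZwCreateFile CreateOptions flags
FILE_DIRECTORY_FILE = 0x00000001
FILE_NON_DIRECTORY_FILE = 0x00000040

# NTSTATUS values
STATUS_SUCCESS = 0x00000000
STATUS_FILE_IS_A_DIRECTORY = 0xC00000BA
STATUS_NOT_A_DIRECTORY = 0xC0000103


def check_open_kind(exists_as_directory, create_options):
    """Reject requests whose options contradict what the object really is."""
    if exists_as_directory and create_options & FILE_NON_DIRECTORY_FILE:
        # Caller insists on a regular file, but the path is a directory.
        return STATUS_FILE_IS_A_DIRECTORY
    if not exists_as_directory and create_options & FILE_DIRECTORY_FILE:
        # Caller insists on a directory, but the path is a regular file.
        return STATUS_NOT_A_DIRECTORY
    return STATUS_SUCCESS
```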

There is also another issue and that is with the GetFileInformation callback. the fields in BY_HANDLE_FILE_INFORMATION are incorrect, as are the fields in FILETIME

We directly use the struct defined in the Windows SDK. You seem to have found the implementation in the C# wrapper https://github.com/dokan-dev/dokan-dotnet/blob/b51cf4741695242afb437f19fcfd856873de13f4/DokanNet/Native/WIN32_FIND_DATA.cs, which is not used by the native library.

kdschlosser commented 3 years ago

I just did a MASSIVE code update. I spent the last 2-3 days porting the Mirror example and testing it. It works!! It still needs to be fully exercised to check for bugs in the code. I am pretty good at this Python ctypes stuff and at porting C/C++ code to Python; there may be a few issues kicking about, but nothing that would be overly complex to fix.

I still have to key in some documentation and write a setup utility. In the mirror example I am not using my wrapper classes yet; I will change it so that it does. I wanted to test the raw functionality of the Dokany SDK port and driver binding to make sure that everything was good to go there. Now I can work on the "fluff" that will bring ease of use.

I also had to create bindings for all of the Windows API bits used in the mirror example. I placed them in their own module in the library so users do not need to create them if they need them. This includes all of the Windows functions, constants, structures, unions, enumerations and C library functions that are used.

I pushed the code changes so you can test it if you want.

To run the example:

python example.py -h 

I did change the command line switch names to something that is easier to know what the switch does.

Right now the library weighs in at a tad over 10,000 lines, with 1/4 of it being blank lines and another 1/4 being comments and documentation.

I also still have to key in the 1241 different status codes and error codes with documentation. I will write a program that scrapes Microsoft's knowledge base and builds the file for me.

Liryna commented 3 years ago

Very good progress! Sounds promising.

Would it not be lighter to have the Windows API bits used in the mirror example take advantage of an existing Python library?

kdschlosser commented 3 years ago

I am the only person I know of that has made a pure Python Windows API binding, and it is massive. It is not even something that is supposed to be loaded; it is more of a copy-and-paste reference. My goal is to wrap as many of the Windows API specific components as possible and to dumb them down a whole lot to make them easier to use. Having to deal with type conversions, casting, moving arrays into pointers and those kinds of things is not the easiest to do in Python. It is especially aggravating in Python because of how it internally handles some of the data types.

A great example is the GetDiskFreeSpace callback pointer. The arguments passed to the callback are supposed to be these data types: PULONGLONG, PULONGLONG, PULONGLONG, PDOKAN_FILE_INFO

and what the Python code turns them into is this: int, int, int, PDOKAN_FILE_INFO

It doesn't know what to convert the DOKAN_FILE_INFO structure to, so it leaves it alone. So when I want to access the structure I would do this:

PDOKAN_FILE_INFO.contents

and then I am accessing the structure. Any changes I make can be seen from the C side of things. When Python does what it thinks is a favor, removes that pointer wrapper, sees that what is inside is a ULONG, does the conversion internally and passes an int, I lose the ability to modify the contents in a way the C code can see. It does the same thing to WCHAR arrays, USHORT, SHORT, LONG, ULONG, LONGLONG, ULONGLONG, DOUBLE.... you get the point.

So now what I am battling is how to get the data back to dokany. I can't memcpy, because the C data type is not what is passed to the callback after Python gets done mucking it up. I wish they had just left it alone and not messed about with type conversions.

I would have to check the behavior of an older version of Python. I do not remember it doing this, and I believe it to be a more recent change.
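One thing worth checking here is how the callback prototype declares its arguments. In ctypes, callback arguments declared as simple types (`c_ulonglong`, `c_void_p`, and so on) arrive as plain Python values, while arguments declared with `ctypes.POINTER(...)` arrive as pointer objects that can be written through. The sketch below assumes a simplified three-argument prototype (the real dokany callback also takes a PDOKAN_FILE_INFO) and uses `CFUNCTYPE` so it runs on any platform; on Windows `WINFUNCTYPE` may be the appropriate choice instead. The byte counts are made-up example values.

```python
import ctypes

ULONGLONG = ctypes.c_uint64
PULONGLONG = ctypes.POINTER(ULONGLONG)

# Simplified, hypothetical prototype: NTSTATUS f(PULONGLONG, PULONGLONG, PULONGLONG)
GetDiskFreeSpaceProto = ctypes.CFUNCTYPE(
    ctypes.c_int32, PULONGLONG, PULONGLONG, PULONGLONG
)


def get_disk_free_space(free_available, total_bytes, total_free):
    # Each argument is a pointer object; index 0 writes to the caller's memory.
    free_available[0] = 512 * 1024
    total_bytes[0] = 1024 * 1024
    total_free[0] = 512 * 1024
    return 0  # STATUS_SUCCESS


# Keep a reference to the wrapper for as long as native code may call it.
callback = GetDiskFreeSpaceProto(get_disk_free_space)
```

Calling `callback(ctypes.byref(a), ctypes.byref(b), ctypes.byref(c))` with three `ULONGLONG` instances goes through the same native trampoline a C caller would use, and the values written inside the callback are visible in `a`, `b` and `c` afterwards.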

kdschlosser commented 3 years ago

OK so I am making progress only to hit yet another wall.

Dokany keeps on calling the ZwCreateFile callback over and over again, wanting to open the root folder. I have followed the Dokany API directions for the ZwCreateFile callback, which are extremely lacking, and what information is there is not really clear on what needs to be done.

Using the DokanNtStatusFromWin32 function to turn Windows HRESULT codes into NTSTATUS codes is all fine and dandy if what is being written uses Windows functions to get the information needed to populate the drive. But if I am getting the information from, say, a cloud file server, that function is all but useless without documentation on which HRESULT codes can be passed into it and how Dokany is going to behave once the resulting NTSTATUS code is returned from the callback. I am also guessing that there is something special about the error codes as well, changing the behavior of Windows and/or Dokany. There are a whole lot of SetLastError calls in the ZwCreateFile callback in the mirror example. There is also documentation about setting the error codes in the Windows API documentation for CreateFile. The setting of the error code looks like it always comes after the RevertToSelf Windows function is called, so I am not sure what the deal is with that.

I am using the mirror example that has been ported to Python, and again I can only get the root drive to populate. If I open a sub folder I get a CreateFile callback trying to open desktop.ini in the folder I want to open, but the file attributes have the FILE_ATTRIBUTE_DIRECTORY flag set and the create options have the FILE_NON_DIRECTORY_FILE flag set. It keeps on trying to open this file with the exact same flags set; I think for some reason it's expecting the result to be different?? I am not sure why the program would be written that way, as it creates a massive amount of overhead that would cause a large problem if using something like a serial connection to a microcontroller (which is my use case and the reason for me making this library). It seems to be doing things that really make no sense, like trying to open the root folder when what I want to do is open a sub directory of the root.

I have attached some log files to show what is going on, maybe you can shed some light on it. A flow chart of how the process works would also be a really handy thing.

logs.zip

Liryna commented 3 years ago

@kdschlosser If you see a difference in behavior (a flood of CreateFile calls) compared with the C mirror, it must be because the return code / information is not what the system expects. You should know that Dokan is only a proxy for what the system is doing. It does not filter or try to be smart; that is the job of the FS.

By HRESULT you mean Win32 error code, right? I believe specific points are explained in the Dokan docs, and otherwise it is the Microsoft documentation that needs to be used. The SetLastError calls are just there to not corrupt the error code when impersonation fails. The SetLastError from CreateFile is replaced by https://github.com/dokan-dev/dokany/blob/master/dokan/dokan.h#L237-L239

It might be better for you to start with https://github.com/Liryna/fstools on the mirror while not using Explorer, enabling each test one by one to keep the logs minimal and see what is happening. I am really not able to read all those logs like that. I also suggest using procmon, which gives a more readable view of the activity.

kdschlosser commented 3 years ago

I am using the mirror example. The code is identical to what is seen in there; only the syntax has changed.

I will have to compile the mirror example from C code and see if the behavior is the same.

kdschlosser commented 3 years ago

OK, I compiled the mirror example and ran it. It does work slightly better, though it calls the CreateFile callback for the root directory some 58 times when opening the drive. It does all of them right in a row before it does anything else.

The free space, used space and total space available on the drive don't work, and when I did a select all it crashed Windows Explorer. There are insufficient buffer errors, it reports the root directory size as 8k when it is 23mb, and all of the icons have locks on them.

I am running Windows 7 x64 SP1. Nothing special about it.

It makes a total of 254 callbacks to list 25 files in the root directory and 146 callbacks to list 5 files in a sub directory.

kdschlosser commented 3 years ago

Well, I have good news. I managed to get the mirror 100% working. Friggin Python and its manipulating of the data structures. It turns out that any function that returned a handle would return an int instead, and when passing a Python int to a Windows function it was apparently picking the wrong data type, but instead of an error occurring I ended up with all kinds of goofy behavior.

I have the disk size and remaining space also working properly. The filesystem name is working but the volume name is not. I am not sure what is up with it; I will have to mess around with it and see if I can get it to work.
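A likely explanation for this class of problem, offered as an assumption rather than a diagnosis of the code above: ctypes treats a foreign function's return value as a C `int` unless `restype` is set, and a Python int passed without `argtypes` is also converted as a C `int`. On 64-bit Windows a handle is pointer-sized, so the upper bits can be lost silently. The sketch below only demonstrates the truncation itself; the helper names are hypothetical.

```python
import ctypes


def as_default_restype(raw_handle):
    """What a pointer-sized value becomes once squeezed into a C int."""
    return ctypes.c_int(raw_handle).value  # ctypes does no overflow check


def as_handle(raw_handle):
    """The same value kept in a pointer-sized type."""
    return ctypes.c_void_p(raw_handle).value


# On Windows the usual remedy is to declare the types before the first call:
#   from ctypes import wintypes
#   kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
#   kernel32.CreateFileW.restype = wintypes.HANDLE
#   kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
```

Small handle values survive the conversion unchanged, which is why the issue can look intermittent rather than failing outright.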

Liryna commented 3 years ago

Hi @kdschlosser ,

Any news on the current state of the binding? Let me know when you feel ready for me to create an official repo for you!

Liryna commented 2 years ago

Closing this for inactivity