Closed Grille closed 2 years ago
Hey
The file provider never copies anything into memory, whether the file is big or small. It simply uses MapViewOfFile on Windows and mmap on Unix to map the file into memory.
I made sure to close the file handle now after mapping it but I'm honestly not sure if it did much. I still can't modify the file as long as it's open in ImHex. I can read it though with other applications. In any case, I also added a "Reload File" option to the File menu (CTRL + R) to reload the file if needed. I honestly don't know if I can do any more than this. Please feel free to make a PR if you know more about it: https://github.com/WerWolv/ImHex/blob/master/plugins/builtin/source/content/providers/file_provider.cpp#L183
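For readers unfamiliar with the pattern, here is a minimal POSIX sketch of mapping a file as a shared view and closing the descriptor afterwards. It is not ImHex's actual provider code: `MappedFile`, `mapShared`, `writeByte` and `unmapFile` are hypothetical names made up for illustration. It shows that the view stays usable after the descriptor is closed and that a store through the view changes the file itself. POSIX mappings do not by themselves stop other processes from writing to the file, so this sketch does not reproduce the locking seen on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical type for this sketch; ImHex's real provider differs.
struct MappedFile {
    unsigned char* data = nullptr;
    std::size_t size = 0;
};

// Maps the whole file read/write with MAP_SHARED, then closes the descriptor.
// Returns an empty MappedFile on failure or for an empty file.
MappedFile mapShared(const std::string& path) {
    const int fd = ::open(path.c_str(), O_RDWR);
    if (fd < 0) return {};

    struct stat st {};
    if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
        ::close(fd);
        return {};
    }

    const auto size = static_cast<std::size_t>(st.st_size);
    void* view = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (view == MAP_FAILED) return {};
    return {static_cast<unsigned char*>(view), size};
}

// Stores one byte through the view; with MAP_SHARED this modifies the file.
void writeByte(MappedFile& file, std::size_t offset, unsigned char value) {
    assert(offset < file.size);
    file.data[offset] = value;
}

// Releases the view; only after this call is the file no longer mapped.
void unmapFile(MappedFile& file) {
    if (file.data != nullptr) ::munmap(file.data, file.size);
    file = {};
}
```

A caller would use `mapShared(path)`, edit bytes with `writeByte`, and call `unmapFile` when the file is closed in the editor. There is no separate save step, because the edits already went to the file.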
I feel like there is a fundamental disconnect between how the user perceives the application to operate and how it actually does. Based on how many other editing applications work, the user expects the application to open the file, read the contents into memory, then close the file, and to write back to the file only on a save operation, with all other operations occurring on the copy in memory. However, this application simply exposes the bytes on disk via the memory mapping, so all changes are immediately propagated to the file. If you edit a byte, it changes the byte on disk immediately. Because of this, the file remains locked for as long as ImHex has it open, even if it closes the file handle after mapping the file: the close operation does not unmap the file from memory, it only releases the handle used to map it. Because the MapViewOfFile function guarantees coherency, having the file mapped means that no other process can write to the file until it is unmapped; other processes will be granted only a read handle, not a write handle.
I suspect that the way around this is to instead map the file using FILE_MAP_COPY, which creates a copy-on-write view. When a change is made, the system automatically copies the affected page into private memory, and the change is applied to that in-memory copy rather than to the file. The modified data can then be written back on a save operation.
These references are to the Windows functions, since that is the platform I am using. I am also seeing discussion suggesting that the Microsoft documentation on the FILE_MAP_COPY mode is incorrect, and that it is a standalone mode that cannot be OR'd with the other base modes described. This makes some sense to me: the other modes map the file data on disk, while FILE_MAP_COPY would need to map either the file data or memory, depending on whether the copy-on-write has already happened. See https://stackoverflow.com/questions/55018806/copy-on-write-file-mapping-on-windows for this.
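The copy-on-write idea can be sketched with the POSIX analogue of FILE_MAP_COPY, which is mmap with MAP_PRIVATE. This is a swapped-in technique for illustration, not the Windows API discussed above, and `PrivateView`, `mapCopyOnWrite`, `saveView` and `unmapView` are hypothetical names. Edits to the view stay in private pages and never reach the file on their own, so the sketch adds an explicit save step that naively writes the whole view back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical type for this sketch.
struct PrivateView {
    char* data = nullptr;
    std::size_t size = 0;
};

// Maps the file copy-on-write: pages are shared with the file until written,
// then the written page becomes a private copy that the file never sees.
PrivateView mapCopyOnWrite(const std::string& path) {
    const int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return {};

    struct stat st {};
    if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
        ::close(fd);
        return {};
    }

    const auto size = static_cast<std::size_t>(st.st_size);
    void* view = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    ::close(fd);
    if (view == MAP_FAILED) return {};
    return {static_cast<char*>(view), size};
}

// Naive save: writes the entire view back to the file, not just changed pages.
bool saveView(const PrivateView& view, const std::string& path) {
    assert(view.data != nullptr);
    const int fd = ::open(path.c_str(), O_WRONLY);
    if (fd < 0) return false;

    std::size_t done = 0;
    while (done < view.size) {
        const ssize_t n = ::pwrite(fd, view.data + done, view.size - done,
                                   static_cast<off_t>(done));
        if (n <= 0) {
            ::close(fd);
            return false;
        }
        done += static_cast<std::size_t>(n);
    }
    return ::close(fd) == 0;
}

void unmapView(PrivateView& view) {
    if (view.data != nullptr) ::munmap(view.data, view.size);
    view = {};
}
```

Writing the whole view back is the simplest possible save and scales with file size, not with the number of edits. A real editor would have to track which ranges were modified to avoid that cost.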
From the documentation of MapViewOfFile:
When copy-on-write access is specified, the system and process commit charge taken is for the entire view because the calling process can potentially write to every page in the view, making all pages private.
So I don't think FILE_MAP_COPY is really usable in our case. The user can load arbitrarily large files and modify arbitrary bytes in them, potentially loading huge amounts of data into RAM.
The contents of the new page are never written back to the original file and are lost when the view is unmapped.
This makes writing back to the file either impossible or really slow and inefficient.
ImHex should take the same approach radare2 does and open the file in read-only mode by default. This makes it a hex viewer rather than a hex editor by default, and allows you to release the file, since you're no longer waiting for the user to edit it in ImHex. The file can then be reloaded in write mode if the user decides to edit it.
What feature would you like to see?
Freeing the file after it has been loaded, so it can be used by other programs again. Smaller files in particular seem to be loaded fully into memory, so I see no reason to keep them locked.
In addition, it would then probably be useful to add a refresh-file function to pick up changes, and maybe also a tracker that tells you when the data you are viewing and the file content no longer match (like Notepad++, for example).
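One way such a tracker could work is to remember the file's size and modification time at load and compare them later. The sketch below assumes C++17 `std::filesystem`. `FileChangeTracker` and `demoDetectsExternalEdit` are hypothetical names, not ImHex code, and a metadata check like this can miss edits that keep both the size and the timestamp unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: snapshots a file's size and modification time and
// reports when either differs from the snapshot.
class FileChangeTracker {
public:
    explicit FileChangeTracker(fs::path path) : m_path(std::move(path)) { snapshot(); }

    // Re-reads the metadata; call this after loading or reloading the file.
    void snapshot() {
        std::error_code ec;
        m_size = fs::file_size(m_path, ec);
        if (ec) m_size = 0;
        m_time = fs::last_write_time(m_path, ec);
        if (ec) m_time = fs::file_time_type::min();
    }

    // True if the file on disk no longer matches the last snapshot.
    bool hasChanged() const {
        std::error_code ec;
        const auto size = fs::file_size(m_path, ec);
        if (ec) return true;  // a missing or unreadable file counts as changed
        const auto time = fs::last_write_time(m_path, ec);
        if (ec) return true;
        return size != m_size || time != m_time;
    }

private:
    fs::path m_path;
    std::uintmax_t m_size = 0;
    fs::file_time_type m_time{};
};

// Small demonstration: simulates another program appending to the file.
bool demoDetectsExternalEdit() {
    const fs::path path = fs::temp_directory_path() / "imhex_tracker_demo.bin";
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out << "abcd";
    }
    FileChangeTracker tracker(path);
    assert(!tracker.hasChanged());
    {
        std::ofstream out(path, std::ios::binary | std::ios::app);
        out << "ef";  // the simulated external edit
    }
    const bool changed = tracker.hasChanged();
    fs::remove(path);
    return changed;
}
```

An editor could poll `hasChanged()` when its window regains focus and offer a reload, calling `snapshot()` once the reload is done.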
How will this feature be useful to you and others?
Often when working on a file, I want to open it in other programs to test things, which ends in an error. A workaround is copying the file, but that can also get annoying over time.
Or I forget that I have the file open in ImHex, and I wonder for 10 minutes why my program no longer works…
Request Type