ProteoWizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.
http://proteowizard.sourceforge.net/
Apache License 2.0
215 stars 97 forks source link

O_RDWR vs. O_WRONLY #2861

Closed christophgil closed 6 months ago

christophgil commented 6 months ago

I am using msconvert through docker. When the outfile is opened for writing the system function open() is invoked with flag O_RDWR.

Why not O_WRONLY? The file is to be written right?

I am developing a fuse file system. Interestingly, I have seen the flag O_RDWR in other MS-software where a file is read and not written and therefore I used to treat O_RDWR as O_RDONLY. I think this software was also running in Wine.

Since O_RDWR is used with the intention to write here , my simple rule does not work any more.

Is this flag comming from Wine or from msconvert.ext? How do you create the file pointer / file descriptor of the outfile?

Best Christoph

chambm commented 6 months ago

If you're talking about a text format like mzML or mzXML, then it's opened with a boost::nowide::ofstream call in MSDataFile.cpp:openFile(). This boils down to: _wfopen(filename, mode) where mode is wb.

According to https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170 the w mode should use _O_WRONLY. Checking the system call in Process Monitor, I see:

Operation:  CreateFile
Result: SUCCESS
Path:   C:\dev\pwiz\xxxx.mzML
Desired Access: Generic Write, Read Attributes
Disposition:    OverwriteIf
Options:    Synchronous IO Non-Alert, Non-Directory File
Attributes: N
ShareMode:  Read, Write
AllocationSize: 0
OpenResult: Overwritten

I'm not sure whether ShareMode or "Desired Access" having "Read" in it means it actually opened with _O_RDWR though. Stepping in deeper with the debug CRT doesn't make anything stick out at me about it being opened with read access.

So I'm guessing the mismatch is coming from Wine. But msvcrt_get_flags in https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/msvcrt/file.c looks like it should also be using _O_WRONLY.

It strikes me as much stranger to see a file opened as O_RDWR when it's never written to. That would imply people opening read-only files with mode 'w+', 'a+', or 'r+'.

christophgil commented 6 months ago

Hi Matt,

thanks for the quick answer. Indeed, open is called twice in the fuse FS:

1) O_RDWR:No O_RDONLY:No O_WRONLY:No O_NONBLOCK:Yes O_NDELAY: Yes 2) O_RDWR:Yes O_RDONLY:No O_WRONLY:No O_NONBLOCK:Yes O_NDELAY: Yes No idea what the use first invokation is for.

Regarding O_RDWR, I have not yet checked with strace yet.

I also noted that atomic file creation is broken ... --outfile my_file.mzML.$$.tmp && mv my_file.mzML.$$.tmp my_file.mzML

because msconvert appends ".mzML" to the outfile if it ends with ".tmp"

Also --outfile /dev/stdout or --outfile CON: or --outfile '-' does not seem to to work.

Anyway, I will find workarounds.

Thanks for this nice tool and the convenient docker interface. Best Christoph

On Wed, Feb 7, 2024 at 4:51 PM Matt Chambers @.***> wrote:

If you're talking about a text format like mzML or mzXML, then it's opened with a boost::nowide::ofstream call in MSDataFile.cpp:openFile(). This boils down to: _wfopen(filename, mode) where mode is wb.

According to https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170 the w mode should use _O_WRONLY. Checking the system call in Process Monitor, I see:

Operation: CreateFile Result: SUCCESS Path: C:\dev\pwiz\xxxx.mzML Desired Access: Generic Write, Read Attributes Disposition: OverwriteIf Options: Synchronous IO Non-Alert, Non-Directory File Attributes: N ShareMode: Read, Write AllocationSize: 0 OpenResult: Overwritten

I'm not sure whether ShareMode or "Desired Access" having "Read" in it means it actually opened with _O_RDWR though. Stepping in deeper with the debug CRT doesn't make anything stick out at me about it being opened with read access.

So I'm guessing the mismatch is coming from Wine. But msvcrt_get_flags in https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/msvcrt/file.c looks like it should also be using _O_WRONLY.

It strikes me as much stranger to see a file opened as O_RDWR when it's never written to. That would imply people opening read-only files with mode 'w+', 'a+', or 'r+'.

— Reply to this email directly, view it on GitHub https://github.com/ProteoWizard/pwiz/issues/2861#issuecomment-1932469624, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASRZU6AUWIPXYUO5H77GVMTYSOWHJAVCNFSM6AAAAABC55GEF2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZSGQ3DSNRSGQ . You are receiving this because you authored the thread.Message ID: @.***>

chambm commented 6 months ago

By default the file extension gets overridden if it isn't one of the supported output formats' extensions (e.g. mzML, mzXML, mgf, mz5, etc.). You can override that override with --ext .tmp.

I am not at all surprised that the output to stdout does not work on Wine/Docker. The whole thing is basically indistinguishable from magic. ;)

christophgil commented 6 months ago

Cool, the --ext option is exactly what I need.

Maybe I have an idea of how to cope with O_RDRW. Perhaps to treat O_RDRW like read only and to fix the file descriptor in case write()/ truncate() is invoked. I am confident that this might work.

The file system will present the mzML and mgf files along with the sciex and Thermo raw files. The mzML and mgf files may not need to exist. They are displayed in the file listing and come into life when they are used. They are deleted after a certain time unless last-access time was updated. The Thermo and Sciex files are archived as ZIP in our intermediate or permanent store, but are shown as single plain files.

On Wed, Feb 7, 2024 at 5:52 PM Matt Chambers @.***> wrote:

By default the file extension gets overridden if it isn't one of the supported output formats' extensions (e.g. mzML, mzXML, mgf, mz5, etc.). You can override that override with --ext .tmp.

I am not at all surprised that the output to stdout does not work on Wine/Docker. The whole thing is basically indistinguishable from magic. ;)

— Reply to this email directly, view it on GitHub https://github.com/ProteoWizard/pwiz/issues/2861#issuecomment-1932577353, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASRZU6BTG3Q7A5XD7BFKYE3YSO5OLAVCNFSM6AAAAABC55GEF2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZSGU3TOMZVGM . You are receiving this because you authored the thread.Message ID: @.***>

christophgil commented 6 months ago

Hi Matt,

Its working. I overlooked that after the FUSE call back function open(), also create() is called. The later is important. Here my software creates the real file descriptor with write permission. And then I see invocations of FUSE write() and the output file is written nicely. All fine.

The docker wrapper is cool because it makes things portable without the need to install anything.

Best Christoph

On Wed, Feb 7, 2024 at 4:51 PM Matt Chambers @.***> wrote:

If you're talking about a text format like mzML or mzXML, then it's opened with a boost::nowide::ofstream call in MSDataFile.cpp:openFile(). This boils down to: _wfopen(filename, mode) where mode is wb.

According to https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170 the w mode should use _O_WRONLY. Checking the system call in Process Monitor, I see:

Operation: CreateFile Result: SUCCESS Path: C:\dev\pwiz\xxxx.mzML Desired Access: Generic Write, Read Attributes Disposition: OverwriteIf Options: Synchronous IO Non-Alert, Non-Directory File Attributes: N ShareMode: Read, Write AllocationSize: 0 OpenResult: Overwritten

I'm not sure whether ShareMode or "Desired Access" having "Read" in it means it actually opened with _O_RDWR though. Stepping in deeper with the debug CRT doesn't make anything stick out at me about it being opened with read access.

So I'm guessing the mismatch is coming from Wine. But msvcrt_get_flags in https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/msvcrt/file.c looks like it should also be using _O_WRONLY.

It strikes me as much stranger to see a file opened as O_RDWR when it's never written to. That would imply people opening read-only files with mode 'w+', 'a+', or 'r+'.

— Reply to this email directly, view it on GitHub https://github.com/ProteoWizard/pwiz/issues/2861#issuecomment-1932469624, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASRZU6AUWIPXYUO5H77GVMTYSOWHJAVCNFSM6AAAAABC55GEF2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZSGQ3DSNRSGQ . You are receiving this because you authored the thread.Message ID: @.***>

christophgil commented 4 months ago

Hi Matt,

I tried to create the outfile in a different folder. This does not work - does it?

Or is my cmd line wrong?

docker run --mount "type=bind,source=$src,target=/src" --mount "type=bind,source=$dst,target=/dst" -it --rm chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert /src/file.wiff --outfile /dst/file.wiff.mzML --mzML

or docker run -v $src:/src -v $dst:/dst:rw -it --rm chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert /src/file.wiff --ext .tmp --outfile /dst/file.wiff.mzML --mzML

Do I understand it correctly, the working directory ./ is set to the last mount?

Anyway, I can work around it. Its not a big problem.

Best Christoph

Message ID: @.***>

christophgil commented 4 months ago

Hi Matt

Please forget my last request.

Just found out myself. I overlooked the option --outdir.

This works: docker run -v $src:/data -v $dst:/dst -it --rm chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert file.wiff --ext .tmp --outdir /dst --mzML

Best Christoph

Message ID: @.***>

chambm commented 4 months ago

IIRC --outfile only changes the filename (useful for WIFF if you know it only has one sample because normally the sample name is automatically appended to the filename). To change the output path, use -o.

chambm commented 4 months ago

Oh maybe I should read all the posts before replying. :P