ProteoWizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.
http://proteowizard.sourceforge.net/
Apache License 2.0

Python wrapper lib? #1275

Open fkromer opened 3 years ago

fkromer commented 3 years ago

I wondered if you guys would be interested in wrapping the pwiz functionality in a Python wrapper. Another chem lib, OpenMS, uses a utility lib called autowrap to make this process as painless as possible, to provide pyopenms, and to keep it in sync with OpenMS. However, I'm not sure whether the pwiz implementation satisfies the requirements for autowrap to work properly. In any case, it would be very valuable to be able to use pwiz functionality in the major data science language, Python.

chambm commented 3 years ago

That autowrap library looks quite nice. We don't really have an internal need for this but we're certainly open to a PR that introduces the capability. And if someone adds it I will maintain it as the pwiz API changes. I would help with the Boost.Build aspect and integrate it into our CI testing.

I think there's already some existing SWIG wrapper for one of the simplest, C-style interfaces to pwiz (pwizRampAdapter) but it doesn't get maintained and I'm not sure anybody's using it. I don't maintain it because the pwizRampAdapter is too simple and doesn't model mzML properly.

fkromer commented 3 years ago

That autowrap library looks quite nice. We don't really have an internal need for this but we're certainly open to a PR that introduces the capability. And if someone adds it I will maintain it as the pwiz API changes. I would help with the Boost.Build aspect and integrate it into our CI testing.

Thanks a lot for your support offer. I'll clarify task priorities internally and let you know if and how much effort I can put into this.

I think there's already some existing SWIG wrapper for one of the simplest, C-style interfaces to pwiz (pwizRampAdapter) but it doesn't get maintained and I'm not sure anybody's using it. I don't maintain it because the pwizRampAdapter is too simple and doesn't model mzML properly.

It would be great if you could give me a concrete reference so that I can have a look at it.

chambm commented 3 years ago

By concrete reference do you mean the SWIG bindings I referred you to, or the C++ classes I'd like bindings for?

hroest commented 3 years ago

I suggest you have a look at autowrap, and you can talk to me or @uweschmitt about what it actually does. It is basically a way to auto-generate Cython code. Since we developed autowrap there have also been some developments like pybind11 that look pretty interesting.

chambm commented 3 years ago

There's also Boost.Python, but it's been a long time since I looked at python bindings. My main concern with bindings is low maintenance and a preference that the OOP style of the pwiz MSData library not be reduced to a lowest-common-denominator C-style structs and procedures. My brief glance at autowrap seemed like it would handle that (including boost::shared_ptr which we use very often in MSData). Do you want to glance at our MSData.hpp header and tell me whether autowrap would be a good fit? It's basically just a C++ representation of the mzML data model.

Of course, it would also be nice to have vendor reader support. I'm not sure how that would work with Cython.

hroest commented 3 years ago

I think there would be strong interest in this; the R bindings enjoy quite a bit of popularity. Boost.Python is doable but it's a lot of manual work. I have done that once and I cannot recommend it. Basically you have to translate every single data structure manually in C++ code, which is not fun. The way autowrap works is that it basically generates a Python object with a single member, a Boost shared_ptr which points to the C++ object. This solves problems with memory management, since that is done in C++, and once no more Python objects point to the C++ object it can safely be deleted.
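To make the ownership idea above concrete, here is a pure-Python analogy. It is not autowrap's generated code (which is Cython wrapping a real shared_ptr); the class names `NativeSpectrum`, `SharedPtr` and `PySpectrum` are hypothetical and exist only for illustration. The sketch shows a wrapper whose only member is a shared handle, and a target that is destroyed only once the last wrapper lets go.

```python
class NativeSpectrum:
    """Stand-in for a C++ object; only records whether it was destroyed."""

    def __init__(self, spectrum_id):
        self.spectrum_id = spectrum_id
        self.destroyed = False


class SharedPtr:
    """Toy model of a shared pointer: counts owners, destroys the target at zero."""

    def __init__(self, target):
        self.target = target
        self.use_count = 0

    def acquire(self):
        self.use_count += 1

    def release(self):
        self.use_count -= 1
        if self.use_count == 0:
            # This is where C++ would run the destructor.
            self.target.destroyed = True


class PySpectrum:
    """Python-facing wrapper whose single member is the shared pointer."""

    def __init__(self, ptr):
        self._ptr = ptr
        ptr.acquire()

    @property
    def spectrum_id(self):
        return self._ptr.target.spectrum_id

    def close(self):
        # Idempotent: releasing twice must not decrement the count twice.
        if self._ptr is not None:
            self._ptr.release()
            self._ptr = None

    def __del__(self):
        self.close()


native = NativeSpectrum("scan=1")
ptr = SharedPtr(native)
first = PySpectrum(ptr)
second = PySpectrum(ptr)  # two wrappers share one native object
first_id = first.spectrum_id
first.close()
still_alive = not native.destroyed  # one owner remains
second.close()  # last owner gone, native object is destroyed
```

In a real binding the count lives inside the C++ shared_ptr and the release happens when the Python wrapper is garbage-collected, so no explicit `close()` is needed; it is spelled out here only to make the order of events visible.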

Of course, it would also be nice to have vendor reader support. I'm not sure how that would work with Cython.

I think as long as the API calls in pwiz are there and the dlls are distributed it should actually work.

fkromer commented 3 years ago

@chambm

By concrete reference do you mean the SWIG bindings I referred you to, or the C++ classes I'd like bindings for?

Both would be great.

My main concern with bindings is low maintenance and a preference that the OOP style of the pwiz MSData library not be reduced to a lowest-common-denominator C-style structs and procedures.

I fully agree.

@hroest

Of course, it would also be nice to have vendor reader support. I'm not sure how that would work with Cython.

I think as long as the API calls in pwiz are there and the dlls are distributed it should actually work.

Definitely. I'm used to working with Linux runtime environments, not Windows ones. However, I could imagine that it's not straightforward to support vendor software components in a Python wrapper.

As far as I know it's impossible to run Windows DLLs in Linux environments without some kind of abstraction layer. That holds regardless of whether the DLLs depend on runtime components installed on the OS, such as C# or C++ runtime frameworks. The Python wrapper would of course not work in Linux runtime environments. Do different C#/C++ runtime framework versions installed on the same Windows system potentially conflict? In this case one would have to state the runtime framework dependencies explicitly and carefully. Otherwise you can run into major issues if you want to run some other software on your Windows runtime environment which requires different C#/C++ framework versions.

If vendor components were available as plain C++ or plain C#, one could probably add support for Linux environments (using the Mono project in the case of C#).

chambm commented 3 years ago

For the C++ class, see the MSData.hpp file I linked to. Here's the SWIG bindings: https://github.com/ProteoWizard/pwiz/tree/master/pwiz/utility/bindings/SWIG

chambm commented 3 years ago

As long as the vendor support works with Python on Windows, it should probably work in the pwiz docker container via wine.

fkromer commented 3 years ago

For the C++ class, see the MSData.hpp file I linked to. Here's the SWIG bindings: https://github.com/ProteoWizard/pwiz/tree/master/pwiz/utility/bindings/SWIG

Thanks for the hints.

As long as the vendor support works with Python on Windows, it should probably work in the pwiz docker container via wine.

Yeah, right.

mobiusklein commented 3 years ago

Do different C#/C++ runtime framework versions installed on the same Windows system potentially conflict? In this case one would have to state the runtime framework dependencies explicitly and carefully. Otherwise you can run into major issues if you want to run some other software on your Windows runtime environment which requires different C#/C++ framework versions.

On Windows, the runtime version (ergo the version of the platform libraries you get) is usually tied to the MSVC compiler version you use. This is why Windows users had to install the appropriate MSVC redistributable C++ runtimes (the "CRT") in order to get C/C++ binaries to work. Around Windows 8 or 10, they transitioned to what they called "universal CRT" or UCRT which is an operating system feature, installed with the standard OS update mechanism. The .NET runtime is also selected by the compiler and on Windows installed as an OS feature or update. So for most new machines, this isn't an issue anymore unless you're using a really old program with extra features (like Py2.7+OpenMP). Naturally, running something built with the UCRT on a machine predating it means you need to ship the UCRT libraries with your executable.

If you're going to have Python link with a Windows executable running under Wine on a *nix, you'll need to be using a Windows-compiled Python too, correct?
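As a small aid for the compiler/runtime question above, the following sketch reports which compiler built the running Python interpreter, using the standard library's `platform.python_compiler()`. The helper names `msvc_version` and `describe_runtime` are made up for this example, and the parsing assumes the `MSC v.NNNN` format that CPython reports on Windows builds.

```python
import platform
import re


def msvc_version(compiler):
    """Return the MSC version number from a compiler string, or None if absent."""
    match = re.search(r"MSC v\.(\d+)", compiler)
    return int(match.group(1)) if match else None


def describe_runtime():
    """Collect the OS name and the compiler that built this interpreter."""
    compiler = platform.python_compiler()
    return {
        "os": platform.system(),
        "compiler": compiler,
        "msc_version": msvc_version(compiler),  # None on GCC/Clang builds
    }


if __name__ == "__main__":
    print(describe_runtime())
```

A native extension module generally has to be built against a runtime compatible with the interpreter that loads it, so a check like this can help tell whether a given Python is a Windows (MSVC) build, including one running under Wine, or a native Linux build.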

fkromer commented 3 years ago

On Windows, the runtime version (ergo the version of the platform libraries you get) is usually tied to the MSVC compiler version you use. This is why Windows users had to install the appropriate MSVC redistributable C++ runtimes (the "CRT") in order to get C/C++ binaries to work. Around Windows 8 or 10, they transitioned to what they called "universal CRT" or UCRT which is an operating system feature, installed with the standard OS update mechanism. The .NET runtime is also selected by the compiler and on Windows installed as an OS feature or update. So for most new machines, this isn't an issue anymore unless you're using a really old program with extra features (like Py2.7+OpenMP). Naturally, running something built with the UCRT on a machine predating it means you need to ship the UCRT libraries with your executable.

Interesting.

If you're going to have Python link with a Windows executable running under Wine on a *nix, you'll need to be using a Windows-compiled Python too, correct?

Reading about Wine has not brought me any further so far.

chambm commented 3 years ago

You mostly don't need to read or worry about Wine. Just make it work on Windows, and Wine will allow it to work on Linux by magic.

salsa-dev commented 2 years ago

Looking for such a wrapper lib as well.

PierreSnell commented 11 months ago

A hack I've been using (a bit ugly but working fine) is to spawn the pwiz/msconvert docker image from Python (requires docker though).

import logging

import docker

logger = logging.getLogger(__name__)

# LOCAL_FOLDER and LOCAL_OUTPUT are host paths you define yourself.
running_container = docker.DockerClient().containers.run(
    "chambm/pwiz-skyline-i-agree-to-the-vendor-licenses",
    "wine msconvert YOUR_FILE_OR_FOLDER_FROM_DOCKER_MOUNT --filter --arguments --things",
    volumes={
        f"{LOCAL_FOLDER}": {"bind": "/data", "mode": "ro"},
        f"{LOCAL_OUTPUT}": {"bind": "/out_data", "mode": "rw"},
    },
    stdout=True, stderr=True, stream=True, detach=True, auto_remove=True,
    remove=True,  # To unmount the volume for the next run.
)

# Stream the container output line by line until msconvert exits.
for log in running_container.logs(stream=True, stdout=True, stderr=True, timestamps=True):
    logger.debug(log)  # or print

Hope it helps someone.
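For anyone adapting the snippet above, the placeholder command string can be assembled from its parts. This is only a sketch: `build_msconvert_command` is a hypothetical helper, the file name `run01.raw` is invented, and the example filter is an assumption to check against the msconvert help output for your version. The `/data` and `/out_data` paths match the bind mounts used above.

```python
import shlex


def build_msconvert_command(input_path, output_dir, filters=(), output_format="mzML"):
    """Build the argument list for running msconvert under wine in the container."""
    args = ["wine", "msconvert", input_path, f"--{output_format}", "-o", output_dir]
    for spec in filters:
        # Each filter is passed as its own --filter argument.
        args += ["--filter", spec]
    return args


command = build_msconvert_command(
    "/data/run01.raw",  # hypothetical file inside the read-only mount
    "/out_data",
    filters=["peakPicking vendor msLevel=1-"],  # assumed example filter
)

# shlex.join (Python 3.8+) quotes arguments containing spaces for display or shell use.
command_string = shlex.join(command)
```

Either `command` (a list) or `command_string` can be passed as the second argument to `containers.run`; the list form avoids quoting problems when a filter specification contains spaces.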