MannLabs / alphapept

A modular, python-based framework for mass spectrometry. Powered by nbdev.
https://mannlabs.github.io/alphapept/
Apache License 2.0

error running with mzML file #316

Closed JianAtSeer closed 2 years ago

JianAtSeer commented 3 years ago

Describe the bug

I tried to run the pipeline on two mzML files, but I got the error: An exception occured running AlphaPept version 0.3.28: File extension .mzML not understood.

I read in the documentation that the pipeline relies on pyteomics to read and parse mzML files. So I tried loading the mzML file with pyteomics separately, and that seems to work fine.
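A standalone check of this kind could look like the sketch below. It is not the code that was actually run for this report: the helper names `summarize_spectra` and `check_mzml` are made up for illustration, and it assumes the spectrum dictionaries carry the usual pyteomics keys `ms level` and `m/z array`.

```python
from collections import Counter


def summarize_spectra(spectra):
    """Count spectra per MS level and count spectra that contain no peaks."""
    per_level = Counter()
    empty = 0
    for spectrum in spectra:
        # 'ms level' and 'm/z array' are the keys pyteomics uses for mzML spectra.
        per_level[spectrum.get("ms level")] += 1
        if len(spectrum.get("m/z array", ())) == 0:
            empty += 1
    return {"per_level": dict(per_level), "empty": empty}


def check_mzml(path):
    """Parse an mzML file with pyteomics and summarize its spectra."""
    # Imported lazily so the summary helper works without pyteomics installed.
    from pyteomics import mzml

    with mzml.read(path) as reader:
        return summarize_spectra(reader)
```

If `check_mzml("some_file.mzML")` returns without raising and reports a plausible number of MS1 and MS2 spectra, the file itself is probably readable and the problem lies elsewhere in the pipeline.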

    File extension .mzML not understood.

To Reproduce

Steps to reproduce the behavior:

  1. alphapept workflow default_settings.yaml

Expected behavior

The pipeline finishes all the steps.


straussmaximilian commented 3 years ago

Hi, it looks like the mzML import is not integrated correctly. For these mzML files, the spectrum title is not defined. Could you upload and share one of the files so that we can ensure compatibility?

JianAtSeer commented 3 years ago

Hi, thanks for the quick reply. Actually, after playing around with it a bit, I think the issue could be somewhere else. I was able to run the complete pipeline when there is a single file, but it fails when there are multiple files, specifically during the feature finding step.

In interface.py, the following code section seems to raise an error when the file is not .raw or .d. I am not sure if this is intended. Is feature finding not supported for mzML files? The pipeline seems to run fine when I input just a single mzML file, though.

    # Limit number of processes for Bruker FF
    if step.__name__ == 'find_features':
        base, ext = os.path.splitext(files[0])
        if ext.lower() == '.d':
            memory_available = psutil.virtual_memory().available/1024**3
            n_processes = max((int(memory_available //25 ),1))
            logging.info(f'Using Bruker Feature Finder. Setting Process limit to {n_processes}.')
        elif ext.lower() == '.raw':
            memory_available = psutil.virtual_memory().available/1024**3
            n_processes = max((int(memory_available //8 ), 1))
            logging.info(f'Setting Process limit to {n_processes}')
        else:
            raise NotImplementedError('File extension {} not understood.'.format(ext))
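
One way to avoid the exception would be a fallback for extensions other than .d and .raw. The sketch below is only an illustration, not the change made in the project: the function name `feature_finding_process_limit` and the default of 8 GB per process for other file types are assumptions, while the 25 GB and 8 GB figures are taken from the snippet above.

```python
import os

# Memory budget per process in GB. The .d and .raw values come from the
# snippet above; the default for other extensions is an assumption.
MEMORY_PER_PROCESS_GB = {".d": 25, ".raw": 8}
DEFAULT_MEMORY_PER_PROCESS_GB = 8


def feature_finding_process_limit(file_path, memory_available_gb):
    """Return the number of processes to use for feature finding on one file."""
    _, ext = os.path.splitext(file_path)
    per_process = MEMORY_PER_PROCESS_GB.get(ext.lower(), DEFAULT_MEMORY_PER_PROCESS_GB)
    # Always allow at least one process, even on low-memory machines.
    return max(int(memory_available_gb // per_process), 1)
```

In the pipeline, `memory_available_gb` would be computed as in the snippet, i.e. `psutil.virtual_memory().available/1024**3`. Passing it in as an argument keeps the function easy to test.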

straussmaximilian commented 3 years ago

Good catch. I made some fixes in the https://github.com/MannLabs/alphapept/tree/qc_fixes branch. Do you mind testing if this solves the problem? This will then be included in the next release.

JianAtSeer commented 2 years ago

Thanks. I just tested the fixes and they work! On a side note, I was wondering if you could help me with a different issue. While testing the branch, I noticed that feature finding takes a really long time (more than 1 day) for some of my mzML files, while my other files finish in under 20 minutes. They are all from similar samples. Could you help me figure out why this set of files is special and takes so long in feature finding? Here is the file: https://seer.box.com/s/yofs6w3vy3twodsbiigf2d1yj8z1kime Thanks a lot.

straussmaximilian commented 2 years ago

Hi, this sounds like a bug. Thanks for sharing the file, I will investigate.

JianAtSeer commented 2 years ago

Hi, just checking to see if you have any clue about the reason for the long-running feature detection. Also, do you want me to open this as a separate issue, since it is not exactly related to the issue I originally opened this ticket about? Thanks

straussmaximilian commented 2 years ago

Hi, yes, I could reproduce the bug. It is probably some runtime condition, and I will probably have time to investigate and fix this tomorrow. Good idea to open another issue, then we can reference this properly.

straussmaximilian commented 2 years ago

The bug was related to having zero intensities in the mzML file and should now be fixed. Feel free to check out the develop branch; otherwise the fix will be included in the next release.
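
The fix itself is not shown in this thread. As a rough illustration of the kind of guard involved, the sketch below drops zero-intensity points from a spectrum before further processing. The function name `drop_zero_intensity_peaks` is hypothetical, and whether the actual fix filters peaks in this way is an assumption.

```python
def drop_zero_intensity_peaks(mz_values, intensities):
    """Return the (m/z, intensity) lists without points whose intensity is not positive."""
    if len(mz_values) != len(intensities):
        raise ValueError("m/z and intensity arrays must have the same length")
    # Keep only points with a positive intensity; zero-intensity points carry no signal.
    kept = [(mz, inten) for mz, inten in zip(mz_values, intensities) if inten > 0]
    kept_mz = [mz for mz, _ in kept]
    kept_intensities = [inten for _, inten in kept]
    return kept_mz, kept_intensities
```

For example, a spectrum with m/z values `[100.0, 100.5, 101.0]` and intensities `[0.0, 12.0, 0.0]` would be reduced to the single peak at 100.5.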

JianAtSeer commented 2 years ago

Great! Thanks. I just confirmed that the fixes work.