EuracBiomedicalResearch / batch_centroid

Utility functions to perform batch centroiding of profile mzML files using MSnbase

Converting Sciex wiff files to mzML

Before any centroiding or data analysis can be done, the data has to be converted from the vendor-specific wiff format into mzML files. The [proteowizard] software, specifically its msconvert tool, is one of the standard tools for this task, but it requires Microsoft Windows (and vendor dll files) to work properly. A convenient alternative to a Windows installation of the tools is docker.

Installing proteowizard docker image

docker run -it --rm -e WINEDEBUG=-all \
    -v /your/data:/data \
    chambm/pwiz-skyline-i-agree-to-the-vendor-licenses \
    wine msconvert /data/file.raw

Use of singularity instead of docker on the clusters

For security reasons, singularity might be the better choice on a calculation cluster.

singularity run --bind /your/data:/data \
    pwiz-skyline-i-agree-to-the-vendor-licenses_latest.sif \
    wine msconvert /data/file.raw

Converting wiff to (profile) mzML files

The convert_to_mzML.sh script uses this dockerized msconvert to convert all wiff files in a specified folder to mzML. Configure the variables within the script to point to the folder containing the wiff files and, optionally, to the folder containing the centroided mzML files. The latter is useful if a large batch of files has already been centroided, so that centroiding is only performed on the new files. The converted profile-mode mzML files are stored in the same folder as the wiff files.
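The file selection described above could look roughly like the following sketch. It is not the repository's convert_to_mzML.sh: the function name `files_to_convert` is hypothetical, and it assumes that a centroided file carries the same base name as its wiff file, with an .mzML extension.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual convert_to_mzML.sh.

# Print every .wiff file in folder $1 that has no matching .mzML file
# in folder $2. If $2 is omitted, all wiff files are printed.
files_to_convert() {
    local wiff_dir="$1" centroided_dir="${2:-}" f base
    for f in "$wiff_dir"/*.wiff; do
        [ -e "$f" ] || continue        # no wiff files: the glob stayed literal
        base="$(basename "$f" .wiff)"
        if [ -n "$centroided_dir" ] && [ -e "$centroided_dir/$base.mzML" ]; then
            continue                   # already centroided (assumed naming), skip
        fi
        printf '%s\n' "$f"
    done
}

# Possible use with the dockerized msconvert shown earlier
# (mzML is msconvert's default output format, -o sets the output folder):
#
# files_to_convert /your/data /your/centroided | while read -r f; do
#     docker run --rm -e WINEDEBUG=-all -v /your/data:/data \
#         chambm/pwiz-skyline-i-agree-to-the-vendor-licenses \
#         wine msconvert "/data/$(basename "$f")" -o /data
# done
```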

To run the script on the cluster using slurm:

sbatch --mem-per-cpu=8000 -w calc04 -c 1 ./convert_to_mzML.sh

Perform R-based centroiding

This repository provides functions to perform batch centroiding of profile mzML files using MSnbase.

Perform centroiding of all files in a folder

Check if files are centroided

We simply check a) whether the mzML file can be read and b) whether it is indeed centroided.
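For a quick look outside of R, a rough metadata-only check can be sketched in shell. This is a different technique from the MSnbase-based check of this repository: it only searches the file for the PSI-MS terms MS:1000127 (centroid spectrum) and MS:1000128 (profile spectrum) and never inspects the peak data, so it is no substitute for the R-based check. The function name `mzml_mode` is hypothetical.

```shell
# Hypothetical helper: classify an mzML file from its annotations only.
# Prints one of: centroid, profile, mixed, unknown, unreadable.
mzml_mode() {
    local f="$1" has_centroid=no has_profile=no
    if [ ! -r "$f" ]; then
        echo "unreadable"
        return 1
    fi
    # MS:1000127 = centroid spectrum, MS:1000128 = profile spectrum
    grep -q 'accession="MS:1000127"' "$f" && has_centroid=yes
    grep -q 'accession="MS:1000128"' "$f" && has_profile=yes
    case "$has_centroid/$has_profile" in
        yes/no)  echo "centroid" ;;
        no/yes)  echo "profile" ;;
        yes/yes) echo "mixed" ;;      # both terms occur somewhere in the file
        *)       echo "unknown" ;;    # no spectrum-type annotation found
    esac
}

# Example: for f in /your/centroided/*.mzML; do echo "$f: $(mzml_mode "$f")"; done
```

A result of mixed or unknown means the annotations alone are inconclusive and the file should be checked by reading the data.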

Conversion workflow on the IFB calculation servers

Perform centroiding using proteowizard's msconvert

For very fast checks (e.g. for system suitability tests) it might be OK to use proteowizard's centroiding.
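As an illustration, msconvert can centroid during conversion through its peakPicking filter. The sketch below reuses the docker call shown earlier. The wrapper function `centroid_with_msconvert`, the `DOCKER` variable (which only allows a dry run) and the output folder /data/centroided are hypothetical and not part of this repository.

```shell
# Hypothetical wrapper: centroid one file with msconvert's peakPicking filter.
# $1 = host folder containing the data, $2 = file name inside that folder.
centroid_with_msconvert() {
    local data_dir="$1" file="$2"
    # DOCKER can be set to "echo" to print the command instead of running it.
    ${DOCKER:-docker} run --rm -e WINEDEBUG=-all \
        -v "$data_dir":/data \
        chambm/pwiz-skyline-i-agree-to-the-vendor-licenses \
        wine msconvert "/data/$file" --mzML \
        --filter "peakPicking vendor msLevel=1-" \
        -o /data/centroided
}

# Example: centroid_with_msconvert /your/data file.wiff
```

The peakPicking filter should come first if further filters are added, since vendor centroiding works on the raw profile data.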

Tips and tricks

On our cluster it can help to submit jobs to e.g. the slow queue, as this causes other jobs to be paused so that mine are run automatically (i.e. add -p slow as a parameter).

Also, only 2 nodes of the cluster (calc02 and calc04) have enough memory (i.e. 375 GB), so the resources requested for a job must not exceed what these nodes provide.
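Put together with the sbatch call shown earlier, a job script might look like the following sketch. Whether the slow partition and the node request can be combined this way on this cluster is an assumption, and the memory value is simply the one from the earlier sbatch example.

```shell
#!/bin/bash
#SBATCH -p slow               # partition mentioned above
#SBATCH -w calc04             # one of the two high-memory nodes
#SBATCH -c 1                  # one CPU core
#SBATCH --mem-per-cpu=8000    # in MB; must fit within the node's memory

./convert_to_mzML.sh
```

Saved under a hypothetical name such as convert_job.sh, it would be submitted with sbatch convert_job.sh.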