Before any centroiding or data analysis can be done, the data has to be
converted from the vendor-specific wiff format into mzML files. The
[proteowizard] software, specifically its msconvert tool, is one of the
default tools for this task, but it requires Microsoft Windows (and vendor
dll files) to work properly. There is, however, a nice alternative to a
Windows installation of the tools: docker.
Run

    docker pull chambm/pwiz-skyline-i-agree-to-the-vendor-licenses

to get the official Proteowizard docker image. A single file can then be converted with:

    docker run -it --rm -e WINEDEBUG=-all \
      -v /your/data:/data \
      chambm/pwiz-skyline-i-agree-to-the-vendor-licenses \
      wine msconvert /data/file.raw

For security reasons, singularity might be the better choice on a compute cluster. Pull the image with

    singularity pull docker://chambm/pwiz-skyline-i-agree-to-the-vendor-licenses

and run msconvert with:

    singularity run --bind /your/data:/data \
      pwiz-skyline-i-agree-to-the-vendor-licenses_latest.sif \
      wine msconvert /data/file.raw

The convert_to_mzML.sh script uses this dockerized msconvert to convert all wiff files in a specified folder to mzML. Configure the variables within the script to point to the folder containing all wiff files and, optionally, to the folder containing the centroided mzML files. The latter is useful if a large batch of files has already been centroided and centroiding should only be performed on the new files. Converted profile-mode mzML files are stored in the same folder as the wiff files.
To run the script on the cluster using slurm:

    sbatch --mem-per-cpu=8000 -w calc04 -c 1 ./convert_to_mzML.sh

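The skip logic described for convert_to_mzML.sh could look roughly like the sketch below. It is not the script shipped in this repository: list_pending and convert_pending are hypothetical helper names, the SOURCE and DEST defaults are the example paths used in this README, and passing --mzML and -o to msconvert is an assumption about how the conversion is invoked.

```shell
#!/bin/sh
# Sketch only: SOURCE and DEST mirror the variables named in this README;
# list_pending and convert_pending are hypothetical helper names.
SOURCE="${SOURCE:-/data/massspec/wiff}"
DEST="${DEST:-/data/massspec/mzML}"
IMAGE="chambm/pwiz-skyline-i-agree-to-the-vendor-licenses"

# Print every wiff file in SOURCE that has no mzML file of the same
# base name in DEST yet.
list_pending() (
    for wiff in "$SOURCE"/*.wiff; do
        [ -e "$wiff" ] || continue          # glob matched nothing
        name=$(basename "$wiff" .wiff)
        [ -e "$DEST/$name.mzML" ] || printf '%s\n' "$wiff"
    done
)

# Convert each pending file with the dockerized msconvert. The mzML file
# is written next to the wiff file (-o /data maps to SOURCE).
convert_pending() {
    list_pending | while IFS= read -r wiff; do
        docker run --rm -e WINEDEBUG=-all -v "$SOURCE":/data "$IMAGE" \
            wine msconvert "/data/$(basename "$wiff")" --mzML -o /data
    done
}
```

Calling list_pending first is a cheap dry run: it shows which files convert_pending would hand to msconvert without starting a container.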
This repository provides functions to perform batch centroiding of profile mzML
files using MSnbase.
Set the variables in_dir, path_pattern and path_replace.

Submit the centroiding job with:

    sbatch --mem-per-cpu=8000 -w mccalc07 -c 12 ./centroiding.sh

Add -p slow to run the job immediately if the cluster is full.

The command

    find . -type f -name "*.mzML" -delete

deletes all mzML files below the current directory.

We are simply checking a) whether we can read the mzML file and b) whether it is indeed centroided.
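As a rough illustration of such a check, the sketch below scans an mzML file for the PSI-MS controlled vocabulary terms MS:1000127 (centroid spectrum) and MS:1000128 (profile spectrum). This is a crude text scan and an assumption about what "centroided" can mean at the file level; it does not parse the XML or load any spectra, and is_centroided is a hypothetical helper name.

```shell
# is_centroided FILE
# Succeeds if FILE is readable, declares centroid spectra (MS:1000127)
# and declares no profile spectra (MS:1000128).
# Returns 2 if the file cannot be read, 1 if it is not purely centroided.
is_centroided() {
    [ -r "$1" ] || return 2
    grep -q 'accession="MS:1000127"' "$1" || return 1
    ! grep -q 'accession="MS:1000128"' "$1"
}
```

A loop such as `for f in *.mzML; do is_centroided "$f" || echo "$f"; done` would then list the files that fail the check.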
The check is run with the chech_files.sh script.

Run the convert_to_mzML.sh script, ensuring SOURCE="/data/massspec/wiff" and DEST="/data/massspec/mzML". This will only convert wiff files to
profile-mode mzML files if no centroided mzML file with the same name
already exists in /data/massspec/mzML.

Run the centroiding.sh script to centroid the profile-mode mzML files.

msconvert
For very fast checks (e.g. for system suitability tests) it might be OK to use proteowizard's centroiding.
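For such a quick check, centroiding can be requested from msconvert through its peakPicking filter. The helper below only prints the docker command instead of running it. The function name msconvert_centroid_cmd is hypothetical, and the filter string "peakPicking vendor msLevel=1-" is an assumption about the installed ProteoWizard version; `msconvert --help` lists the exact syntax it supports.

```shell
# msconvert_centroid_cmd HOST_DIR FILE
# Print (do not run) a docker command that converts FILE from HOST_DIR to
# mzML and centroids all MS levels with msconvert's peakPicking filter.
msconvert_centroid_cmd() {
    printf 'docker run --rm -e WINEDEBUG=-all -v %s:/data %s wine msconvert /data/%s --mzML --filter "peakPicking vendor msLevel=1-" -o /data\n' \
        "$1" "chambm/pwiz-skyline-i-agree-to-the-vendor-licenses" "$2"
}
```

The printed line can be inspected first and then pasted into a shell on a machine where the docker image is available.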
On our cluster it might be helpful to put jobs in e.g. the slow queue (i.e. add -p slow
as a parameter), as this causes other jobs to be paused so that the submitted job runs right away.
Also, only 2 nodes on the cluster (calc02 and calc04) have enough memory (i.e. 375GB), so the job configuration has to stay within the limits of these nodes.
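Whether a given sbatch request fits into one of these nodes is simple arithmetic: the total memory is --mem-per-cpu (megabytes by default in slurm) times the number of CPUs given with -c. The sketch below does that calculation; fits_node is a hypothetical helper, and treating 375GB as 375 * 1024 MB is an assumption.

```shell
# fits_node MEM_PER_CPU_MB CPUS NODE_MEM_GB
# Succeeds if the total request (MEM_PER_CPU_MB * CPUS) fits into a node
# with NODE_MEM_GB gigabytes of memory.
fits_node() {
    total_mb=$(( $1 * $2 ))
    node_mb=$(( $3 * 1024 ))
    [ "$total_mb" -le "$node_mb" ]
}
```

For example, `fits_node 8000 12 375` succeeds, because 8000 MB on 12 CPUs adds up to 96000 MB, well below the 384000 MB of such a node.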