compomics / moFF

A modest Feature Finder (moFF) to extract MS1 intensities from Thermo raw file
Apache License 2.0

Settings to accelerate execution time #49

Open veitveit opened 4 years ago

veitveit commented 4 years ago

We are using moFF on a larger dataset (27 label-free runs) and it takes a very long time (about 40 h), even when configured with 16 threads and 150 GB RAM.

Are there any parameter settings that can shorten the execution time? I must be missing something here, because the paper says that this is a fast method applicable to large datasets.

And as a motivator: MaxQuant takes less than half the time.

Data set: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD001819

All parameter values are default (i.e. not provided via command-line) except the mass tolerance, which is set to 5.

The inputs are the PeptideShaker output files and the raw files.

moff_all is run from a Docker container that has moff=2.0.3 installed as a conda package.

Maux82 commented 4 years ago

Hi,

Sorry for my late reply. At the moment I am working outside academia, and I only follow moFF partially in my free time.

A couple of questions:

Did you run the matching between runs across all 27 runs? If yes, I can imagine that the number of matched peptides in each run is really big, which would be one possible explanation.

How many PSMs do you have in each run on average after the FDR calculation?

Do you use the "--match_filter" option or not? If yes, this could add to the computation time.

You could also try setting "--xic_length" to 2 or 2.2 minutes, to see if it gains some speed.
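To illustrate the point about matching between runs, here is a minimal toy sketch of why the per-run workload grows with the number of runs. It is not moFF's actual implementation: the function name `mbr_targets`, the run names, and the peptide sequences are all hypothetical, and it only assumes that each run additionally has to be searched for peptides identified in the other runs.

```python
# Toy illustration of how matching between runs can inflate the per-run workload.
# NOT moFF's implementation: mbr_targets, the run names and the peptides are
# hypothetical and chosen only for illustration.

def mbr_targets(identified_by_run):
    """For each run, return the peptides identified elsewhere but not in that run."""
    # Union of all peptides identified in any run.
    all_peptides = set().union(*identified_by_run.values())
    # Each run must additionally be searched for everything it did not identify itself.
    return {
        run: all_peptides - peptides
        for run, peptides in identified_by_run.items()
    }

runs = {
    "run_A": {"PEPTIDEK", "SAMPLER", "ELVISK"},
    "run_B": {"PEPTIDEK", "LIVESK"},
    "run_C": {"SAMPLER", "ANOTHERK", "LIVESK"},
}

targets = mbr_targets(runs)
for run, extra in sorted(targets.items()):
    print(run, "identified:", len(runs[run]), "extra matched targets:", len(extra))
```

With more runs, the union of identified peptides tends to grow, so the number of extra targets per run can grow as well. That is consistent with the suspicion above that matching across all 27 runs may account for part of the run time.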

veitveit commented 4 years ago

Hi @Maux82,

Thanks a lot for the help!

I am running the full set of 27 runs, and I am not using the --match_filter option. Does enabling the filter speed things up, or the opposite?

There are around 10,000 PSMs per run.

I will try decreasing the xic_length and see how it performs.