524D / compareMS2

Compare samples by MS2 spectra
MIT License

How we can increase the speed of compareMS2 #47

Open chinmayaNK22 opened 1 year ago

chinmayaNK22 commented 1 year ago

Hello,

I am trying to run compareMS2 on my Windows system (Intel i9, 8 cores, 48 GB RAM) for 126 raw files from 4 species. After more than 24 hours, it has not completed even 50% of the comparisons.

I would like to know whether there is any way or option to increase the speed of compareMS2 for large datasets.

524D commented 1 year ago

Hi,

Thanks for using compareMS2!

It's quite normal that the speed drops drastically for a large number of files. The main reason is that the computation time of the comparison grows with the square of the number of files (each file is compared to all others). For some parts of the software, the complexity is O(n^3) (the third power of the number of files, n), but in most cases that is not the limiting factor.
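To make the quadratic growth concrete, here is a minimal Python sketch that counts pairwise file comparisons. It assumes each unordered pair of files is compared exactly once; if compareMS2 compares pairs in both directions, the counts double but the scaling stays the same. The function name `pairwise_comparisons` is hypothetical and not part of compareMS2.

```python
def pairwise_comparisons(n_files: int) -> int:
    """Number of unordered file pairs, assuming each pair is compared once."""
    return n_files * (n_files - 1) // 2


if __name__ == "__main__":
    # Doubling the number of files roughly quadruples the number of comparisons.
    for n in (32, 63, 126):
        print(f"{n:4d} files -> {pairwise_comparisons(n):5d} comparisons")
```

Under that assumption, the 126 files from the question correspond to 7875 pairwise comparisons, about four times as many as for 63 files (1953).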

The time used for each comparison is roughly proportional to the product of the number of spectra in the two files. A speedup is therefore possible by limiting the number of spectra in the files. Clearly, discarding spectra may adversely affect the result.
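As a rough illustration of this cost model, the Python sketch below sums the product of spectra counts over all file pairs and shows the effect of capping the number of spectra per file. The spectra counts, the cap, and the helper names `relative_cost` and `capped` are all hypothetical; this is not how compareMS2 selects spectra, only an estimate of relative run time under the stated proportionality.

```python
from itertools import combinations


def relative_cost(spectra_counts):
    """Relative run time: sum over file pairs of (spectra in A) * (spectra in B)."""
    return sum(a * b for a, b in combinations(spectra_counts, 2))


def capped(spectra_counts, max_spectra):
    """Spectra counts after limiting every file to at most max_spectra spectra."""
    return [min(count, max_spectra) for count in spectra_counts]


if __name__ == "__main__":
    # Hypothetical dataset: 126 files with 40,000 MS2 spectra each.
    counts = [40_000] * 126
    full = relative_cost(counts)
    halved = relative_cost(capped(counts, 20_000))
    print(f"relative cost after halving spectra per file: {halved / full:.2f}")
```

In this model, halving the number of spectra in every file cuts the estimated run time to about a quarter, at the price of discarding data that may matter for the result.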

The software is currently not multi-threaded, but the speed seems limited by memory access time, not processor speed. Because of this, more cores don't help.

I will keep this issue open because performance optimization is something we need to look at in more detail.

chinmayaNK22 commented 1 year ago

Thank you Rob

I will try to run the files with existing resources.

-- Chinmaya

(Replying by email to Rob Marissen's comment of Fri, 25 Nov 2022, 3:12 pm.)
