fickludd / dinosaur

Feature finding algorithm for detection of isotope patterns in HPLC mass spectrometry data.

heavy isotope labelled peptide feature detection #5

Open Owen-Duncan opened 6 years ago

Owen-Duncan commented 6 years ago

Hi Johan, I'm working with peptide samples metabolically labelled with heavy hydrogen. Dinosaur does a good job of detecting peptide features in these samples, but it commonly drops the first isotope in the series from the detected feature envelope, or splits the feature in two at the third isotope. I have tried altering the deisocorr, deisovalleyfactor, averaginecorr and averagineexplained parameters; these change the feature borders but not the problematic behaviour. I suspect it's related to inappropriate selection of the monoisotopic peak. I've attached an example.

I have the project source but would value your advice before digging in.

[Attached image: example]

fickludd commented 6 years ago

Hi,

First of all: awesome that you are critically inspecting the results so you know what is not working. Getting to the bottom of why Dinosaur is making mistakes on particular features is a bit of a craft, and it's hard to fix specific features without breaking others, so be warned :). My general process would be

As for your setup, the fact that you use labelled data is likely to mess with the expected isotope distribution computation. To solve that it would be nice to compute averaginecorr against both labelled and non-labelled distributions, and select the best one (if some param flag was set to look for labelled features).
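A minimal sketch of that idea in Python, not Dinosaur's actual code: score the observed isotope intensities against several expected envelopes and keep the best-correlating one. Everything here is an assumption made for illustration: the function names `poisson_envelope`, `pearson` and `best_envelope` are hypothetical, the truncated Poisson shape is only a rough stand-in for a real averagine or labelled distribution, and the means `0.8` and `3.0` and the observed intensities are made-up numbers.

```python
import math


def poisson_envelope(mean_shift, n_peaks):
    """Hypothetical stand-in for an expected isotope distribution.

    Models relative isotope intensities as a truncated Poisson distribution
    with the given mean, normalised so the intensities sum to 1.
    """
    raw = [math.exp(-mean_shift) * mean_shift ** k / math.factorial(k)
           for k in range(n_peaks)]
    total = sum(raw)
    return [r / total for r in raw]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_envelope(observed, candidates):
    """Return (name, correlation) of the candidate envelope that fits best."""
    scores = {name: pearson(observed, expected)
              for name, expected in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up observed intensities for one feature, lowest m/z isotope first.
observed = [0.05, 0.16, 0.24, 0.23, 0.17, 0.10]

# Assumed envelopes: a low mean for unlabelled, a higher mean for labelled.
candidates = {
    "unlabelled": poisson_envelope(0.8, len(observed)),
    "labelled": poisson_envelope(3.0, len(observed)),
}

name, corr = best_envelope(observed, candidates)
print(name, round(corr, 3))
```

With an envelope whose maximum sits a few isotopes to the right, the unlabelled model correlates poorly, which would be consistent with the leading isotopes being dropped when only that model is checked.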

Hope that helps!

ykil commented 6 years ago

Hi fickludd, Awesome and fun work. I am also seeing that the hill climbing is not picking up very abundant isotopes (several on the left). Is there any way to see the centroided data as output? Also, I would like to have a direct chat with you if you are available.

Hi Owen Duncan, What software are you using to visualize the 2D map?

Thanks in advance.

fickludd commented 6 years ago

@ykil Glad you like it!

What do you mean by "see the centroided data as output"? In general the features are the output, and these are computed on centroided data. There are examples of the centroiding algorithm in the audit trail, and I've never observed any problems with this part. This part is, by the way, an exact clone of MaxQuant, so you would be seeing issues there as well if that was the problem.

As for direct chat, please mail me at my gmail account (I'm the first author of the paper) and describe what you want to talk about, and I can see if I can find some time.