MatteoLacki / IsoSpec

Libraries for fine isotopic structure calculator.
Other
34 stars 10 forks source link

Calculate distribution for heavy labeled peptides #27

Closed jonasfoe closed 5 years ago

jonasfoe commented 5 years ago

Hey Mateusz,

I stumbled upon this package at the winter school and decided to try it out. It performes really nicely but in some of my data it was applied to iRT heavy peptides. So the input table got like this:

 C   C13 H   N   N15 O 
 37  6   71  9   4   13

In that case, the package (IsoSpecR) silently ignored the C13 and N15 columns. I am not sure what to make of this. I haven't checked how the isotopic distribution of these peptides actually looks like and for now I just ignore them. Do you think it would make sense for the package to deal with inputs like that by either throwing an error or accepting these inputs and calculating a result based on the assumption that these heavy elements are fixed?

Best wishes, Jonas

MatteoLacki commented 5 years ago

Hello Jonas,

this should be dealt with by treating C13 as an artificial new element rather than C12. This will mean, that the other C atoms will still follow the isotopic pattern (so, some of them will be C13 just naturally).

I don't know which language you use to test our software. If it's R, then you can pass in your own isotopes as a data.frame and adjust the molecule. I attach the R script. I cannot fiddle with the python code now, but it boils down to getting the proper API and creating artificial elements too.

Best!

Mateusz Krzysztof Łącki

German tel. +49 159 01681376 Polish tel. +48 579 647 311 Skype: mathewin GitHub: MatteoLacki https://github.com/MatteoLacki

On Tue, Jan 22, 2019 at 11:46 AM jonasfoe notifications@github.com wrote:

Hey Mateusz,

I stumbled upon this package at the winter school and decided to try it out. It performes really nicely but in some of my data it was applied to iRT heavy peptides. So the input table got like this:

C C13 H N N15 O 37 6 71 9 4 13

In that case, the package silently ignored the C13 and N15 columns. I am not sure what to make of this. I haven't checked how the isotopic distribution of these peptides actually looks like and for now I just ignore them. Do you think it would make sense for the package to deal with inputs like that by either throwing an error or accepting these inputs and calculating a result based on the assumption that these heavy elements are fixed?

Best wishes, Jonas

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/MatteoLacki/IsoSpec/issues/27, or mute the thread https://github.com/notifications/unsubscribe-auth/ADx9ANYUBteWFMkP3Hn9YocKuEbXJSZWks5vFuv3gaJpZM4aMZnP .

MatteoLacki commented 5 years ago

Just for order, here is the R script:

library(IsoSpecR)

data(isotopicData)
print(isotopicData)

print('Here are some isotopes')
i = isotopicData$IsoSpecShortZero
print(i)

# convention: D for deuterium, M for N15
foney_elements = data.frame(
    element = c('D', 'M'),
    isotope = c('D', 'M'),
    mass    = c(i[i$isotope == 'H2', 'mass'], i[i$isotope == 'N15', 'mass']),
    abundance = c(1, 1),
    ratioC  = c(NA, NA) # this is not important actually, I just keep it for consistency
)   

isotopes = rbind(i, foney_elements)
print(isotopes)

# Your atom counts:
# C  C13  H   N   N15 O
# 37  6   71  9   4   13
# translate into:
your_molecule = c(C=37, H=71, N=9, O=13, D=6, M=4)

res = IsoSpecify(your_molecule, .999, isotopes, showCounts=T)
print(res)
jonasfoe commented 5 years ago

Hey I think that sounds good. I am working with IsoSpecR. I don't think the docs for the R package specify which set of isotopes you are working with by default. I think that would be nice to know.

Thanks for the help!

MatteoLacki commented 5 years ago

OK, I will add it to the docs that the isotopes parameter can be used to pass custom isotopes while releasing the next iteration of the software. The whole R interface is pretty slim though: you can see directly what is done here. https://github.com/MatteoLacki/IsoSpec/blob/version_1_9/IsoSpecR/R/IsoSpecR.R

Cheers!

Mateusz Krzysztof Łącki

German tel. +49 159 01681376 Polish tel. +48 579 647 311 Skype: mathewin GitHub: MatteoLacki https://github.com/MatteoLacki

On Tue, Jan 22, 2019 at 2:35 PM jonasfoe notifications@github.com wrote:

Hey I think that sounds good. I am working with IsoSpecR. I don't think the docs for the R package specify which set of isotopes you are working with by default. I think that would be nice to know.

Thanks for the help!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/MatteoLacki/IsoSpec/issues/27#issuecomment-456401023, or mute the thread https://github.com/notifications/unsubscribe-auth/ADx9ALu1rxe177zHoazKIiVoVPkHbZidks5vFxOggaJpZM4aMZnP .

lopippo commented 5 years ago

Greetings,

I just saw the IsoSpec/Examples/C++/radiolabelling.cpp example, and I wondered how to make it easy for people to configure such situation. I'm very excited with this example. Just to be sure that I understand it correcly: in the labelling example, would it be the same to do the following:

  1. Open the element_table.cpp file,

  2. Invent a Cz new element that will simulate the heavy carbon

  3. Fill-in the three corresponding masses

  4. Fill-in the three corresponding probabilities (corresponding to 0.05normal_carbon_masses[0], 0.05normal_carbon_masses[1], 0.95 probs

  5. Define the fomula C5H12O6Cz1

  6. Run the calculation

I have started working ona graphical interface to help some of my collaborators explore all this without having to code anything. At the moment all I have is the attached screen dump. But, imagine that one could add lines to the table to "invent" new elements, like we did with Cz, then they could perform simulations without coding. Am I right ? Cheers, Filippo

PS The packaging for Debian is finished! I'll upload it soon. Will take some time because we are in freeze now. isospec

MatteoLacki commented 5 years ago

That is all correct.

Mateusz Krzysztof Łącki

German tel. +49 159 01681376 Polish tel. +48 579 647 311 Skype: mathewin GitHub: MatteoLacki https://github.com/MatteoLacki

On Tue, Jan 22, 2019 at 7:22 PM lopippo notifications@github.com wrote:

Greetings,

I just saw the IsoSpec/Examples/C++/radiolabelling.cpp example, and I wondered how to make it easy for people to configure such situation. I'm very excited with this example. Just to be sure that I understand it correcly: in the labelling example, would it be the same to do the following:

1.

Open the element_table.cpp file, 2.

Invent a Cz new element that will simulate the heavy carbon 3.

Fill-in the three corresponding masses 4.

Fill-in the three corresponding probabilities (corresponding to 0.05normal_carbon_masses[0], 0.05normal_carbon_masses[1], 0.95 probs 5.

Define the fomula C5H12O6Cz1 6.

Run the calculation

I have started working ona graphical interface to help some of my collaborators explore all this without having to code anything. At the moment all I have is the attached screen dump. But, imagine that one could add lines to the table to "invent" new elements, like we did with Cz, then they could perform simulations without coding. Am I right ? Cheers, Filippo

PS The packaging for Debian is finished! I'll upload it soon. Will take some time because we are in freeze now. [image: isospec] https://user-images.githubusercontent.com/16020903/51556594-e8145200-1e7a-11e9-87a2-d187c1a891e8.png

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/MatteoLacki/IsoSpec/issues/27#issuecomment-456507300, or mute the thread https://github.com/notifications/unsubscribe-auth/ADx9AHSlxoMpCRrhPbNzqucM2D9lbT8cks5vF1bKgaJpZM4aMZnP .

lopippo commented 5 years ago

Greetings, Matteo,

Another way to describe what I wrote above is here, with a new version of the element_tables.cpp. The file is opened in vim, and shown in many vertical splits in such a manner that the new Cz element shows roughly at the same vertical position for the different tables (top of the split). One concern I have is that now the massNo table is no more "continuous" since I had to repeat 12 and 13 (and add a new 14 that will confound with the Nitrogen). Is this an issue? Please, would you be so kind as to countercheck that new carbonz/Cz element ? I think that this might be the best solution to have users configure their labelling? Do you have a better idea (thinking of non-dev users of your software)? Thanks for this interesting discussion. Cheers, Filippo isospec

jonasfoe commented 5 years ago

By the way I use skyline software for mass spec and they have the convention on calling it C'. That works decently in formulas as well e.g.: C13C'6N15N'4H60 Another convention out there is [14]C I think.

lopippo commented 5 years ago

Greetings, unfortunately, it seems that my modification of element_tables.cpp does not work. If I use C5H12O6Cz1 as the formula (and 0.999 as the cumulative probs), I get a single centroid at (198.099,8.19751). Of course, this is not possible, because, according to my "model", the first peak of the cluster should only be ~2 neutrons heavier than for the C12-only molecule. Right? Here, I am having an increment of 18u.

What confirms that there is a "modelling" problem, is that if I use my modified element_tables.cpp file (as shown in the example screen-dumped in my previous post) with the "unmodified" formula (C6H12O6), then I still get a bad result (198.099,16.395). I gather that this means that the massNo stuff I asked for before has an importance...

What's your take on this ? Cheers, Filippo

michalsta commented 5 years ago

Greetings,

I just saw the IsoSpec/Examples/C++/radiolabelling.cpp example, and I wondered how to make it easy for people to configure such situation. I'm very excited with this example. Just to be sure that I understand it correcly: in the labelling example, would it be the same to do the following:

  1. Open the element_table.cpp file,
  2. Invent a Cz new element that will simulate the heavy carbon
  3. Fill-in the three corresponding masses
  4. Fill-in the three corresponding probabilities (corresponding to 0.05_normal_carbon_masses[0], 0.05_normal_carbon_masses[1], 0.95 probs

Right, there was a rather embarassing (and obvious) typo in the example here, it should be 0.05 normalcarbonprobs (not masses)... it's fixed in git now.

  1. Define the fomula C5H12O6Cz1
  2. Run the calculation

I have started working ona graphical interface to help some of my collaborators explore all this without having to code anything. At the moment all I have is the attached screen dump. But, imagine that one could add lines to the table to "invent" new elements, like we did with Cz, then they could perform simulations without coding. Am I right ? Cheers, Filippo

PS The packaging for Debian is finished! I'll upload it soon. Will take some time because we are in freeze now. isospec

lopippo commented 5 years ago

Yes, thank you, I had corrected this myself, as you might see on the other screenshot (right hand side with the new prob and log(probs) values). But that does not work anyways, as described in a previous post.

Hope we'll find a way to configure "new elements" in the main element_tables.cpp file because that would be the most flexible way of doing things (labelling of a specific number of carbons vs whole molecule labelling). Cheers, Filippo

michalsta commented 5 years ago

Greetings, unfortunately, it seems that my modification of element_tables.cpp does not work. If I use C5H12O6Cz1 as the formula (and 0.999 as the cumulative probs), I get a single centroid at (198.099,8.19751). Of course, this is not possible, because, according to my "model", the first peak of the cluster should only be ~2 neutrons heavier than for the C12-only molecule. Right? Here, I am having an increment of 18u.

What confirms that there is a "modelling" problem, is that if I use my modified element_tables.cpp file (as shown in the example screen-dumped in my previous post) with the "unmodified" formula (C6H12O6), then I still get a bad result (198.099,16.395). I gather that this means that the massNo stuff I asked for before has an importance...

What's your take on this ? Cheers, Filippo

Right, I've found three issues in the code that Mateusz forwarded to me: first is, in glucose.cpp, you define int configs[5]; - that's too small, the space for configurations needs to be as large as a configuration, that is, have as many integers as there is isotopes - that is, 10 for radiolabelled glucode (2 for carbon, 2 for hydrogen, 3 for oxygen, and 3 for radiocarbon). With int configs[5] it writes unallocated memory and Bad Things Happen.

Second is a nasty typo: You have:

const double elem_table_log_probability [ISOSPEC_NUMBER_OF_ISOTOPIC_ENTRIES] = {
(...)
// Radioactive C14
-3.00657889,
-3.00657889,
-0.05129329   // <- note the lack of comma here!
// Radioactive C14
-0.003648633607616148452623683340334537206217646598815917968750,
(...)

The next value starts with a minus, which the compiler reinterprets as binary instead of unary, you get one value in table where there should be two, the arrays get misaligned, and again Bad Things Happen.

Also, when you define an artificial element it needs to have a unique elem_table_atomicNo value: elements are identified by their atomic number, and if it shares it with carbon, then these two will get treated as alternative isotopes of the same element, and in particular, the "C" part of the formula will produce 5 isotopes. With probabilities summing up to much more than 1...

Anyway, defining artificial elements in element_tables for the radiolabelling is, i think, the wrong way to do that. We can't hardcode radiolabellings in IsoSpec because there's just way too many ways to isotopically mark stuff, and even if we did, we need to hardcode that with radiolabelling probabilities - not every user will want the 95% radiolabelling efficiency, and that needs to be hardcoded too. Which means recompilation every time the user does a different radiolabel. I would really suggest following the radiolabelling.cpp example, that is, use custom elements without adding them to element_tables, and just use the raw Iso constructor (the non-formula one). I know it is kind of clunky, because that way you have to manually define all the elements, not just the custom ones, and also retrieve the appropriate values from element_tables yourself, for standard elements, but that way there's no recompilations involved. In the near future I will probably just define a += operator on Iso instances, that way you'll be able to do something like:

Iso glucose("C5H12O6"); // Define glucose without radiolabel
Iso radiolabel(1, {3}, {1}, {12, 13.00335, 14.0032}, {0.049, 0.0005, 0.95}); // Define the single 14C atom with 95% probability
glucose += radiolabel;
... // Do other computations on Iso
lopippo commented 5 years ago

Oh, great, thank you for reviewing my stuff. So now, I know that i'll try to figure out a graphical user interface that mimicks the "manual cpp ways". Sorry for bothering! Cheers, Filippo Feel free to close this issue (the issue of jonasfoe seemed to be fixed also)

michalsta commented 5 years ago

I've added also an example R code to radiolabelling. Now that we have exmaples for full set of C++/Python/R I'm going to go ahead and close this issue.

Note: we still might rework the API a bit to make this easier in near future.

lopippo commented 5 years ago

Just as a note, I could write a nice GUI for IsoSpec. I'll make a video and share it with you. IsoSpec is really impressive. Thank you for writing Free Software! Cheers, Filippo