SchweppeLab / CometCIDS

MIT License
1 stars 0 forks source link

Create Peptide Fragmentation Model Using hyperfrag #1

Closed PMSeitzer closed 2 years ago

PMSeitzer commented 3 years ago

Something along the lines of adopting the following R snippet to C++:

#function for making a b_i or y_i ion distribution
#i is the b or y ion subscript.  
#n is number of amino acids
#h is the number of modifications/isotopes observed
hyperFrag <- function(i, n, h){
  dhyper(0:h, h, n-h, i)
}
mammmals commented 3 years ago

I think we can cheat for the mean time and use boost's dhyper and then rebuild if necessary. I have some C# code for the binomial calculations, the rest should be hopefully not to onerous to generate the right functions without the need for boost (see the issue #2 )

PMSeitzer commented 3 years ago

@mammmals OK, that sounds good. In that case, I think we can add boost directly to the comet CMakeLists.txt file, e.g.

find_package(BOOST 1.72.0 REQUIRED COMPONENTS regex)
if(Boost_FOUND)
    message("Found Boost version 1.72.0")
    INCLUDE_DIRECTORIES(${Boost_INCLUDE_DIR})
    ADD_DEFINITIONS( "-DHAS_BOOST" )
endif()

we would swap out regex and 1.72.0 for the specific components / version we need.

mammmals commented 3 years ago

I suppose we will need something similar for Eigen? What do you think @PMSeitzer?

PMSeitzer commented 3 years ago

Hi @mammmals, yes it looks like Eigen can be handled the same way:

https://eigen.tuxfamily.org/dox/TopicCMakeGuide.html

mammmals commented 3 years ago

I wrote a hypergeometric dist/pmf/binomial into its own ccp/h that I think we can use to replace parts of boost's functions. It's in the issue-1 branch. Not sure yet what all we will need to replace but this would allow us to implement the pseudocode that Jon sent over at least.

PMSeitzer commented 2 years ago

@ColtoCaro could also be interested.

PMSeitzer commented 2 years ago

Code snippet:

#playing around with a dumbed down version of CIDS

#Assume no tunover, so o = e
#Assume every amino acid has an equal probability containing 1 (and only 1 modification)
#Then in the language of the paper we have

# number of modifications = h.  number of amino acids in the peptide = m.  

#The unconditional probability of observing h modifications
p_h = function(h, m){dbinom(h, m, 1/m)}

#The probability that b_i = b
p_b = function(b, i, m){dbinom(b, i, 1/m)}

#The probability that the number of modifications = h given Bi=b
# = the probability of Y_(m-i)(h - b)
p_y = function(b,i,h,m){dbinom(h - b, m - i, 1/m)}

#The probability of b mods, in the b_i ion conditional on h
p_bh <- function(b, i, h, m){
  p_b(b, i, m) * p_y(b, i, h, m) / p_h(h, m)
}

h = 3
m = 10
i = 7
b = 7
#Sanity check
sum(p_bh(0:h, i, h, m))
i=1
plot(p_bh(0:h, i, h, m), ylim=c(0,1))
i = i + 1
#passed.

#Algebra shows that this reduces to a hypergeometric

#function for making a b_i or y_i ion distribution
#i is the b or y ion subscript.  
#n is number of amino acids
#h is the number of modifications/isotopes observed
hyperFrag <- function(i, n, h){
  dhyper(0:h, h, n-h, i)
}

plot(hyperFrag(i,m,h), ylim=c(0,1))
PMSeitzer commented 2 years ago

@mammmals, @ColtoCaro I'm happy to take this on for now - let me know if there is anything I should know or do before I get started working on it.

PMSeitzer commented 2 years ago

The R function dhyper() is actually written in C. Here is the source code:

https://github.com/SurajGupta/r-source/blob/master/src/nmath/dhyper.c#L45-L75

PMSeitzer commented 2 years ago

Source code pulled specifically into RMath components: https://github.com/atks/Rmath/blob/master/dhyper.c#L45-L75

PMSeitzer commented 2 years ago

transport this work to #11, without using any of the Rmath dependencies.

PMSeitzer commented 2 years ago

No need to continue this work, thanks to #11.