Closed PMSeitzer closed 2 years ago
I think we can cheat for the mean time and use boost's dhyper
and then rebuild if necessary.
I have some C# code for the binomial calculations, the rest should be hopefully not to onerous to generate the right functions without the need for boost (see the issue #2 )
@mammmals OK, that sounds good. In that case, I think we can add boost directly to the comet CMakeLists.txt
file, e.g.
find_package(BOOST 1.72.0 REQUIRED COMPONENTS regex)
if(Boost_FOUND)
message("Found Boost version 1.72.0")
INCLUDE_DIRECTORIES(${Boost_INCLUDE_DIR})
ADD_DEFINITIONS( "-DHAS_BOOST" )
endif()
we would swap out regex
and 1.72.0
for the specific components / version we need.
I suppose we will need something similar for Eigen? What do you think @PMSeitzer?
Hi @mammmals, yes it looks like Eigen can be handled the same way:
I wrote a hypergeometric dist/pmf/binomial into its own ccp/h that I think we can use to replace parts of boost's functions. It's in the issue-1 branch. Not sure yet what all we will need to replace but this would allow us to implement the pseudocode that Jon sent over at least.
@ColtoCaro could also be interested.
Code snippet:
#playing around with a dumbed down version of CIDS
#Assume no tunover, so o = e
#Assume every amino acid has an equal probability containing 1 (and only 1 modification)
#Then in the language of the paper we have
# number of modifications = h. number of amino acids in the peptide = m.
#The unconditional probability of observing h modifications
p_h = function(h, m){dbinom(h, m, 1/m)}
#The probability that b_i = b
p_b = function(b, i, m){dbinom(b, i, 1/m)}
#The probability that the number of modifications = h given Bi=b
# = the probability of Y_(m-i)(h - b)
p_y = function(b,i,h,m){dbinom(h - b, m - i, 1/m)}
#The probability of b mods, in the b_i ion conditional on h
p_bh <- function(b, i, h, m){
p_b(b, i, m) * p_y(b, i, h, m) / p_h(h, m)
}
h = 3
m = 10
i = 7
b = 7
#Sanity check
sum(p_bh(0:h, i, h, m))
i=1
plot(p_bh(0:h, i, h, m), ylim=c(0,1))
i = i + 1
#passed.
#Algebra shows that this reduces to a hypergeometric
#function for making a b_i or y_i ion distribution
#i is the b or y ion subscript.
#n is number of amino acids
#h is the number of modifications/isotopes observed
hyperFrag <- function(i, n, h){
dhyper(0:h, h, n-h, i)
}
plot(hyperFrag(i,m,h), ylim=c(0,1))
@mammmals, @ColtoCaro I'm happy to take this on for now - let me know if there is anything I should know or do before I get started working on it.
The R function dhyper()
is actually written in C. Here is the source code:
https://github.com/SurajGupta/r-source/blob/master/src/nmath/dhyper.c#L45-L75
Source code pulled specifically into RMath
components:
https://github.com/atks/Rmath/blob/master/dhyper.c#L45-L75
transport this work to #11, without using any of the Rmath
dependencies.
No need to continue this work, thanks to #11.
Something along the lines of adopting the following
R
snippet toC++
: