Open XingerTang opened 6 months ago
I believe we already have output for recombination rate / probs per locus - or maybe we wanted to add one, but did not yet implement it?
@gregorgorjanc Yes, the output is not implemented. I will do that later.
@gregorgorjanc What are you thoughts about the update
@XingerTang I think the above update of the tasks makes sense!
I am now blanking on the connection between rec_prob
and rec_length
since I am not fully on board with the implementation. Let's see, rec_length
is given in Morgans for the whole chromosome representing the expected number of recombinations per chromosome. Say we have a 2 Morgan (=200 cM) long chromosome split into 4 segments delineated by 3 markers:
In the above case we need 4 values for rec_prob
, but is rec_prob
then 0.5 for each segment so that we get the expected number of recombinations equal to 2?
How does the current code go from rec_length
input to marker-to-marker rec_prob
?
@gregorgorjanc
The current code is calling the recombination rate transmissionRate
, and the real transmission rate is calculated as a part of the calculation of the update of segregation but is not explicitly named. The following is the part that calculates the transmissionRate
:
def setupTransmission(length, peelingInfo):
if peelingInfo.positions is None:
localMap = np.linspace(0, 1, num=peelingInfo.nLoci, dtype=np.float32)
else:
localMap = (
peelingInfo.positions / peelingInfo.positions[-1]
) # This should be sorted. Need to add in code to check.
for i in range(peelingInfo.nLoci - 1):
distance = localMap[i + 1] - localMap[i]
distance = distance * length
peelingInfo.transmissionRate[i] = distance
Here, length
refers to the recombination length input, transmissionRate
is the recombination rate. So we can provide the markers positions by using the peelingInfo.positions
, even though it is currently not being used and has no input option associated with it. And with this code, the transmissionRate
would indeed becomes [0.5, 0.5, 0.5, 0.5]
which sums up to 2
.
Maybe we need to rename some of the variables and add an input option for PeelingInfo.positions
before proceeding to the next step?
Transmission rate is indeed an odd name:(
Aren't positions read from the map file (if provided)?
@gregorgorjanc
Currently, the map file would be read only when we are doing hybrid peeling and a segregation file is provided as an input, then the information of the markers would be extracted while reading the map file, and passed as function arguments instead of being assigned to the PeelingInfo.positions
. So we cannot use it elsewhere.
So, maybe we can first make the map file input option always available and then read the information into the PeelingInfo.positions
to use the data at different places. Also, we can rename some variables such as transmissionRate
to improve the readability of the code.
OK, then please open an issue to provide map file for non-hybrid mode too. Thanks!
Do you see any reason why transmissionRate
term was used instead of rec...Rate
?
@gregorgorjanc Sure, I will open the issue.
I read the old code on the recomb branch, and there were both transmissionRate
and recomb
as the attributes of PeelingInfo
. From where the transmissionRate
is being used in the calculation of segregation (CollapsePointSeg
in Peeling.py
) I am sure it is just the recombination rate, whilerecomb
is used to store the value of the estimated recombination rate, but I didn't find anywhere in the code that recomb
is being used as a part of the calculation of segregation or any value relevant to the peeling. Since the old code is broken so I think maybe it is just an inconsistency in the naming.
Great that you are studying the code so much. Will email AW!
Let AlphaPeel take an input file of the recombination rate for each locus (possibly with sex information) which is stored as a numpy vector, and generate the corresponding output file.
PeelingInfo.transmissionRate
The test should have the following five subtests:
rec_prob
rec_length
no_input
rec_length
rec_length
rec_prob
rec_prob
both_same
rec_prob
(the more precise one)both_different