rlorigro closed this issue 2 months ago.
Hi Ryan,
I'm not sure I understand the issue. Do you intend to increase the parameter "global-failure-count"
to a larger number (>50) to see how it affects physical phasing accuracy?
Best,
Hang
Sorry, somehow the line number was incorrect. This is the one I intended: https://github.com/PacificBiosciences/HiPhase/blob/9b01d7eda42a56dc9d20d39aa8b6fe23b60f5ab1/src/cli.rs#L150
1 read is a very loose criterion for joining variants into a phase block. I think we want something more conservative, so that fewer switch errors reach Shapeit from upstream. Asking Shapeit to undo the errors is probably more complicated than fixing them upstream. Increasing the threshold will come at the cost of shorter phase blocks, but I think Shapeit should be able to handle many smaller ones. Given our low coverage, 2 might be a good place to start.
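To make the trade-off concrete, here is a minimal sketch of threshold-based block joining. It is not HiPhase's implementation: it assumes, hypothetically, that we already know which reads cover each heterozygous site (`reads_at`), and it greedily joins each variant to the previous one only when at least `min_spanning_reads` reads cover both. All names and data are illustrative.

```python
from typing import Dict, List, Set


def build_phase_blocks(positions: List[int],
                       reads_at: Dict[int, Set[str]],
                       min_spanning_reads: int) -> List[List[int]]:
    """Greedily group sorted het positions into phase blocks (hypothetical helper).

    A variant joins the current block only if at least `min_spanning_reads`
    reads cover both it and the previous variant; otherwise it starts a new block.
    """
    blocks: List[List[int]] = []
    for pos in positions:
        if blocks:
            prev = blocks[-1][-1]
            # Reads that cover both the previous variant and this one.
            spanning = reads_at.get(prev, set()) & reads_at.get(pos, set())
            if len(spanning) >= min_spanning_reads:
                blocks[-1].append(pos)
                continue
        blocks.append([pos])
    return blocks


# Toy data: read names covering each het site (made up for illustration).
positions = [100, 200, 300, 400]
reads_at = {
    100: {"r1", "r2"},
    200: {"r1", "r2", "r3"},
    300: {"r3"},
    400: {"r3", "r4"},
}
print(build_phase_blocks(positions, reads_at, 1))  # [[100, 200, 300, 400]]
print(build_phase_blocks(positions, reads_at, 2))  # [[100, 200], [300], [400]]
```

With a threshold of 1 the toy example forms a single block; with 2, the sites at 300 and 400, each linked by only one read, become singletons, trading block length for confidence.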
Sounds good to me! Will test in WDL and get back to you later!
A brief update on the results. I used "--min-spanning-reads 2" as an additional argument to HiPhase and compared the result with the baseline on the main branch. As expected, the number of switch errors dropped (20 to 12), but the number of phase blocks increased (41 to 59). Precision, recall, and F1 remain essentially the same. Not sure why, but the F1 score for the main branch is ~0.89, which is lower than in the previous experiment. Here are the results:
Baseline (main branch), filtered small variants and SVs:
PHASE_BLOCKS  SWITCH_ERRORS  FLIP_ERRORS  NG_50  SWITCH_NGC50  SWITCHFLIP_NGC50
41            20             1            0      0             0
Experiment with --min-spanning-reads 2:
PHASE_BLOCKS  SWITCH_ERRORS  FLIP_ERRORS  NG_50  SWITCH_NGC50  SWITCHFLIP_NGC50
59            12             0            0      0             0
The runs are here:
baseline: https://app.terra.bio/#workspaces/broad-firecloud-dsde/lrma-aou1-panel-creation-hprc-only/job_history/d5db385e-17e5-4010-992e-a4a5a5bfe5f0
experiment: https://app.terra.bio/#workspaces/broad-firecloud-dsde/lrma-aou1-panel-creation-hprc-only/job_history/999c7009-0bac-4803-9bc3-bbdb1d712523
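For reference, the relative changes between the two runs can be computed directly from the tables above. The numbers are copied from this thread; the helper name is only illustrative.

```python
def pct_change(before: int, after: int) -> float:
    """Percent change from `before` to `after`."""
    return 100.0 * (after - before) / before


# Values copied from the baseline and --min-spanning-reads 2 tables.
baseline = {"PHASE_BLOCKS": 41, "SWITCH_ERRORS": 20, "FLIP_ERRORS": 1}
experiment = {"PHASE_BLOCKS": 59, "SWITCH_ERRORS": 12, "FLIP_ERRORS": 0}

for key, before in baseline.items():
    after = experiment[key]
    print(f"{key}: {before} -> {after} ({pct_change(before, after):+.1f}%)")
# PHASE_BLOCKS: 41 -> 59 (+43.9%)
# SWITCH_ERRORS: 20 -> 12 (-40.0%)
# FLIP_ERRORS: 1 -> 0 (-100.0%)
```

That is roughly 40% fewer switch errors for about 44% more phase blocks. The counts come from a single 10 Mb region, so they are small and the differences should be read with some caution.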
Can you remind me how big the test region is? Whole genome? I looked at the Terra runs but couldn't find it.
Also, is there any effect on the incompatible alleles?
The region is chr1:100Mb-110Mb. Here are the allele consistency metrics:
Baseline:
NUM_INCONSISTENT_ALLELES  NUM_CONSISTENT_ALLELES  NUM_INCONSISTENT_SITES  NUM_CONSISTENT_SITES
33                        11626                   14                      3851
Experiment:
NUM_INCONSISTENT_ALLELES  NUM_CONSISTENT_ALLELES  NUM_INCONSISTENT_SITES  NUM_CONSISTENT_SITES
34                        11625                   14                      3851
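As a quick check on how small the change is, the inconsistent fractions can be computed from the counts above. A small sketch; the helper name is illustrative.

```python
def inconsistent_fraction(inconsistent: int, consistent: int) -> float:
    """Fraction of items (alleles or sites) flagged as inconsistent."""
    return inconsistent / (inconsistent + consistent)


# Counts copied from the baseline and experiment tables: (inconsistent, consistent).
runs = {
    "baseline": {"alleles": (33, 11626), "sites": (14, 3851)},
    "experiment": {"alleles": (34, 11625), "sites": (14, 3851)},
}

for name, counts in runs.items():
    alleles = inconsistent_fraction(*counts["alleles"])
    sites = inconsistent_fraction(*counts["sites"])
    print(f"{name}: inconsistent alleles {alleles:.4%}, inconsistent sites {sites:.4%}")
```

Both runs evaluate the same totals (11,659 alleles and 3,865 sites), so the difference amounts to a single allele moving from consistent to inconsistent.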
OK, thanks. Interesting that it didn't change much.
Using the parameter here: https://github.com/PacificBiosciences/HiPhase/blob/9b01d7eda42a56dc9d20d39aa8b6fe23b60f5ab1/src/cli.rs#L210
we can increase the threshold to get smaller, more confident phase blocks to send to Shapeit4.
We would need to test how this affects the intra-phase-block switch rate.
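One simple way to estimate the intra-phase-block switch rate, sketched under assumptions: each heterozygous site is represented (hypothetically) as its phase block ID plus the allele on haplotype 1 in the prediction and in the truth set, sorted by position. A switch is counted when the prediction's agreement with the truth changes between consecutive sites of the same block; pairs that straddle a block boundary are skipped because the relative phase between blocks is undefined. Dedicated evaluation tools handle flip errors and multi-allelic sites more carefully, so this is only an approximation.

```python
from typing import List, Tuple

# Hypothetical site record: (block_id, predicted_hap1_allele, truth_hap1_allele),
# with sites sorted by position.
Site = Tuple[str, int, int]


def intra_block_switch_rate(sites: List[Site]) -> float:
    """Switches per within-block adjacent site pair (simplified definition)."""
    switches = 0
    comparisons = 0
    for prev, cur in zip(sites, sites[1:]):
        if prev[0] != cur[0]:
            continue  # block boundary: relative phase across blocks is undefined
        prev_agrees = prev[1] == prev[2]
        cur_agrees = cur[1] == cur[2]
        comparisons += 1
        if prev_agrees != cur_agrees:
            switches += 1
    return switches / comparisons if comparisons else 0.0


# Block A has one switch between its 2nd and 3rd sites; block B is flipped
# relative to the truth as a whole, which is not a switch within the block.
sites = [("A", 0, 0), ("A", 1, 1), ("A", 0, 1), ("A", 1, 0),
         ("B", 1, 0), ("B", 0, 1)]
print(intra_block_switch_rate(sites))  # 0.25
```

Comparing this rate between thresholds (rather than raw switch counts) separates the effect of splitting blocks from the effect on phasing accuracy inside each block.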