Hi Bryce,
I have two quick questions.
(1) I made my own BED file for step 5, which contains unique SNP-target region pairs. However, many SNPs map to a single target region, and a single SNP can map to many target regions. I realized I should keep only unique target regions for step 9. Beyond that step, will this affect any of the following steps?
(2) In step 6 (extract_haplotype_read_counts), I found that the output has lots of "NA"s in the REGION.SNP.*.HAP.COUNT columns, while the REGION.READ.COUNT for those SNPs ranges from 0 to > 10. Should I remove those SNP-target region pairs?
See the following example (the same SNPs appear multiple times, e.g. rs75478250, as I described in (1) above).
================
CHROM TEST.SNP.POS TEST.SNP.ID TEST.SNP.REF.ALLELE TEST.SNP.ALT.ALLELE TEST.SNP.GENOTYPE TEST.SNP.HAPLOTYPE REGION.START REGION.END REGION.SNP.POS REGION.SNP.HET.PROB REGION.SNP.LINKAGE.PROB REGION.SNP.REF.HAP.COUNT REGION.SNP.ALT.HAP.COUNT REGION.SNP.OTHER.HAP.COUNT REGION.READ.COUNT GENOMEWIDE.READ.COUNT
chr1 10177 b'rs367896724' b'A' b'AC' 0.00 1|0 29554 31109 NA NA NA NA NA NA 2 3708217
chr1 11008 b'rs575272151' b'C' b'G' 0.05 0|0 29554 31109 NA NA NA NA NA NA 3 3708217
chr1 13116 b'rs62635286' b'T' b'G' 0.10 0|0 29554 31109 NA NA NA NA NA NA 15 3708217
chr1 13118 b'rs200579949' b'A' b'G' 0.08 0|0 29554 31109 NA NA NA NA NA NA 0 3708217
chr1 49298 b'rs200943160' b'T' b'C' 0.00 1|1 89295 133566 125271;129010 0.33;0.00 1.00;1.00 0;0 0;0 0;0 15 3708217
chr1 63268 b'rs75478250' b'T' b'C' 0.01 0|1 89295 133566 125271;129010 0.33;0.00 1.00;1.00 0;0 0;0 0;0 15 3708217
chr1 63268 b'rs75478250' b'T' b'C' 0.01 0|1 29554 31109 NA NA NA NA NA NA 2 3708217
chr1 63671 b'rs80011619' b'G' b'A' 0.01 0|0 89295 133566 125271;129010 0.33;0.00 1.00;1.00 0;0 0;0 0;0 15 3708217
chr1 63671 b'rs80011619' b'G' b'A' 0.01 0|0 29554 31109 NA NA NA NA NA NA 2 3708217
chr1 63735 b'rs201888535' b'CCTA' b'C' 0.00 1|1 29554 31109 NA NA NA NA NA NA 2 3708217
chr1 63735 b'rs201888535' b'CCTA' b'C' 0.00 1|1 89295 133566 125271;129010 0.33;0.00 1.00;1.00 0;0 0;0 0;0 15 3708217
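
To make the two cases concrete, here is a minimal sketch of how such rows could be flagged. It assumes the step 6 output is whitespace-delimited with the header shown above; the function name `summarize` and the inline sample are only illustrative and are not part of the pipeline.

```python
from collections import defaultdict

def summarize(lines):
    """Parse whitespace-delimited rows (first line is the header).

    Returns all rows, the rows whose REGION.SNP.POS is "NA", and the
    test SNPs that are paired with more than one target region.
    """
    header = lines[0].split()
    rows = [dict(zip(header, line.split())) for line in lines[1:] if line.strip()]

    # Rows where no SNP inside the target region was reported
    na_rows = [r for r in rows if r["REGION.SNP.POS"] == "NA"]

    # Group target regions by test SNP (chromosome + position)
    regions_by_snp = defaultdict(set)
    for r in rows:
        key = (r["CHROM"], r["TEST.SNP.POS"])
        regions_by_snp[key].add((r["REGION.START"], r["REGION.END"]))
    multi = {snp: regs for snp, regs in regions_by_snp.items() if len(regs) > 1}

    return rows, na_rows, multi

# A few rows copied from the example above
sample = [
    "CHROM TEST.SNP.POS TEST.SNP.ID TEST.SNP.REF.ALLELE TEST.SNP.ALT.ALLELE "
    "TEST.SNP.GENOTYPE TEST.SNP.HAPLOTYPE REGION.START REGION.END REGION.SNP.POS "
    "REGION.SNP.HET.PROB REGION.SNP.LINKAGE.PROB REGION.SNP.REF.HAP.COUNT "
    "REGION.SNP.ALT.HAP.COUNT REGION.SNP.OTHER.HAP.COUNT REGION.READ.COUNT "
    "GENOMEWIDE.READ.COUNT",
    "chr1 10177 b'rs367896724' b'A' b'AC' 0.00 1|0 29554 31109 NA NA NA NA NA NA 2 3708217",
    "chr1 49298 b'rs200943160' b'T' b'C' 0.00 1|1 89295 133566 125271;129010 0.33;0.00 1.00;1.00 0;0 0;0 0;0 15 3708217",
    "chr1 63268 b'rs75478250' b'T' b'C' 0.01 0|1 89295 133566 125271;129010 0.33;0.00 1.00;1.00 0;0 0;0 0;0 15 3708217",
    "chr1 63268 b'rs75478250' b'T' b'C' 0.01 0|1 29554 31109 NA NA NA NA NA NA 2 3708217",
]

rows, na_rows, multi = summarize(sample)
print(len(rows), "rows;", len(na_rows), "with NA region SNPs")
print("test SNPs paired with several regions:", sorted(multi))
```

This only reports the rows in question; whether they should actually be dropped is the open question in (2).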
Thanks a lot. Best, Liuyang