Closed by jflot 4 years ago
Run this script first: https://github.com/quant42/Scripts/blob/master/bioinf/assumeDiploid.py
Usage: python assumeDiploid.py in.fa out.fa
It is nice, but many potential HaplowebMaker users do not know how to use the command line. The script should therefore be incorporated into the pipeline as a simple box to tick in the "Advanced Options" menu.
When an individual has two identical sequences in the input FASTA file, under the default option (which does not assume diploidy) this sequence should be counted only once for this individual, not twice, since the default is to count presence/absence in individuals. This is important because some phasing pipelines output two sequences per individual by default, even if the individual is homozygous. It can also happen that the two sequences of one individual differ only at positions that become masked because another individual has missing data in these columns: in that case, the two sequences collapse into a single haplotype, and this haplotype should be counted only once in the default behaviour (but twice if the box "Count homozygous haplotypes twice" is ticked).
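To make the expected counting concrete, here is a minimal sketch of the behaviour described above. It is not the HaplowebMaker code: the function names, the `(individual, sequence)` input format, and the set of symbols treated as missing data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumption: these symbols mark missing data in the alignment.
MISSING = set("Nn?")

def mask_missing_columns(seqs):
    """Drop every alignment column in which at least one sequence has missing data."""
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in MISSING for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]

def count_haplotypes(records, count_homozygous_twice=False):
    """Count haplotypes from (individual, aligned_sequence) pairs (hypothetical format).

    Default: each haplotype is counted once per individual carrying it.
    With count_homozygous_twice=True, an individual left with a single
    haplotype after masking contributes 2 to that haplotype.
    """
    individuals = [ind for ind, _ in records]
    masked = mask_missing_columns([seq for _, seq in records])

    per_individual = defaultdict(set)
    for ind, hap in zip(individuals, masked):
        per_individual[ind].add(hap)  # identical sequences collapse here

    counts = Counter()
    for haps in per_individual.values():
        weight = 2 if (count_homozygous_twice and len(haps) == 1) else 1
        for hap in haps:
            counts[hap] += weight
    return counts

records = [
    ("A", "ACGT"), ("A", "ACGT"),  # homozygous in the input
    ("B", "ACGT"), ("B", "ACTT"),  # differs only in the column masked below
    ("C", "ACNA"), ("C", "GCGA"),  # the N masks the third column for everyone
]
default_counts = count_haplotypes(records)
diploid_counts = count_haplotypes(records, count_homozygous_twice=True)
```

In this example individual B becomes homozygous only because of the masking, so it is treated exactly like individual A under both settings.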
So we need four options here: 1) all circles shown with the same size; 2) circle area represents the number of individuals harboring a given haplotype; 3) circle area represents the number of times a given sequence is found in the alignments; 4) circle area represents the inferred frequency of the haplotype in the population (i.e., counting each homozygous individual twice).
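The four options can be sketched as a single weighting function plus an area-to-radius conversion. This is only an illustration under stated assumptions: the mode names (`"equal"`, `"individuals"`, `"sequences"`, `"frequency"`), the function names, and the `(individual, haplotype)` input format are hypothetical, and haplotypes are assumed to have been masked and collapsed already.

```python
import math
from collections import Counter, defaultdict

def haplotype_weights(records, mode="individuals"):
    """Return a weight per haplotype from (individual, haplotype) pairs.

    mode (hypothetical names for options 1 to 4):
      "equal"       -> every haplotype gets the same weight
      "individuals" -> number of individuals harboring the haplotype
      "sequences"   -> number of times the sequence occurs in the input
      "frequency"   -> like "individuals", but homozygotes count twice
    """
    by_individual = defaultdict(list)
    for ind, hap in records:
        by_individual[ind].append(hap)

    weights = Counter()
    for haps in by_individual.values():
        distinct = set(haps)
        for hap in distinct:
            if mode == "equal":
                weights[hap] = 1
            elif mode == "individuals":
                weights[hap] += 1
            elif mode == "sequences":
                weights[hap] += haps.count(hap)
            elif mode == "frequency":
                weights[hap] += 2 if len(distinct) == 1 else 1
            else:
                raise ValueError(f"unknown mode: {mode!r}")
    return weights

def radius_for(weight, unit_area=1.0):
    """Radius of a circle whose AREA (not diameter) is weight * unit_area."""
    return math.sqrt(weight * unit_area / math.pi)

records = [
    ("A", "h1"), ("A", "h1"),  # homozygous, listed twice
    ("B", "h1"), ("B", "h2"),  # heterozygous
    ("C", "h2"),               # single sequence in the input
]
```

Note that an individual listed with a single sequence (C above) is treated as homozygous in the `"frequency"` mode, which is itself an assumption about how such input should be interpreted.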
Only option 4 remains to be done.
When choosing option 1, all the connecting curves should also be drawn with the same thickness (with the total thickness split between the different colors in the case of multi-colored connections). Also, in that case, the slices of the pie charts should have equal sizes.
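The equal split requested for option 1 amounts to dividing a fixed total evenly among the colors involved, whether that total is a curve thickness or the 360 degrees of a pie chart. A minimal sketch follows; the helper name `equal_shares` and the example values are hypothetical.

```python
def equal_shares(total, labels):
    """Split `total` evenly among the distinct labels, keeping their order."""
    labels = list(dict.fromkeys(labels))  # de-duplicate, preserve order
    if not labels:
        return {}
    share = total / len(labels)
    return {label: share for label in labels}

# A connection of fixed total thickness shared by three colors.
curve_parts = equal_shares(6.0, ["red", "blue", "green"])

# Pie chart slices of equal angle for a haplotype shared by four colors.
pie_angles = equal_shares(360.0, ["red", "blue", "green", "yellow"])
```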
Add a setting allowing users to choose whether the circles' diameters should be based on the number of individuals possessing the corresponding haplotypes (presence/absence) or on the inferred frequency of this haplotype in the dataset (i.e., counting homozygous individuals twice).