Rosemeis / pcangsd

Framework for analyzing low depth NGS data in heterogeneous populations using PCA.
GNU General Public License v3.0

request: add SNP locus metadata to SNP weights output? #76

Open TeresaPegan opened 1 year ago

TeresaPegan commented 1 year ago

Hello! I would like to request, if feasible at some point, that SNP loci (chromosome + position) be added to the output of the --snp-weights argument.

I know that the weights in the output are in the same order as the SNPs in the input Beagle file, and in many cases I can simply match the SNPs to their weights that way.

However, the lack of metadata on the SNP weights has become a big problem for me in an analysis that uses a Beagle file from which several individuals were removed post hoc. In that case, the output of --snp-weights contains a few thousand fewer rows than there are loci in my Beagle file. I suspect this is because removing those individuals left some loci monomorphic in the reduced set even though they are polymorphic in the full set. I assume that PCAngsd drops these invariant loci through some kind of MAF filter and therefore produces no weight for those rows.

Because I do not know which rows PCAngsd is ignoring, I cannot match the SNP weights to the SNPs in my Beagle file by index alone.
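
As a possible workaround, here is a minimal sketch of the matching step. It assumes a per-row keep mask is available (one 0/1 flag per Beagle row, saying whether PCAngsd retained that site); whether and how your PCAngsd version can write such a mask is an assumption to check against its help output. It also assumes the first Beagle column holds the marker ID. The names `markers_from_lines` and `label_weights` are illustrative and not part of PCAngsd.

```python
def markers_from_lines(lines):
    """Return the marker ID (first column) of every data row in a Beagle file.

    `lines` is any iterable of text lines; the first line is treated as the header.
    """
    rows = iter(lines)
    next(rows, None)  # skip the header line
    return [line.split(None, 1)[0] for line in rows if line.strip()]


def label_weights(markers, keep_mask, weights):
    """Pair each SNP-weight row with the marker it belongs to.

    markers   : one ID per Beagle row
    keep_mask : one truthy/falsy flag per Beagle row (assumed input)
    weights   : one row per retained site, in Beagle order
    """
    if len(markers) != len(keep_mask):
        raise ValueError("mask length does not match the number of Beagle rows")
    kept = [m for m, keep in zip(markers, keep_mask) if keep]
    if len(kept) != len(weights):
        raise ValueError("number of retained sites does not match the weight rows")
    return list(zip(kept, weights))
```

For a gzipped Beagle file, the lines can come from `gzip.open(path, "rt")`. The length checks are deliberate: a silent off-by-one here would shift every weight onto the wrong locus.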

Alternatively, PCAngsd could write "NA" in the --snp_weights output for each row that it ignores. The weights could then be matched back to the SNPs by index even without locus metadata in the --snp_weights output.
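
To illustrate what NA-padded output would look like, here is a small sketch that expands a filtered weight matrix back to one row per Beagle locus, filling ignored rows with NaN. It is not how PCAngsd works internally; the keep mask is an assumed input (one 0/1 flag per Beagle row) and `pad_weights` is a hypothetical helper name.

```python
import math


def pad_weights(keep_mask, weights):
    """Expand filtered weights to one row per input locus.

    keep_mask : one truthy/falsy flag per Beagle row (assumed input)
    weights   : one row of per-component weights per retained site
    Rows for ignored loci are filled with NaN so indices line up with the Beagle file.
    """
    n_kept = sum(1 for keep in keep_mask if keep)
    if n_kept != len(weights):
        raise ValueError("number of retained sites does not match the weight rows")
    width = len(weights[0]) if weights else 0
    remaining = iter(weights)
    padded = []
    for keep in keep_mask:
        if keep:
            padded.append(list(next(remaining)))
        else:
            padded.append([math.nan] * width)
    return padded
```

With the padded rows, row i of the weights corresponds to row i of the Beagle file again, which is the property the plain index-based matching relies on.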

Thanks! I otherwise find this snp-weights feature to be extremely useful! -Teresa

Tannervanorden commented 9 months ago

I am having this exact same problem. It would be really awesome if names could be paired with SNP weights or if NAs could be added. Thanks so much! Tanner