dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Not including minmap in PCA/Structure #463

Closed: alexkrohn closed this issue 2 years ago

alexkrohn commented 2 years ago

Is there a way to not use the minmap filtering threshold in the PCA and Structure analyses included with ipyrad? I really like the imputation functions in your analysis tools, so I want to use them. However, I have some individuals that have high amounts of missing data and are one of 1-2 individuals in a population, thus they cause a lot of SNPs to be filtered at the minmap step. Is it possible to filter only with mincov?

I've tried setting minmap to 0 or NaN, and not including minmap in the PCA call, but both yield errors. Any thoughts?

isaacovercast commented 2 years ago

There are only two imputation methods at the moment, "sample" and "kmeans". Both use information about allele frequencies within groups, so setting mincov, which is global, would break their function. I'm afraid there's not a way to do what you want, unless I'm misunderstanding the question.
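To make the difference between the two thresholds concrete, here is a toy sketch in plain Python. It is not ipyrad's code: the function names and sample data are invented for illustration, and it assumes minmap is a per-group minimum coverage fraction while mincov is a single fraction computed over all samples.

```python
# Toy illustration only; this is NOT ipyrad's implementation.
# Each SNP is a dict: sample name -> True if a genotype was called.

def passes_mincov(snp, mincov):
    """Global filter: the fraction of all samples with data must reach mincov."""
    return sum(snp.values()) / len(snp) >= mincov


def passes_minmap(snp, imap, minmap):
    """Per-group filter: every group must reach its own coverage threshold."""
    for group, samples in imap.items():
        covered = sum(snp[s] for s in samples) / len(samples)
        if covered < minmap[group]:
            return False
    return True


# Hypothetical population map: "south" holds a single individual.
imap = {"north": ["n1", "n2", "n3", "n4"], "south": ["s1"]}
minmap = {"north": 0.5, "south": 0.5}

# s1 is the lone sample in its population and has no call at this SNP.
snp = {"n1": True, "n2": True, "n3": True, "n4": True, "s1": False}

global_ok = passes_mincov(snp, mincov=0.75)   # 4 of 5 samples have data
group_ok = passes_minmap(snp, imap, minmap)   # "south" has 0 of 1
```

In this sketch the SNP clears the global threshold but fails the per-group one, because the only sample in "south" has no data. A group-based imputation would have nothing to draw on for that group at this site.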

Since this is more of a question about operation than a bug or feature request, it would be better to put these kinds of things in the gitter channel in the future: https://gitter.im/dereneaton/ipyrad. You might check out a conversation I had there within the last week with @bmichanderson, who wanted to do something remarkably similar to what you are trying to do. I had some thoughts for him there that you might find useful.

alexkrohn commented 2 years ago

Got it. I didn't know about the gitter channel -- I will go there next time. Thanks.

I agree with your thoughts that a low mincov might be dangerous. In order to get around the minmap argument, though, it sounds like I'd just have to lump the single-individual populations into another population, or remove them.
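For anyone landing here later, the lumping step could look something like the sketch below. It is plain Python and does not call ipyrad; `merge_small_groups`, the `min_size` cutoff, and the population names are all hypothetical, and it assumes the population map is a dict of group name to list of sample names.

```python
def merge_small_groups(imap, target, min_size=2):
    """Return a new population map in which every group with fewer than
    min_size samples is folded into the `target` group."""
    merged = {target: list(imap[target])}
    for group, samples in imap.items():
        if group == target:
            continue
        if len(samples) < min_size:
            # Too few individuals to stand alone: lump them into target.
            merged[target].extend(samples)
        else:
            merged[group] = list(samples)
    return merged


# Hypothetical population map with one single-individual population.
imap = {
    "north": ["n1", "n2", "n3", "n4"],
    "south": ["s1"],
    "east": ["e1", "e2"],
}

merged = merge_small_groups(imap, target="north")
# Build a matching per-group threshold dict for the merged groups.
minmap = {group: 0.5 for group in merged}
```

Which population the singletons should join is a biological judgment call, and the 0.5 threshold is only a placeholder. Removing those samples instead would just mean dropping them from the dict.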