dcgerard / updog

Flexible Genotyping of Polyploids using Next Generation Sequencing Data
https://dcgerard.github.io/updog/

Multi-SNP input for flexdog() #6

Closed dcgerard closed 4 years ago

dcgerard commented 4 years ago

Description

It would be nice to have a function that takes multiple SNPs as input. This would let users with less programming experience use flexdog() without having to write the for-loop over SNPs themselves. Updog already Imports foreach and doParallel (because of mupdog()), so I could use those packages to provide parallelization.

data.frame Format

A possible format would be to require users to supply SNPs in a long-format data frame, with one row per individual per SNP:

id             snp   counts  size
Xushu18        SNP1  298     354
Xushu18S1-001  SNP1  187     187
Xushu18S1-002  SNP1  201     201
Xushu18S1-003  SNP1  157     184
Xushu18S1-004  SNP1  175     215
Xushu18S1-005  SNP1  283     283

We could call this flexdog_df().
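A minimal sketch of what such a wrapper might do, assuming the data frame layout above. The name flexdog_df_sketch(), its fit_fun argument, and the naive_fit() placeholder are all hypothetical and not part of updog. The sketch also assumes that counts holds the reference read counts and size the total read counts. With updog installed, fit_fun could instead wrap updog::flexdog(refvec = counts, sizevec = size, ploidy = ploidy).

```r
# Hypothetical sketch of a flexdog_df()-style wrapper; not part of updog.
# Rows from the example table in this issue.
snp_df <- data.frame(
  id     = c("Xushu18", "Xushu18S1-001", "Xushu18S1-002",
             "Xushu18S1-003", "Xushu18S1-004", "Xushu18S1-005"),
  snp    = "SNP1",
  counts = c(298, 187, 201, 157, 175, 283),
  size   = c(354, 187, 201, 184, 215, 283),
  stringsAsFactors = FALSE
)

flexdog_df_sketch <- function(df, fit_fun) {
  # Check that the required columns are present and the counts are sane.
  stopifnot(all(c("id", "snp", "counts", "size") %in% names(df)))
  stopifnot(all(df$counts <= df$size))

  # One sub-data-frame per SNP, then one fit per SNP.
  # lapply() is used so the sketch runs in base R; a foreach loop
  # could take its place to fit SNPs in parallel.
  by_snp <- split(df, df$snp)
  lapply(by_snp, function(d) fit_fun(d$counts, d$size))
}

# Placeholder fitter (assumption): the raw proportion of reference reads.
# It stands in for a real per-SNP fit such as flexdog().
naive_fit <- function(counts, size) counts / size

fits <- flexdog_df_sketch(snp_df, naive_fit)
```

The result is a list named by SNP, so each element can be inspected on its own or combined later into summary tables.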

VariantAnnotation Format

We could also allow the user to pass in an object from the VariantAnnotation package (https://doi.org/doi:10.18129/B9.bioc.VariantAnnotation). We could call this flexdog_va().

Output

The output should be a list-like object containing two data.frames.

The first data.frame contains summary information for each SNP-by-individual combination (posterior means, posterior modes, maximum posterior probabilities, posterior probabilities, IDs, SNP names, counts, size).

The second data.frame contains summary information for each SNP (bias, sequencing error rate, overdispersion parameter, log-likelihood, estimated proportion missing).
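One way the two data.frames could be assembled from per-SNP fits is sketched below. Everything here is an assumption made for illustration: the combine_fits() helper, the element names inddf and snpdf, and all column names are hypothetical. The parameter and posterior columns are left as NA placeholders rather than real estimates, and the per-genotype posterior probability columns are omitted for brevity.

```r
# Hypothetical per-SNP fit results; names and columns are illustrative only.
fits <- list(
  SNP1 = list(
    # Per-individual summaries (two rows from the example table).
    ind = data.frame(
      id          = c("Xushu18", "Xushu18S1-001"),
      counts      = c(298, 187),
      size        = c(354, 187),
      postmean    = NA_real_,     # placeholder: posterior mean genotype
      postmode    = NA_integer_,  # placeholder: posterior mode genotype
      maxpostprob = NA_real_,     # placeholder: maximum posterior probability
      stringsAsFactors = FALSE
    ),
    # Per-SNP parameter estimates (all placeholders).
    par = data.frame(
      bias = NA_real_, seq_error = NA_real_, od = NA_real_,
      llike = NA_real_, prop_mis = NA_real_
    )
  )
)

combine_fits <- function(fits) {
  # Prepend the SNP name to each piece, then stack the pieces row-wise.
  inddf <- do.call(rbind, lapply(names(fits), function(s) {
    cbind(snp = s, fits[[s]]$ind)
  }))
  snpdf <- do.call(rbind, lapply(names(fits), function(s) {
    cbind(snp = s, fits[[s]]$par)
  }))
  rownames(inddf) <- NULL
  rownames(snpdf) <- NULL
  list(inddf = inddf, snpdf = snpdf)
}

out <- combine_fits(fits)
```

Keeping the SNP name as a column in both tables lets users join or filter them by SNP without relying on list names.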