GrindeLab / STEAM

Significance Threshold Estimation for Admixture Mapping, an R package.
Other
7 stars 3 forks source link

running into errors: get_corr_chr #1

Closed complexgenome closed 5 years ago

complexgenome commented 5 years ago

Hi Devels,

Thank you for this library. I'm interested to use this package to estimate threshold for a three-way admixed population: afr amer, native, and European in hispanics. I've very limited knowledge of gds data and perhaps the reason for the error I've.

The haplotype per ancestry were derived using RFMIX, V1 and phased using SHAPEIT2. Also, I've per person global ancestry AFR, NAT and CEU.
Also, I've dataframe per chromosome that contains centi morgan information per input marker. I'd like to check the correlation for the ancestry proportion and inferred haplotypes for which I'm using get_corr_chr function.

I have gds file per CHR that consists ancestral dosage. That is for example, CHR22, has snp info, dosage_ceu, dosage_nat and dosage_afr

|--+ snp.chromosome   { Int32 5979, 23.4K }
|--+ dosage_ceu   { Int32 5979x10758, 245.4M }
|--+ dosage_nat   { Int32 5979x10758, 245.4M }
|--+ dosage_afr   { Int32 5979x10758, 245.4M }
|--+ sample.id   { Int32,factor 10758, 42.0K } *
|--+ snp.id   { Int32,factor 5979, 23.4K } *
\--+ snp.position   { Float64 5979, 46.7K }

using the above data, I use function as

tempgds <- openfn.gds(file.gdsdosage) #read gds and extract
afr_gds <- read.gdsn(index.gdsn(tempgds, "dosage_afr"))  
nat_gds <- read.gdsn(index.gdsn(tempgds, "dosage_nat")) 
ceu_gds <- read.gdsn(index.gdsn(tempgds, "dosage_ceu"))

test_CHR22<-get_corr_chr(chrom = 22, map = chr_cms, pop1.gds = ceu_gds, pop2.gds =nat_gds, pop3.gds = afr_gds)

After few minutes I get following error:

Error: is.character(gds.fn) is not TRUE In-house code for per ancestry gds per CHR can be found at link .

Can you please guide me to overcome this?

Thanks!

Versions of libraries I'm using:

[1] SNPRelate_1.12.2 argparse_2.0.1 GENESIS_2.8.1 [4] dplyr_0.8.0.1 gdsfmt_1.14.1 GWASTools_1.24.1 [7] Biobase_2.38.0 BiocGenerics_0.24.0 data.table_1.12.0 [10] STEAM_0.1.0 R version 3.4.2

kegrinde commented 5 years ago

Hello,

With the way that get_corr_chr is currently set up, pop1.gds, pop2.gds, and pop3.gds should be character strings specifying the name of three separate GDS files that contain the relevant local ancestry calls. So, rather than storing AFR, NAT, CEU dosages as separate nodes in a single GDS object, they should each be stored in their own SeqArray GDS files.

To get local ancestry calls in this format, you could first create three separate VCF files and then use the vcf2gds utility in the UW-GAC analysis pipeline.