e-jorsboe / fastNGSadmix

Program for estimating admixture proportions and doing principal component analysis of a single NGS sample
GNU General Public License v3.0

plinkToRef.R "attempt to select less than one element in integerOneIndex" #4

Closed mikyatope closed 5 years ago

mikyatope commented 5 years ago

Hi,

I'm trying to create a ref panel with 1000genomes phase3 data (from plink2 website). I'm testing the procedure with chr1 only.

I generated bed, bim and fam files using:

./plink2 --pgen chr1_phase3.pgen --pvar chr1_phase3.pvar --psam phase3_corrected.psam --make-bed --out 1K_test_chr1 --maf 0.05 --geno 0.05 --max-alleles 2 --rm-dup exclude-all

But when I try to execute plinkToRef.R I get this error:

Error in l[[pop]] <- 1 - colSums(y, na.rm = T)/(colSums(!is.na(y)) * 2) : attempt to select less than one element in integerOneIndex

Any suggestion on what could be happening? Thanks!

e-jorsboe commented 5 years ago

Hi,

Does your .fam file have a group ID in the first column? The script calculates the frequency of each marker for each population, and it takes the populations from that column.

Otherwise you can see more about the reference panel on the wiki:

http://www.popgen.dk/software/index.php/FastNGSadmix#Making_a_reference_panel
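One quick way to check the first column is to tally how many individuals carry each group ID. The following is only a minimal sketch and not part of fastNGSadmix: the function name `count_groups` and the example rows are made up for illustration, and it assumes a whitespace-separated .fam file with the group ID in column 1.

```python
from collections import Counter
import io


def count_groups(fam_lines):
    """Count individuals per group ID (column 1 of a PLINK .fam file)."""
    counts = Counter()
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts[fields[0]] += 1
    return counts


# Illustrative rows only (hypothetical sample IDs); for a real check,
# pass an open file handle such as open("1K_test_chr1.fam") instead.
example_fam = io.StringIO(
    "CEU sample1 0 0 1 -9\n"
    "CEU sample2 0 0 2 -9\n"
    "YRI sample3 0 0 1 -9\n"
    "0 sample4 0 0 1 -9\n"
)

groups = count_groups(example_fam)
for group, n in sorted(groups.items()):
    print(group, n)
```

If the tally shows a single group, or placeholder values such as `0` where population names were expected, the labels were probably lost when converting from psam to fam (this is an assumption about the cause, not something the script reports).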

e-jorsboe commented 5 years ago

Hi again,

May I close this issue?

mikyatope commented 5 years ago

Hi,

Yes, thank you very much. It was indeed a wrong conversion from psam to fam in plink on my part, and not a fastNGSadmix issue.

mikyatope commented 5 years ago

I'd like to reopen this with another error in plinkToRef.R, just to avoid creating a new issue, if you don't mind.

I managed to create a reference dataset from 1000genomes_phase3 data for chr1 only, with no problems. Trying to replicate the very same steps with all the chromosomes from 1000genomes, I get this error with plinkToRef.R:

Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
  duplicate 'row.names' are not allowed
Calls: <Anonymous> ... rownames<- -> row.names<- -> row.names<-.data.frame
In addition: Warning message:
non-unique value when setting 'row.names': '.'
Execution halted

Like the previous error, it seems this could be an issue with how I'm generating the files with plink, but I'm at a loss as to where to look. Do you have any suggestions?

My current plink filters are:

--maf 0.05 --geno 0.05 --max-alleles 2 --rm-dup exclude-all --snps-only --chr 1-22

Thank you kindly, and sorry for any inconvenience.

e-jorsboe commented 5 years ago

Hi,

It seems like you have either duplicate IDs for your individuals in the .fam file (the second column should contain unique IDs) or duplicate sites in your .bim file.
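Both possibilities can be checked by looking for repeated values in the second column of each file. The sketch below is a hypothetical helper, not part of fastNGSadmix or plink; `duplicated_values` and the example rows are invented for illustration. Note that the warning in the error output names `'.'` as the non-unique value, which would be consistent with several variants in the .bim file having no ID, although that is an inference from the message rather than something confirmed in this thread.

```python
from collections import Counter


def duplicated_values(lines, column):
    """Return {value: count} for values seen more than once in a 0-based column."""
    counts = Counter(line.split()[column] for line in lines if line.strip())
    return {value: n for value, n in counts.items() if n > 1}


# Illustrative .bim rows (hypothetical variants):
# chromosome, variant ID, cM position, bp position, allele 1, allele 2
example_bim = [
    "1 rs0001 0 10177 A C",
    "1 . 0 10352 T A",
    "2 . 0 10616 G C",
    "2 rs0002 0 11008 C G",
]

# Illustrative .fam rows (hypothetical samples); column 2 holds the individual ID
example_fam = [
    "CEU sample1 0 0 1 -9",
    "CEU sample2 0 0 2 -9",
    "YRI sample3 0 0 1 -9",
]

dup_sites = duplicated_values(example_bim, 1)
dup_individuals = duplicated_values(example_fam, 1)
print("duplicate variant IDs:", dup_sites)           # {'.': 2}
print("duplicate individual IDs:", dup_individuals)  # {}
```

For real files, the lists can be replaced with open file handles for the .bim and .fam. If `.` does show up as a repeated variant ID, assigning IDs to unnamed variants before exporting (for example with plink's `--set-missing-var-ids`) may be worth trying.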

Also, I would recommend doing one chromosome at a time. I am not sure snpStats (the package used for reading in plink files) can read plink files that big into R.
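As a rough sketch of the one-chromosome-at-a-time approach, the snippet below only builds the plink2 command lines, one per autosome, reusing the filters quoted earlier in this issue. The input prefix `all_phase3`, the output prefix `1K_ref`, and the use of `--pfile` (instead of separate `--pgen`/`--pvar`/`--psam` arguments) are assumptions made for illustration; nothing is executed.

```python
import shlex

# Filters copied from the issue; --chr is added per chromosome below
FILTERS = [
    "--maf", "0.05", "--geno", "0.05", "--max-alleles", "2",
    "--rm-dup", "exclude-all", "--snps-only",
]


def per_chromosome_commands(in_prefix, out_prefix, chromosomes=range(1, 23)):
    """Build one plink2 --make-bed command line per chromosome."""
    commands = []
    for chrom in chromosomes:
        args = [
            "./plink2", "--pfile", in_prefix, "--chr", str(chrom),
            *FILTERS, "--make-bed", "--out", f"{out_prefix}_chr{chrom}",
        ]
        commands.append(shlex.join(args))  # shlex.join needs Python 3.8+
    return commands


# Hypothetical prefixes; adjust to the actual file names
commands = per_chromosome_commands("all_phase3", "1K_ref")
for command in commands:
    print(command)
```

Each resulting bed/bim/fam set could then be passed to plinkToRef.R separately, the same way the chr1 test set was.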

There is an already made version of the 1000 genomes reference panel: http://www.popgen.dk/software/index.php/FastNGSadmix#Quick_start_-_1000_genomes_reference_panel

e-jorsboe commented 5 years ago

Can I close this issue?