dipetkov / eems

Estimating Effective Migration Surfaces
GNU General Public License v2.0

bed2diff error: [Data::getsize] Error opening plink files #15

Closed SoniaAndrade closed 6 years ago

SoniaAndrade commented 6 years ago

Dear Dr Petkova,

Running bed2diffs on my Plink files (.bed, .fam and .bim, all in the same directory), I got the following message:

./bed2diffs_v1 --bfile palma2 --nthreads 4

Compute the average genetic differences according to:
  Dij = (1/|Mij|) sum_{m in Mij} (z_{im} - z_{jm})^2
  where Mij is the set of SNPs where both i and j are called

[Data::getsize] Error opening plink files palma2.[bed/bim/fam]

Please see the input files attached. I am wondering what might be wrong with the dataset, which is extensive (it's a product of GBS data) but seems all right. Could you please help me with that? Thanks.

Attachment: test.zip

dipetkov commented 6 years ago

Hello

Thank you for attaching your plink dataset -- it was very helpful in figuring out what the issue is:

The bim file contains unrecognized chromosome codes, e.g. "scaffold_3" instead of "1". Plink can process this dataset if we specify the --allow-extra-chr option. However, it seems that libplinkio, which bed2diffs uses to read the plink files, cannot handle non-standard chromosome codes.
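To see which chromosome codes a bim file actually contains, a quick tally of its first column can help. The following is a minimal sketch, not part of bed2diffs; the function name list_chr_codes is made up for illustration, and it assumes the bim file is whitespace-delimited with the chromosome code in column 1.

```shell
# Hypothetical helper (not part of bed2diffs or plink):
# print each distinct chromosome code in a .bim file with its SNP count.
# Assumes a whitespace-delimited .bim with the chromosome code in column 1.
list_chr_codes() {
    awk '{ count[$1]++ } END { for (c in count) print c, count[c] }' "$1" | sort
}
```

For example, `list_chr_codes datapath.bim` would list codes such as "scaffold_3" next to the number of SNPs assigned to them, which makes non-numeric codes easy to spot.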

A simple way to deal with this is to modify a copy of the bim file. For example, on the command line we can "assign" all SNPs to chromosome 1 for the purpose of computing genetic dissimilarities with bed2diffs:

cp datapath.bed newdatapath.bed
cp datapath.fam newdatapath.fam
awk '{print 1,$2,$3,$4,$5,$6}' datapath.bim  > newdatapath.bim

Hopefully,

./bed2diffs_v1 --bfile newdatapath

will execute without errors.
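Before running bed2diffs on the modified copy, it may be worth confirming that only the chromosome column changed. This is a sketch under the same assumptions as the awk command above (six whitespace-delimited columns); check_bim_copy is a hypothetical name, not an existing tool.

```shell
# Hypothetical sanity check (not part of bed2diffs or plink):
# succeeds only if columns 2-6 of the new .bim match the original
# and every SNP in the new .bim is assigned to chromosome 1.
check_bim_copy() {
    orig_sum=$(awk '{ print $2, $3, $4, $5, $6 }' "$1" | cksum)
    new_sum=$(awk '{ print $2, $3, $4, $5, $6 }' "$2" | cksum)
    new_chr=$(awk '{ print $1 }' "$2" | sort -u)
    [ "$orig_sum" = "$new_sum" ] && [ "$new_chr" = "1" ]
}
```

Usage: `check_bim_copy datapath.bim newdatapath.bim && echo "bim copy looks consistent"`. Comparing checksums rather than the full text keeps the check cheap for large GBS datasets.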

Thanks for raising this issue. I've added a comment to the bed2diffs error message to check that the plink dataset has a standard format if there is a libplinkio error.

SoniaAndrade commented 6 years ago

Thanks, it worked perfectly! I had to use scaffold_ codes because plink cannot process more than ~90 chromosomes and I am dealing with GBS data from a non-model organism. Thanks again!