e-jorsboe / fastNGSadmix

Program for estimating admixture proportions and doing principal component analysis of a single NGS sample
GNU General Public License v3.0

Segmentation fault #3

PhilPalmer closed 5 years ago

PhilPalmer commented 5 years ago

Hi,

Thanks for the tool

I am trying to get subpopulation proportions data for an individual 23andMe file.

I have converted my 23andMe file to plink format using 23andme-to-plink and ran plink to generate the relevant bed, bim, fam, map & ped files.

I have also built fastNGSadmix (as per the instructions) and downloaded the 1000 genomes reference panel from here

However, when I run the following:

./fastNGSadmix/fastNGSadmix -plink 23andMe/uk35C650_20170608144013 -fname data/1000genomes/data1000genomes/refPanel_1000genomesRefPanel.txt -Nname data/1000genomes/data1000genomes/nInd_1000genomesRefPanel.txt -out out -whichPops all

I get this & the log file is empty:

    -> Dumping file: out.log
Input: -likes (null) -plink 23andMe/uk35C650_20170608144013 -Nname data/1000genomes/data1000genomes/nInd_1000genomesRefPanel.txt -fname data/1000genomes/data1000genomes/refPanel_1000genomesRefPanel.txt -out out -whichPops all
Setup: -seed 1558084052 -method 1
Ploidy of 2 has been chosen

The accelerated EM has been chosen
The adjusted method has been chosen
Convergence: -maxIter 2000 -tol 0.00000010
The following number of bootstraps have been chosen: 0
Segmentation fault

Any ideas what the problem may be?

Thanks in advance, any help would be much appreciated

e-jorsboe commented 5 years ago

Hi,

Thanks for the kind words.

As far as I can see, something goes wrong when fastNGSadmix tries to read in the plink file.

Can you show me the head of the .fam and .bim files? The input has to be a .bed file with accompanying .bim and .fam files.

Can plink itself read your plink files without any problems?

PhilPalmer commented 5 years ago

Hi @e-jorsboe,

Thank you for the prompt response.

.fam is just one line (which is likely part of the problem):

uk35C650_20170608144013_FAM     uk35C650_20170608144013 uk35C650_20170608144013_FATHER  uk35C650_20170608144013_MOTHER  0       -9

.bim:

1       rs12564807      0       734462  0       A
1       rs3131972       0       752721  0       G
1       rs148828841     0       760998  0       C
1       rs12124819      0       776546  G       A
1       rs115093905     0       787173  T       G
1       rs11240777      0       798959  0       G
1       rs7538305       0       824398  0       A
1       rs4970383       0       838555  0       C

I think plink should be able to read the files fine because it was plink that generated them.

I have also generated a VCF file from the 23andMe file & then tried converting it to beagle-gl format. However, I was unable to because the VCF lacked the GT & PL fields in the FORMAT column.

Do you think this could also be why I get the segmentation fault?

Thanks again

e-jorsboe commented 5 years ago

Hi,

Perhaps try removing the paternal and maternal IDs from the .fam file (3rd and 4th columns), or give the first 4 columns of the .fam file shorter ID names.

fastNGSadmix should definitely be able to handle a .fam file with only 1 line.
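
A minimal sketch of that suggestion, assuming a whitespace-separated, six-column .fam file. The awk call rewrites the family and individual IDs to short placeholders and sets the parental IDs to 0 (plink's code for "missing"). The function name `shorten_fam` and the `F1`/`I1` style IDs are made up for illustration and are not part of fastNGSadmix or plink.

```shell
# Hypothetical helper: print a copy of a .fam file with short IDs.
# .fam columns: family ID, individual ID, father ID, mother ID, sex, phenotype.
shorten_fam() {
    awk 'BEGIN { OFS = "\t" }
         # Stop early if a line does not look like a .fam record.
         NF != 6 { print "line " NR ": expected 6 columns" > "/dev/stderr"; exit 1 }
         # Keep sex and phenotype, replace the four ID columns.
         { print "F" NR, "I" NR, 0, 0, $5, $6 }' "$1"
}

# Example usage (file names are placeholders):
#   cp sample.fam sample.fam.orig
#   shorten_fam sample.fam.orig > sample.fam
```

Keeping a copy of the original .fam is worthwhile because the .bed file stores genotypes by sample order only, so the original names cannot be recovered from it afterwards.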

e-jorsboe commented 5 years ago

Hi again,

So the issue was how many characters the program would read per line of the .fam file. Previously it read only 99 characters per line. Since your line in the .fam file was longer than that, the phenotype and sex columns were never read, and the program failed when it tried to copy these values into memory.

I have now increased how many characters the program reads per line, which should fix your problem. So try downloading and compiling fastNGSadmix again, and this time it should work!
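
For anyone on an older build, here is a rough way to check whether a .fam file would run into the 99-character limit described in this comment. It is only a sketch: the helper name `long_fam_lines` is invented for illustration, and the default of 99 is taken from the explanation in this thread, not from the fastNGSadmix source.

```shell
# Hypothetical helper: report lines of a file that are longer than a limit.
# $1 = path to the .fam file, $2 = optional limit (defaults to 99).
long_fam_lines() {
    awk -v max="${2:-99}" \
        'length($0) > max { print NR ": " length($0) " characters" }' "$1"
}

# Example usage (file name is a placeholder):
#   long_fam_lines sample.fam
```

No output means every line fits within the limit. Any line that is listed would have been cut off by the old reader before the last columns were parsed.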

PhilPalmer commented 5 years ago

Hi,

Awesome, thank you for fixing this so quickly. It's very much appreciated.

I have recompiled fastNGSadmix and it now works 😃

Side note for future googlers: I also had a problem with duplicate sites in the plink binary file(s) but this was easy to fix with plink:

plink --file ${name} --list-duplicate-vars ids-only suppress-first
plink --file ${name} --recode --exclude plink.dupvar --out ${name}
plink --file ${name} --make-bed --out ${name}
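
To see whether a fileset has duplicate sites before or after a cleanup like this, a quick check on the .bim file can help. The sketch below is an assumption-laden shortcut, not plink's own logic: it only compares chromosome and base-pair position, whereas plink's `--list-duplicate-vars` also takes allele codes into account as far as I understand, so this may flag more sites than plink would. The helper name `dup_positions` is made up for illustration.

```shell
# Hypothetical helper: list chromosome:position pairs that occur more than
# once in a .bim file, followed by how often they occur.
# .bim columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2.
dup_positions() {
    awk '{ count[$1 ":" $4]++ }
         END { for (key in count) if (count[key] > 1) print key "\t" count[key] }' "$1" | sort
}

# Example usage (file name is a placeholder):
#   dup_positions sample.bim
```

An empty result means no two variants share a chromosome and position.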

Thanks again for a great tool