corbinq / apex

Toolkit for QTL mapping and meta-analysis.
https://corbinq.github.io/apex/

Segmentation default #12

Open chrisclarkson opened 2 years ago

chrisclarkson commented 2 years ago

I downloaded the precompiled binary of this program and tried to run it on my own data, which looks as follows:

BED:

#chr    start   end     gene_name       00482428        006683
chr1    713361  715120  peak1   68      69      108     64
chr1    761305  763147  peak2   53      68      52      50
chr1    805156  805842  peak3   39      25      31      27

... PCA (made using QTLtools):

#ID     00482428        00668310        01243685
PC1     87.3077 115.374 -57.2281        49.5875 
PC2     29.0207 23.6788 23.5525 0.452292        
PC3     13.237  -35.749 9.54686 16.9021 -60.1183

...

VCF (bgzipped and tabix indexed)

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=PASS,Description="All filters passed">
...
##source=ApplyVQSR
##source=GenomicsDBImport
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  00482428....

I get an error when I try to run a cis MR on this:

~/apex/bin/apex cis --vcf AD_SNP.recalibrated_99.9.vcf.gz --bed AD_apex.bed.gz --cov AD_apex.pca --prefix AD_apex_out
Using 1 threads.
 present in both bcf and bed file.
0 total variants on selected chromosomes.

Found 669 samples in bcf file ... 
Found 669 samples in covariate file ... 
Found 669 samples in expression bed file ... 
Found 669 samples in common across all three files.

Processed data for 669 covariates across 669 samples.
Processed expression for 26383 genes across 669 samples.
Processed genotype data for 0Segmentation fault (core dumped)

Is it just the case that the software is not compatible with my system?

jdblischak commented 2 years ago

@chrisclarkson The most concerning thing I see is that it identified 0 variants:

 present in both bcf and bed file.
0 total variants on selected chromosomes.

This is likely why it later failed when processing the genotype data:

Processed genotype data for 0Segmentation fault (core dumped)

Your BED file uses chromosome names with the "chr" prefix, e.g. "chr1". What chromosome names are used in your VCF file? If they don't match, that might be the problem.
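One quick way to test for a naming mismatch is to compare the first column of the two files directly. The following is a minimal sketch, not part of apex: the helper names `first_column_values` and `compare_chromosomes` are made up for illustration, and it assumes both files are gzip/bgzip-compressed, whitespace-delimited text with header lines starting with `#`.

```python
import gzip


def first_column_values(path):
    """Collect the distinct first-column values (chromosome names) of a
    gzip/bgzip-compressed BED or VCF file, skipping '#' header lines."""
    names = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            # Split on any whitespace and keep only the first field.
            names.add(line.split(None, 1)[0])
    return names


def compare_chromosomes(bed_path, vcf_path):
    """Report which chromosome names are shared and which appear in only
    one of the two files (hypothetical helper, for illustration only)."""
    bed = first_column_values(bed_path)
    vcf = first_column_values(vcf_path)
    return {
        "shared": sorted(bed & vcf),
        "bed_only": sorted(bed - vcf),
        "vcf_only": sorted(vcf - bed),
    }
```

Calling `compare_chromosomes("AD_apex.bed.gz", "AD_SNP.recalibrated_99.9.vcf.gz")` and getting an empty `shared` list would be consistent with the "0 total variants" message. This scans the whole VCF, which can be slow for large files; since the VCF is tabix indexed, `tabix -l` on it lists the chromosome names much faster.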

jdblischak commented 2 years ago

For context, here is what I see when I attempt to run apex store. It identifies variants and shared samples, but then segfaults:

bin/apex/apex store \
  --vcf data/genotypes/genotypes.chr22.vcf.gz \
  --bed data/expression/adipose_subcutaneous.bed.gz \
  --cov data/covariates/covariates.txt.gz \
  --prefix test-apex

Using 2 threads.
chr22 present in both bcf and bed file.
633161 total variants on selected chromosomes.

Found 838 samples in bcf file ...
Found 581 samples in covariate file ...
Found 663 samples in expression bed file ...
Found 581 samples in common across all files.

Segmentation fault

jdblischak commented 2 years ago

Quick update. From Chris' output, I guessed that apex store was failing while processing the covariates file, so I focused my attention there and confirmed that the file was not formatted correctly. Once I fixed that, it ran!
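
For anyone who lands here with the same symptom, a basic sanity check of the covariates file might look like the sketch below. It is not part of apex, and the layout it checks is an assumption modelled on the QTLtools-style file shown earlier in this thread: tab-delimited, a header row with an ID label followed by sample IDs, then one row per covariate with numeric values. The function name `check_covariates` is made up for illustration.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def check_covariates(path):
    """Return a list of human-readable problems; an empty list means this
    check found none (it does not prove that apex will accept the file)."""
    problems = []
    header = None
    n_covariates = 0
    with open_text(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            fields = line.rstrip("\r\n").split("\t")
            if header is None:
                header = fields  # assumed: ID label, then sample IDs
                if len(set(header[1:])) != len(header[1:]):
                    problems.append("duplicate sample IDs in header")
                continue
            n_covariates += 1
            if len(fields) != len(header):
                problems.append(
                    f"line {line_no}: {len(fields)} fields, "
                    f"header has {len(header)}"
                )
            for value in fields[1:]:
                try:
                    float(value)
                except ValueError:
                    problems.append(
                        f"line {line_no}: non-numeric value {value!r}"
                    )
                    break  # one report per line is enough
    if header is None:
        return ["file is empty"]
    n_samples = len(header) - 1
    if n_samples > 0 and n_covariates >= n_samples:
        problems.append(
            f"{n_covariates} covariates for {n_samples} samples: "
            "at least as many covariates as samples"
        )
    return problems
```

The last check is there because the log in the original report says "Processed data for 669 covariates across 669 samples", and a covariate count equal to the sample count is worth a second look. Rows with trailing tabs show up as extra empty fields, so they are reported as well.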