corbinq / apex

Toolkit for QTL mapping and meta-analysis.
https://corbinq.github.io/apex/
16 stars 1 forks source link

apex cis breaks when given a BED file with only 1 trait #8

Closed hsun3163 closed 3 years ago

hsun3163 commented 3 years ago

Dear apex author,

I wonder what is controlling the cis-eqtl block? With an expression bed file containing 5 genes as followed,

Screen Shot 2021-07-20 at 12 36 16 PM

with this command:

apex cis --vcf /mnt/mfs/statgen/neuro-apex/ROSMAP-vcf/ROSMAP_chr19.vcf.gz \
--bed /mnt/mfs/statgen/neuro-apex/pipeline_testing/cache/test.chr19.mol_phe.bed.gz \
--cov /mnt/mfs/statgen/neuro-apex/pipeline_testing/cache/test.cov.gz \
--out /mnt/mfs/statgen/neuro-apex/pipeline_testing/cis/test.19 \
--theta /mnt/mfs/statgen/neuro-apex/pipeline_testing/cache/test.chr19.mol_phe.theta.gz 

The following message occurs, saying it have 2 cis eqtl block:

Using 48 threads.
19 present in both bcf and bed file.
976562 total variants on selected chromosomes.

Found 1196 samples in bcf file ... 
Found 664 samples in covariate file ... 
Found 665 samples in expression bed file ... 
Found 664 samples in common across all three files.

Processed data for 5 covariates across 664 samples.
Processed expression for 5 genes across 664 samples.
Processed genotype data for 976562 variants.

Using OLS (no GRM specified).
Started cis-QTL analysis ...
Scaling expression traits ... 
Calculating genotype-covariate covariance...
Calculating genotype residual variances ...Done.
Calculating expression residuals...
Scaling expression residuals ...
Processed 2 cis-QTL blocks out of 2 total

However, for the other expression file that contains only 1 gene, the following error occurs, saying it does not have any cis-eqtl block:

Using 48 threads.
20 present in both bcf and bed file.
914859 total variants on selected chromosomes.

Found 1196 samples in bcf file ... 
Found 664 samples in covariate file ... 
Found 665 samples in expression bed file ... 
Found 664 samples in common across all three files.

Processed data for 5 covariates across 664 samples.
Processed expression for 1 genes across 664 samples.
Processed genotype data for 914859 variants.

Using OLS (no GRM specified).
Started cis-QTL analysis ...
Scaling expression traits ... 
Calculating genotype-covariate covariance...
Calculating genotype residual variances ...Done.
Calculating expression residuals...
Scaling expression residuals ...
Processed 0 cis-QTL blocks out of 1 total
ERROR: 0; 0, 0; 782963, 33957

Killed.

I wonder what I can do to solve this?

corbinq commented 3 years ago

Hi,

Thanks for the bug report and interest in this.

It does indeed look like 'apex cis' (but not 'apex trans') fails if given a BED file with only 1 gene. I will make a note to either fix or add an error message accordingly. With 2 or more genes in the BED file 'apex cis' works as expected based on my testing.

In general, I advise against using a BED file that only contains a single gene. Instead, to restrict analysis to a region of interest, I would use --region flag, for example, --region 20:10000-15000.

"Blocks" are used internally; they determine how the data is chunked for loading and matrix multiplications. Their size affects computational performance, but not the actual output.

-Corbin

On Tue, Jul 20, 2021 at 12:40 PM hsun3163 @.***> wrote:

Dear apex author,

I wonder what is controlling the cis-eqtl block? With an expression bed file containing 5 genes as followed,

Screen Shot 2021-07-20 at 12 36 16 PM

The following message occurs, saying it have 2 cis eqtl block:

Using 48 threads. 19 present in both bcf and bed file. 976562 total variants on selected chromosomes.

Found 1196 samples in bcf file ... Found 664 samples in covariate file ... Found 665 samples in expression bed file ... Found 664 samples in common across all three files.

Processed data for 5 covariates across 664 samples. Processed expression for 5 genes across 664 samples. Processed genotype data for 976562 variants.

Using OLS (no GRM specified). Started cis-QTL analysis ... Scaling expression traits ... Calculating genotype-covariate covariance... Calculating genotype residual variances ...Done. Calculating expression residuals... Scaling expression residuals ... Processed 2 cis-QTL blocks out of 2 total

However, for the other expression file that contains only 1 gene, the following error occurs, saying it does not have any cis-eqtl block:

Using 48 threads. 20 present in both bcf and bed file. 914859 total variants on selected chromosomes.

Found 1196 samples in bcf file ... Found 664 samples in covariate file ... Found 665 samples in expression bed file ... Found 664 samples in common across all three files.

Processed data for 5 covariates across 664 samples. Processed expression for 1 genes across 664 samples. Processed genotype data for 914859 variants.

Using OLS (no GRM specified). Started cis-QTL analysis ... Scaling expression traits ... Calculating genotype-covariate covariance... Calculating genotype residual variances ...Done. Calculating expression residuals... Scaling expression residuals ... Processed 0 cis-QTL blocks out of 1 total ERROR: 0; 0, 0; 782963, 33957

Killed.

I wonder what I can do to solve this?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/corbinq/apex/issues/8, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWQAM4E4VSXHVE467WVQ2LTYWRILANCNFSM5AWFIHSA .

hsun3163 commented 3 years ago

Hi, Thanks for the bug report and interest in this. It does indeed look like 'apex cis' (but not 'apex trans') fails if given a BED file with only 1 gene. I will make a note to either fix or add an error message accordingly. With 2 or more genes in the BED file 'apex cis' works as expected based on my testing. In general, I advise against using a BED file that only contains a single gene. Instead, to restrict analysis to a region of interest, I would use --region flag, for example, --region 20:10000-15000. "Blocks" are used internally; they determine how the data is chunked for loading and matrix multiplications. Their size affects computational performance, but not the actual output.

Thank you, this is reassuring. Could you take a look at issue #7 as well?