frankvogt / vcf2gwas

Python API for comprehensive GWAS analysis using GEMMA
GNU General Public License v3.0

BSLMM is giving error with same data passed in LMM #6

Closed by vinod1981 2 years ago

vinod1981 commented 2 years ago

Hi Frank, I want to run both LMM and BSLMM. LMM produces results as usual without any error, but BSLMM fails on the same dataset. Do I need to assign more memory to BSLMM than to LMM, or could there be some other issue? The command I am using:

/vol/cluster-data/vkumar/miniconda3/envs/myenv/bin/vcf2gwas -v /prj/pflaphy-robot/genotypeGVCF_allsamples_updated/linkimpute/output_all_imputed_final_chr7Mod_js_GT.vcf.gz -pf /prj/pflaphy-robot/genotypeGVCF_allsamples_updated/linkimpute/vcf2gwas/soil_pH/soil_pH_GWAS_updated.csv -ap -bslmm -q 0.01 -o /prj/pflaphy-robot/genotypeGVCF_allsamples_updated/linkimpute/vcf2gwas/soil_pH/bslmm/

Traceback (most recent call last):
  File "/vol/cluster-data/vkumar/miniconda3/envs/myenv/lib/python3.9/site-packages/vcf2gwas/analysis.py", line 360, in
    Post_analysis.run_postprocessing(top_ten, Log, model, n, prefix2, path, n_top, i, sigval, nolabel)
  File "/vol/cluster-data/vkumar/miniconda3/envs/myenv/lib/python3.9/site-packages/vcf2gwas/utils.py", line 1307, in run_postprocessing
    Bslmm.format_col(prefix2, path)
  File "/vol/cluster-data/vkumar/miniconda3/envs/myenv/lib/python3.9/site-packages/vcf2gwas/utils.py", line 1390, in format_col
    with open(os.path.join(path, f'{prefix}.hyp.txt')) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/prj/pflaphy-robot/genotypeGVCF_allsamples_updated/linkimpute/vcf2gwas/soil_pH/bslmm/output/bslmm/Soil_pH/Soil_pH_mod_sub_soil_pH_GWAS_updated_output_all_imputed_final_chr7Mod_js_GT.hyp.txt'
Traceback (most recent call last):
  File "/vol/cluster-data/vkumar/miniconda3/envs/myenv/lib/python3.9/site-packages/vcf2gwas/starter.py", line 445, in
    filenames = Post_analysis.summarizer(path3, path2, pc_prefix3, snp_prefix, n_top, Log, prefix_list)
  File "/vol/cluster-data/vkumar/miniconda3/envs/myenv/lib/python3.9/site-packages/vcf2gwas/utils.py", line 1021, in summarizer
    for file in os.listdir(path):
FileNotFoundError: [Errno 2] No such file or directory: '/prj/pflaphy-robot/genotypeGVCF_allsamples_updated/linkimpute/vcf2gwas/soil_pH/bslmm/output/bslmm/summary/top_SNPs'

What could be the reason?

Vinod,

frankvogt commented 2 years ago

Hi Vinod, could you please update with "conda update vcf2gwas -c conda-forge -c bioconda -c fvogt257" and run the analysis again? Among other things, the error messages were improved in the latest version.

vinod1981 commented 2 years ago

Hi Frank, thanks for the reply. I updated vcf2gwas and it runs now, but it aborts at one point without giving any error or warning. What could be the reason? It looks like the program crashed, because a core dump file is generated in the qsub location. It aborts at the point shown in the output pasted below, after running for a long time. It looks like a memory issue: I have around 2.7M SNPs and 800 individuals, so the matrix that has to be held in physical memory is large. I am waiting for your answer before thinning my data, though. The log output is here:

Parsing arguments..
Genotype file: output_all_imputed_final_chr7Mod_js_GT.vcf.gz
Phenotype file(s): soil_pH_GWAS_updated.csv
Arguments parsed successfully

Preparing files

Checking soil_pH_GWAS_updated.csv..

Indexing VCF file..
VCF file successfully indexed (Duration: 44.1 seconds)

Filtering SNPs..
SNPs successfully filtered (Duration: 5 minutes, 20.0 seconds)

File preparations completed

Starting analysis..

Beginning with analysis of soil_pH_GWAS_updated.csv

Preparing files

Checking and adjusting files..
Chromosomes: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8
Checking individuals in VCF file..
Checking individuals in phenotype file..
Not all individuals in phenotype and genotype file match
Removed 0 out of 804 individuals, 804 remaining
No covariate file specified
In total, removed 31 out of 835 individuals, 804 remaining
Files successfully adjusted

Filtering and converting files

Converting to PLINK BED..
Successfully converted to PLINK BED (Duration: 53.8 seconds)

Adding phenotypes/covariates to .fam file

Editing .fam file..
All phenotypes chosen
Phenotype(s) added to .fam file
Editing .fam file successful

Initialising GEMMA

Running GEMMA

Phenotypes to analyze: Soil_pH

GEMMA 0.98.3 (2020-11-28) by Xiang Zhou and team (C) 2012-2020
Reading Files ... 
number of total individuals = 804
number of analyzed individuals = 804
number of covariates = 1
number of phenotypes = 1
number of total SNPs/var        =  2730661
number of analyzed SNPs         =  2721950
Start Eigen-Decomposition...
pve estimate =0.999963
se(pve) =0.000497586
Calculating UtX...
Starting with Soil_pH analysis..
Output will be saved in /prj/pflaphy-robot/genotypeGVCF_allsamples_updated/linkimpute/vcf2gwas/soil_pH/bslmm/output/bslmm/Soil_pH/Soil_pH_20210908_2056/
Calculating bayesian sparse linear mixed model..

vcf2gwas v0.6.7 

Initialising..

Do I need to specify the chain parameters somewhere? Also, there are no results; all the output directories are empty. Is it a problem of memory allocation or something else? Thanks,

Vinod,
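Editor's note: the memory question can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below uses the counts from the GEMMA log above (804 analyzed individuals, 2,721,950 analyzed SNPs) and assumes, as a simplification, that a matrix such as UtX is held as a dense array of 8-byte doubles. How GEMMA actually allocates memory is not stated in this thread, and the function names are made up for illustration.

```python
def dense_matrix_bytes(n_individuals: int, n_snps: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for a dense n_individuals x n_snps matrix.

    ASSUMPTION: 8-byte double-precision values, one per genotype entry.
    """
    return n_individuals * n_snps * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 1024**3


# Counts taken from the GEMMA log in this thread.
n_individuals = 804
n_snps = 2_721_950

size_gib = to_gib(dense_matrix_bytes(n_individuals, n_snps))
print(f"One dense genotype-sized matrix: about {size_gib:.1f} GiB")
```

Under these assumptions a single matrix of that shape is already in the range of 16 GiB, before counting any working copies, so a job submitted with a smaller memory limit could plausibly be killed at this stage.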

frankvogt commented 2 years ago

Hi Vinod, the reason GEMMA aborts is very likely memory-related, so thinning the data and/or assigning more memory will probably be the best solutions. I am also working on options to change the chain parameters manually, so that the number of burn-in steps and sampling steps can be adjusted.

I hope that helps, Frank
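Editor's note: until vcf2gwas exposes the chain parameters, one possible workaround is to call GEMMA directly on the PLINK BED files. The sketch below only builds the command line and does not run it. The flags `-bfile`, `-bslmm`, `-w` (burn-in steps), `-s` (sampling steps) and `-o` are GEMMA options; the default values shown are what I recall from the GEMMA manual and should be verified there. The function name and the file prefixes are hypothetical.

```python
from typing import List


def gemma_bslmm_command(
    bfile_prefix: str,
    out_prefix: str,
    burn_in: int = 100_000,      # ASSUMPTION: GEMMA's documented default for -w
    sampling: int = 1_000_000,   # ASSUMPTION: GEMMA's documented default for -s
) -> List[str]:
    """Build (but do not run) a GEMMA command for a linear BSLMM fit."""
    return [
        "gemma",
        "-bfile", bfile_prefix,   # PLINK .bed/.bim/.fam prefix
        "-bslmm", "1",            # 1 = linear BSLMM
        "-w", str(burn_in),       # number of burn-in steps
        "-s", str(sampling),      # number of sampling steps
        "-o", out_prefix,         # output file prefix
    ]


# Hypothetical prefixes, shown only to illustrate the call.
cmd = gemma_bslmm_command("data/genotypes", "soil_ph_bslmm", burn_in=50_000, sampling=200_000)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Note that a shorter chain reduces run time but is unlikely to change peak memory use, so it would not by itself address the abort discussed here.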

vinod1981 commented 2 years ago

Hi Frank, thanks. I thinned the data and the analysis ran successfully. Vinod,
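Editor's note: the thread does not say how the data were thinned. One common approach is distance-based thinning, where a site is kept only if it lies at least a given number of base pairs from the previously kept site on the same chromosome (VCFtools offers this via `--thin`). The sketch below is a minimal pure-Python version of that idea, assuming a position-sorted, gzip-compressed VCF; the function name and the distance value are illustrative choices, not something taken from this thread.

```python
import gzip


def thin_vcf(in_path: str, out_path: str, min_distance: int) -> int:
    """Write a thinned copy of a sorted VCF and return the number of sites kept.

    A site is kept if it is the first on its chromosome or lies at least
    min_distance bp after the last kept site on the same chromosome.
    """
    last_kept = {}  # chromosome -> position of the last kept site
    kept = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # copy header lines unchanged
                continue
            chrom, pos_text = line.split("\t", 2)[:2]
            pos = int(pos_text)
            if chrom not in last_kept or pos - last_kept[chrom] >= min_distance:
                dst.write(line)
                last_kept[chrom] = pos
                kept += 1
    return kept


# Example call (paths are placeholders):
# n_kept = thin_vcf("input.vcf.gz", "thinned.vcf.gz", min_distance=1000)
```

The output is written with plain gzip rather than bgzip, so it may need to be recompressed with bgzip before tools that require an indexed VCF will accept it.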