KalinNonchev / gnomAD_DB

This package scales the huge gnomAD files to a SQLite database, which is easy and fast to query. It extracts from a gnomAD vcf the minor allele frequency for each variant.
MIT License

Partial output when recreating database #20

Closed pkiehl2002 closed 1 year ago

pkiehl2002 commented 1 year ago

Hello, when recreating the SQLite database, Snakemake does not write a database to the location specified in script_config.yaml. It produces createTSVtables.ipynb but not insertVariants.ipynb in test_out. Any suggestions on how to fix this? Thank you!

KalinNonchev commented 1 year ago

Hello @pkiehl2002, if you are using Linux, could you navigate to the "gnomAD_DB" folder and send the output of the following command in a terminal (or a similar one that lists all files in the current directory)?

find .

pkiehl2002 commented 1 year ago

For reference, I specified the database_location as databaseOut and gnomad_vcf_location as vcfLoc in script_config.yaml. Here is the output:

.
./snakemake2.out
./scripts
./scripts/README.md
./scripts/GettingStartedwithGnomAD_DB.py
./scripts/createTSVtables.py
./scripts/GettingStartedwithGnomAD_DB.ipynb
./scripts/createTSVtables.ipynb
./scripts/insertVariants.ipynb
./scripts/insertVariants.py
./scripts/download_vcf_gnomad.sh
./README.md
./snakemake3.out
./Snakefile
./databaseOut
./environment.yaml
./GettingStartedwithGnomAD_DB.ipynb
./.circleci
./.circleci/config.yml
./test_dir
./test_dir/test_gnomad_db.py
./test_dir/.gitkeep
./setup.cfg
./test_out
./test_out/gnomad.genomes.v3.1.2.sites.chr1.tsv.gz
./test_out/scripts
./test_out/scripts/createTSVtables.ipynb
./test_out/gnomad.genomes.v3.1.2.sites.chr14.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr5.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr18.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr15.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr3.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr7.tsv.gz
./test_out/.gitkeep
./test_out/gnomad.genomes.v3.1.2.sites.chrX.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr21.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr13.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr17.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr10.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr6.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr8.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr11.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr2.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr4.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr12.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr16.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chrY.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr20.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr9.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr22.tsv.gz
./test_out/gnomad.genomes.v3.1.2.sites.chr19.tsv.gz
./requirements.txt
./gnomad_db
./gnomad_db/utils.py
./gnomad_db/database.py
./gnomad_db/pkgdata
./gnomad_db/pkgdata/gnomad_columns.yaml
./gnomad_db/__init__.py
./data
./data/test_vcf_gnomad_chr21_10000.tsv.gz
./data/test_vcf_gnomad_chr21_10000.vcf.bgz
./.gitignore
./snakemake1.out
./snakemake_pipeline.png
./LICENSE
./.github
./.github/workflows
./.github/workflows/python-package.yml
./.github/workflows/python-publish.yml
./setup.py
./.git
./.git/config
./.git/refs
./.git/refs/remotes
./.git/refs/remotes/origin
./.git/refs/remotes/origin/HEAD
./.git/refs/tags
./.git/refs/heads
./.git/refs/heads/master
./.git/info
./.git/info/exclude
./.git/index
./.git/description
./.git/packed-refs
./.git/HEAD
./.git/branches
./.git/objects
./.git/objects/info
./.git/objects/pack
./.git/objects/pack/pack-b0d9fddd5a1070983ffe6b557a0c1671fcccbb34.pack
./.git/objects/pack/pack-b0d9fddd5a1070983ffe6b557a0c1671fcccbb34.idx
./.git/hooks
./.git/hooks/update.sample
./.git/hooks/fsmonitor-watchman.sample
./.git/hooks/applypatch-msg.sample
./.git/hooks/pre-rebase.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-receive.sample
./.git/hooks/pre-push.sample
./.git/hooks/pre-commit.sample
./.git/hooks/prepare-commit-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/logs
./.git/logs/refs
./.git/logs/refs/remotes
./.git/logs/refs/remotes/origin
./.git/logs/refs/remotes/origin/HEAD
./.git/logs/refs/heads
./.git/logs/refs/heads/master
./.git/logs/HEAD
./.snakemake
./.snakemake/locks
./.snakemake/locks/0.input.lock
./.snakemake/locks/0.output.lock
./.snakemake/auxiliary
./.snakemake/singularity
./.snakemake/conda-archive
./.snakemake/conda
./.snakemake/metadata
./.snakemake/log
./.snakemake/log/2023-06-18T084742.303094.snakemake.log
./.snakemake/log/2023-06-16T135236.534425.snakemake.log
./.snakemake/log/2023-06-18T084703.199353.snakemake.log
./.snakemake/shadow
./.snakemake/incomplete
./.snakemake/incomplete/dGVzdF9vdXQvc2NyaXB0cy9jcmVhdGVUU1Z0YWJsZXMuaXB5bmI=
./script_config.yaml
./vcfLoc
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr8.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr10.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr6.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chrY.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr12.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr21.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr17.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr5.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr20.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr3.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr13.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr11.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr19.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr18.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr16.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chrX.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr15.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr22.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr2.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr7.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr4.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr14.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz
./vcfLoc/gnomad.genomes.v3.1.2.sites.chr9.vcf.bgz
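
A listing this long is hard to check by eye. As a minimal sketch (not part of the gnomAD_DB package), the following Python helper compares the VCF inputs with the TSV outputs and reports any input that has no matching output yet. The function names are hypothetical, and the directory names vcfLoc and test_out are simply the ones that appear in the listing above.

```python
from pathlib import Path

VCF_SUFFIX = ".vcf.bgz"
TSV_SUFFIX = ".tsv.gz"


def stems(directory, suffix):
    """Return file names in `directory` ending with `suffix`, suffix removed."""
    directory = Path(directory)
    if not directory.is_dir():
        return set()
    return {p.name[: -len(suffix)] for p in directory.glob(f"*{suffix}")}


def missing_tsv(vcf_dir, tsv_dir):
    """List VCF inputs (by stem) that have no corresponding TSV output."""
    return sorted(stems(vcf_dir, VCF_SUFFIX) - stems(tsv_dir, TSV_SUFFIX))


if __name__ == "__main__":
    # Directory names taken from the listing in this thread (assumption:
    # the script is run from the gnomAD_DB folder).
    print(missing_tsv("vcfLoc", "test_out"))
```

An empty list only means that every input has an output file with a matching name; it says nothing about whether those files are complete.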

KalinNonchev commented 1 year ago

So the reason is that you are getting an error in "insertVariants.ipynb". What is the output from

snakemake -n

and

snakemake --cores 12

pkiehl2002 commented 1 year ago

It's the same output for snakemake -n and snakemake --cores 12:

Building DAG of jobs...
IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with

snakemake --cleanup-metadata <filenames>

To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:
test_out/scripts/createTSVtables.ipynb
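
The file listing earlier in this thread contains an entry under .snakemake/incomplete/ whose name looks like base64. Decoding it yields the same path that Snakemake reports as incomplete here. The sketch below shows the decoding; treating these marker names as URL-safe base64 is an inference from this one example, not documented Snakemake behavior, and the function name is hypothetical.

```python
import base64


def decode_incomplete_marker(name: str) -> str:
    """Decode a file name from .snakemake/incomplete/ into the path it marks.

    Assumption: the marker name is the URL-safe base64 encoding of the
    output path, as observed for the single marker in this thread.
    """
    return base64.urlsafe_b64decode(name.encode("ascii")).decode("utf-8")


# Marker name copied from the `find .` output in this thread.
marker = "dGVzdF9vdXQvc2NyaXB0cy9jcmVhdGVUU1Z0YWJsZXMuaXB5bmI="
print(decode_incomplete_marker(marker))
# → test_out/scripts/createTSVtables.ipynb
```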

KalinNonchev commented 1 year ago

It looks like "createTSVtables.ipynb" failed as well. Please rerun the incomplete jobs:

snakemake --rerun-incomplete --cores 12

pkiehl2002 commented 1 year ago

Here's the output:

Building DAG of jobs...
LockException:
Error: Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files in the following directory:
/ocean/projects/bio140004p/vis59/Paris_AIBIDS/from_htc_to_psc/gnomAD_DB
If you are sure that no other instances of snakemake are running on this directory, the remaining lock was likely caused by a kill signal or a power loss. It can be removed with the --unlock argument.

KalinNonchev commented 1 year ago

Please follow the instructions from Snakemake. In your case, that would be

snakemake --rerun-incomplete --cores 12 --unlock

and

snakemake --rerun-incomplete --cores 12

KalinNonchev commented 1 year ago

Let me know what the error is afterwards.
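
For anyone scripting this recovery, the two steps can be wrapped in a small Python helper. This is only a sketch: the function names are hypothetical, and it assumes snakemake is on the PATH and that no other Snakemake process is using the directory. The first call passes only --unlock, since that flag removes the stale lock and then exits; the second call is the rerun command from this thread.

```python
import subprocess


def recovery_commands(cores=12):
    """Build the unlock and rerun commands as argument lists."""
    return [
        ["snakemake", "--unlock"],
        ["snakemake", "--rerun-incomplete", "--cores", str(cores)],
    ]


def run_recovery(cores=12, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in recovery_commands(cores):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_recovery(cores=12, dry_run=True)
```

With dry_run=True the helper only prints the commands, which makes it safe to try before running the real pipeline.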

pkiehl2002 commented 1 year ago

So far, no errors. It's made insertVariants.ipynb and gnomad_db.sqlite3. However, it's not adding more data to the .tsv.gz files. Does 15G sound like enough for all the chromosome variants?

KalinNonchev commented 1 year ago

It makes "insertVariants.ipynb", but is it still running? When you open "insertVariants.ipynb", are all cells executed?
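
One rough way to tell whether the pipeline is still writing is to compare file sizes at two points in time. The sketch below does this for the .tsv.gz files; the function names are hypothetical, and the test_out directory and the 60-second interval are assumptions chosen for illustration.

```python
import time
from pathlib import Path


def snapshot_sizes(directory, pattern="*.tsv.gz"):
    """Map each matching file name to its current size in bytes."""
    return {p.name: p.stat().st_size for p in Path(directory).glob(pattern)}


def grown_files(before, after):
    """Return names of files that are new or larger in `after`."""
    return sorted(
        name for name, size in after.items() if size > before.get(name, 0)
    )


if __name__ == "__main__" and Path("test_out").is_dir():
    first = snapshot_sizes("test_out")
    time.sleep(60)  # assumed interval; a slow job may need longer
    second = snapshot_sizes("test_out")
    print(grown_files(first, second))
```

An empty result after one interval does not prove the pipeline has stopped, because compressed output may be flushed in large, infrequent chunks.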

pkiehl2002 commented 1 year ago

I tried to open insertVariants.ipynb, and no joke it is no longer there. There is only createTSVtables.ipynb. It isn't on the find list either.

KalinNonchev commented 1 year ago

I don't think the snakemake pipeline was executed. The SQLite database should be > 50G, depending on the number of columns you define, and the run time should also be a few hours.

snakemake  --rerun-incomplete --cores 12

After running this, you should see a message that all rules finished successfully. Until then, you need to wait. Yes, you will see the files before that, but they are still being filled.
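
Once the run reports success, a quick sanity check is to look at the size of the database file and the tables it contains. The sketch below uses only Python's standard library and reads the table names from sqlite_master, so it does not assume any particular table name. The function name is hypothetical, and the path databaseOut/gnomad_db.sqlite3 is an assumption pieced together from the names mentioned in this thread.

```python
import os
import sqlite3
from pathlib import Path


def summarize_database(db_path):
    """Return (size in GB, sorted table names) for a SQLite file."""
    db_path = Path(db_path).resolve()
    size_gb = os.path.getsize(db_path) / 1e9
    # Open read-only via a URI so the check cannot modify the database.
    conn = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return size_gb, [row[0] for row in rows]


if __name__ == "__main__":
    path = "databaseOut/gnomad_db.sqlite3"  # assumed location
    if os.path.exists(path):
        size_gb, tables = summarize_database(path)
        print(f"{size_gb:.1f} GB, tables: {tables}")
```

If the pipeline is still inserting rows, the query may fail with a "database is locked" error, which is itself a hint that the run has not finished.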

KalinNonchev commented 1 year ago

I am assuming everything is OK now. Reopen the issue if necessary.