bergmanlab / mcclintock

Meta-pipeline to identify transposable element insertions using next generation sequencing data

Error in rule repeatmask #88

Closed: danicats closed this issue 2 years ago

danicats commented 2 years ago

Hi, thank you for maintaining this tool! I am attempting to run the test data and I am getting the following error:

Error in rule repeatmask:
    jobid: 31
    output: /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/SRR800842_1/intermediate/sacCer2.repeatmasker.out
    conda-env: /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/install/envs/conda/868b58eb

RuleException:
CalledProcessError in line 306 of /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/snakemake/8262709/Snakefile:
Command 'source /home/danicats/miniconda3/envs/mcclintock/bin/activate '/oak/stanford/scg/lab_asbhatt/danicats/mcclintock/install/envs/conda/868b58eb'; set -euo pipefail; python /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/snakemake/8262709/.snakemake/scripts/tmpi8viig8f.repeatmask.py' returned non-zero exit status 1.
  File "/home/danicats/miniconda3/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
  File "/oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/snakemake/8262709/Snakefile", line 306, in rule_repeatmask
  File "/home/danicats/miniconda3/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/home/danicats/miniconda3/envs/mcclintock/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/danicats/miniconda3/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/home/danicats/miniconda3/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper

And here is the log file:

samtools faidx /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/genome_fasta/sacCer2.fasta
[bwa_index] Pack FASTA... 0.14 sec
[bwa_index] Construct BWT for the packed sequence...
faToTwoBit /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/genome_fasta/sacCer2.fasta /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/SRR800842_1/intermediate/genome_fasta/sacCer2.aug.fasta.2bit
[bwa_index] 5.92 seconds elapse.
[bwa_index] Update BWT... 0.10 sec
[bwa_index] Pack forward-only FASTA... 0.13 sec
[bwa_index] Construct SA from BWT and Occ... 1.76 sec
[main] Version: 0.7.4-r385
[main] CMD: bwa index /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/genome_fasta/sacCer2.fasta
[main] Real time: 8.170 sec; CPU: 8.059 sec
bwa index /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/genome_fasta/sacCer2.fasta
RepeatMasker version open-4.0.7
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]
Warning...unknown stuff <

Master RepeatMasker Database: /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/install/envs/conda/868b58eb/share/RepeatMasker/Libraries/RepeatMaskerLib.embl ( Complete Database: dc20170127 )
Custom Repeat Library: /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/consensus_fasta/consensusTEs.fasta

Building general libraries in: /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/install/envs/conda/868b58eb/share/RepeatMasker/Libraries/dc20170127/general
RepeatMasker::createLib(): Error invoking /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/install/envs/conda/868b58eb/bin/makeblastdb on file /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/install/envs/conda/868b58eb/share/RepeatMasker/Libraries/dc20170127/general/at.lib.
RepeatMasker -pa 1 -lib /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/consensus_fasta/consensusTEs.fasta -dir /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/SRR800842_1//tmp/repeatmasker -s -nolow -no_is /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/genome_fasta/sacCer2.fasta
RepeatMasker -pa 1 -lib /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/consensus_fasta/consensusTEs.fasta -dir /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/SRR800842_1//tmp/repeatmasker -s -nolow -no_is /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/genome_fasta/sacCer2.fasta
bedtools maskfasta -fi /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/genome_fasta/sacCer2_unaugmented.fasta -fo /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/SRR800842_1//tmp/8262709tmpmaskedreference.fasta -bed /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_data_2/sacCer2/reference_te_locations/unaugmented_inrefTEs.gff

It looks like it is having trouble "invoking makeblastdb", but when I check the location of makeblastdb, it is clearly there.
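A file being present does not guarantee that it can run in the job's environment, so it can help to locate the binary and invoke it directly. The sketch below is a hypothetical helper (not part of McClintock); it assumes `makeblastdb` accepts `-version`, as BLAST+ tools do, and reuses the conda environment path from the error above. If `-version` succeeds, the failure more likely happens while makeblastdb builds the database, and the rule's own log is the next place to look.

```python
import shutil
import subprocess


def check_tool(name, version_flag="-version", search_path=None):
    """Locate an executable and try to run it (hypothetical helper).

    Returns (path, returncode, output). All three are None when the tool
    is not found on search_path (or on $PATH when search_path is None).
    """
    path = shutil.which(name, path=search_path)
    if path is None:
        return None, None, None
    # Run the tool and capture stdout and stderr together.
    proc = subprocess.run(
        [path, version_flag],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return path, proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    # Assumption: the conda env McClintock created for the repeatmask rule.
    env_bin = "/oak/stanford/scg/lab_asbhatt/danicats/mcclintock/install/envs/conda/868b58eb/bin"
    path, code, out = check_tool("makeblastdb", search_path=env_bin)
    if path is None:
        print("makeblastdb not found in", env_bin)
    else:
        print(path, "exit status:", code)
        print(out)
```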

Any help would be appreciated! Thanks in advance!

shunhuahan commented 2 years ago

Thanks for reporting, @danicats. My first impression is that this is an environment-related issue. Could you provide the following information about your failed run so that we can give more specific advice?

danicats commented 2 years ago

Hi Shunhua,

Thanks so much for your help!

I just downloaded McClintock, so it should be the latest version. I already had conda installed and used it to install McClintock. I didn't see any errors during installation, but I could try reinstalling.

These are the commands that I ran:

To download the test data: python3 test/download_test_data.py

To run mcclintock.py: python3 mcclintock.py -r test/sacCer2.fasta -c test/sac_cer_TE_seqs.fasta -g test/reference_TE_locations.gff -t test/sac_cer_te_families.tsv -1 test/SRR800842_1.fastq.gz -2 test/SRR800842_2.fastq.gz -p 4 -o test_data

I will try recreating the conda environment and see if that helps.

Thanks again,

Danica

shunhuahan commented 2 years ago
danicats commented 2 years ago

Hi, thanks for the help! I dug deeper into the log files and found this error in one of them:

Building a new DB, current time: 11/15/2021 12:55:24
New DB name: /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_again/160309_MONK_0468_AC83LBACXX_L6_AAGAGGCA-AAGGAGTA_1/tmp/repeatmasker/RM_48825.MonNov151255052021/consensusTEs.fasta
New DB title: /oak/stanford/scg/lab_asbhatt/danicats/mcclintock/test_again/sacCer2/consensus_fasta/consensusTEs.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
No volumes were created.
Error: mdb_env_open: Cannot allocate memory
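The top-level Snakemake error only reports that the rule's script exited with status 1; the underlying cause showed up in a lower-level log. A small hypothetical helper like the one below can search every file under a McClintock output directory for a given message. It assumes nothing about McClintock's log layout beyond the logs being text files somewhere under that directory; the optional `suffixes` filter is there to skip large non-log outputs if you know which extensions to keep.

```python
import os


def find_in_logs(root, needle, suffixes=None):
    """Return (path, line_number, line) for each line under root containing needle.

    If suffixes is given (e.g. (".log", ".txt")), only files whose names
    end with one of those suffixes are searched.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if suffixes is not None and not filename.endswith(tuple(suffixes)):
                continue
            path = os.path.join(dirpath, filename)
            try:
                # errors="replace" lets binary files be scanned without crashing.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if needle in line:
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                # Unreadable files (permissions, broken symlinks) are skipped.
                continue
    return hits


if __name__ == "__main__":
    # "test_data" is the -o output directory from the test command in this thread.
    for path, lineno, line in find_in_logs("test_data", "mdb_env_open"):
        print(f"{path}:{lineno}: {line}")
```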

After some searching, I found that this is a recurring problem with BLAST and that the solution is to run:

export BLASTDB_LMDB_MAP_SIZE=100000000
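BLAST reads this variable as an LMDB map size in bytes, so 100000000 is 100 MB (about 95 MiB). Child processes inherit environment variables, so setting it in the shell or process that launches McClintock should reach the makeblastdb call inside the repeatmask rule, at least for local runs. If you would rather launch runs from Python than export the variable in your shell, a wrapper could look like the sketch below; `build_env` and `run_mcclintock` are hypothetical names and not part of McClintock.

```python
import os
import subprocess
import sys

# Value suggested in the Biostars thread for this error (bytes).
LMDB_MAP_SIZE = 100000000


def build_env(map_size=LMDB_MAP_SIZE, base=None):
    """Copy an environment (os.environ by default) and set BLASTDB_LMDB_MAP_SIZE."""
    env = dict(os.environ if base is None else base)
    env["BLASTDB_LMDB_MAP_SIZE"] = str(map_size)
    return env


def run_mcclintock(args, map_size=LMDB_MAP_SIZE):
    """Hypothetical wrapper: run mcclintock.py with the LMDB map size set.

    Assumes the current directory is the McClintock checkout.
    """
    cmd = [sys.executable, "mcclintock.py"] + list(args)
    return subprocess.run(cmd, env=build_env(map_size)).returncode
```

Calling `run_mcclintock` from the McClintock checkout with the same arguments as the test command in this thread should behave like running `export BLASTDB_LMDB_MAP_SIZE=100000000` in the shell first and then `python3 mcclintock.py ...`, without changing the environment of the parent shell.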

In case anyone runs into the same problem in the future, here is the thread with this solution: https://www.biostars.org/p/413294/

It seems to be running without a hitch now! Thanks!

shunhuahan commented 2 years ago

Great news that the test run finished successfully, and thanks for providing a quick solution, @danicats! It looks like this issue may be specific to the computing environment (e.g. it may relate to how much virtual memory is available on the system). We will also look into it and see whether we can handle it within McClintock so that users don't need this workaround.
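One way to check the virtual-memory hypothesis on a particular machine is to read the process's address-space limit (the value `ulimit -v` reports in bash, in KiB) and compare it with the map size you plan to give BLAST. The sketch below uses Python's standard `resource` module, which is Unix-only; `address_space_limit` and `map_size_fits` are hypothetical helpers, and the comparison is only a heuristic, since the map size BLAST requests when the variable is unset is not stated in this thread and other mappings also count against the limit.

```python
import resource


def address_space_limit():
    """Return the soft RLIMIT_AS in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return None if soft == resource.RLIM_INFINITY else soft


def map_size_fits(map_size, limit):
    """Heuristic: could a mapping of map_size bytes fit under the limit at all?"""
    return limit is None or map_size < limit


if __name__ == "__main__":
    limit = address_space_limit()
    if limit is None:
        print("No address-space (virtual memory) limit is set for this process.")
    else:
        print(f"Address-space limit: {limit} bytes ({limit / 2**30:.1f} GiB)")
    print("100000000-byte map fits:", map_size_fits(100000000, limit))
```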