COMBINE-lab / SalmonTools

Useful tools for working with Salmon output
BSD 3-Clause "New" or "Revised" License

generateDecoyTranscriptome.sh gets 21 killed #6

Open antonkulaga opened 5 years ago

antonkulaga commented 5 years ago

I've made a Docker container for SalmonTools (https://quay.io/repository/comp-bio-aging/salmon-tools). However, I constantly get:

/opt/SalmonTools/scripts/generateDecoyTranscriptome.sh: line 105: 21 Killed $mashmap -r reference.masked.genome.fa -q $txpfile -t $threads --pi 80 -s 500

I run it on a 32-core machine with 64 GB RAM, using the Ensembl human genome. I think something may be wrong in the bash script itself.

/opt/SalmonTools/scripts/generateDecoyTranscriptome.sh: line 105:    21 Killed                  $mashmap -r reference.masked.genome.fa -q $txpfile -t $threads --pi 80 -s 500

***************
*** ABORTED ***
***************

An error occurred. Exiting...

the command is:

/opt/SalmonTools/scripts/generateDecoyTranscriptome.sh -a /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.96.gtf -g /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.dna.primary_assembly.fa -t /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.cdna.all.fa -j 16 -o output

the stdout file is:

*** getDecoy ***
****************
-a <Annotation GTF file> = /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.96.gtf
-g <Genome fasta> = /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.dna.primary_assembly.fa
-t <Transcriptome fasta> = /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.cdna.all.fa
-j <Concurrency level> = 16
-o <Output files Path> = output
[1/10] Extracting exonic features from the gtf
[2/10] Masking the genome fasta
[3/10] Aligning transcriptome to genome
>>>>>>>>>>>>>>>>>>
Reference = [reference.masked.genome.fa]
Query = [/cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.cdna.all.fa]
Kmer size = 16
Window size = 5
Segment length = 500 (read split allowed)
Alphabet = DNA
Percentage identity threshold = 80%
Mapping output file = mashmap.out
Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
Execution threads  = 16
>>>>>>>>>>>>>>>>>>
INFO, skch::Sketch::build, minimizers picked from reference = 938129647

k3yavi commented 5 years ago

I think it's related to https://github.com/COMBINE-lab/SalmonTools/issues/5. The problem is probably memory usage. We've raised the issue on mashmap's repo.
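
A note on reading the error: in bash's message `line 105: 21 Killed ...`, `21` is the PID of the terminated process and `Killed` means it received SIGKILL, which is also the signal the kernel's OOM killer sends. The log alone does not prove that memory was the cause, so the sketch below is only a way to check. `describe_exit_status` is a hypothetical helper, not part of SalmonTools, and the cgroup paths are the usual ones for cgroup v2 and v1 and may be absent on a given system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SalmonTools): decode a shell exit status.
describe_exit_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        # Statuses above 128 mean "terminated by signal (status - 128)".
        echo "killed by signal $((status - 128)) ($(kill -l "$status"))"
    else
        echo "exited with status $status"
    fi
}

# Demonstration: a child shell sends SIGKILL to itself, mimicking what
# happens to a process that the OOM killer terminates.
demo_status=0
bash -c 'kill -KILL $$' || demo_status=$?
describe_exit_status "$demo_status"

# Print the memory limit visible inside a container, if one is exposed.
# cgroup v2 uses memory.max; cgroup v1 uses memory/memory.limit_in_bytes.
for f in /sys/fs/cgroup/memory.max /sys/fs/cgroup/memory/memory.limit_in_bytes; do
    if [ -r "$f" ]; then
        echo "memory limit ($f): $(cat "$f")"
    fi
done
```

If the reported limit is well below the host's 64 GB, the container rather than the machine is likely what bounds mashmap's memory.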

antonkulaga commented 5 years ago

I have 64 GB RAM; is that not enough? Also, why did you choose mashmap? It has not been updated for a year. Why not minimap2, which is fast, uses less memory, and works well for both short and long reads?