alexdobin / STAR

RNA-seq aligner
MIT License

Analysis killing after "inserting junctions into the genome indices" #858

Open krunal2406 opened 4 years ago

krunal2406 commented 4 years ago

I am using STAR for the first time and trying to run it for a de novo analysis, but every time it is killed after some time. I thought it might be due to memory, but I have 500 GB of space on my system. Can someone suggest how I can proceed? Below is the output after running the command:

krunal@krunal:~/Krunal/chipSeq/jatropa/jATROPHA_gENOME$ STAR --runThreadN 10 --genomeDir JAT_Genome/ --sjdbGTFfile JAT_r4.5.models.gff.gtf --sjdbOverhang 100 --readFilesIn SRR1560724_1_val_1.fq.zip SRR1560724_2_val_2.fq.zip --readFilesCommand zcat
Mar 16 11:55:17 ..... started STAR run
Mar 16 11:55:17 ..... loading genome
Mar 16 11:56:14 ..... processing annotations GTF
Mar 16 11:56:17 ..... inserting junctions into the genome indices
Killed

alexdobin commented 4 years ago

Hi @krunal2406

if you used --sjdbGTFfile JAT_r4.5.models.gff.gtf at the genome generation step, you do not need to use it again at the mapping step. If this does not help, please send me the Log.out file.

Cheers Alex
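A minimal sketch of what the suggestion above could look like in practice, assuming the index in JAT_Genome/ was generated with the GTF. It only prints the mapping command (a dry run), so it can be inspected before anything is launched. The helper name `star_mapping_cmd` is hypothetical and not part of STAR; the flags and file names are the ones from the original command.

```shell
#!/bin/sh
# Dry-run sketch: prints the mapping command instead of running it.
# star_mapping_cmd is an illustrative helper, not part of STAR.
star_mapping_cmd() {
    genome_dir=$1
    read1=$2
    read2=$3
    # No --sjdbGTFfile / --sjdbOverhang here: this assumes the junctions
    # were already inserted when the index in $genome_dir was generated.
    printf 'STAR --runThreadN 10 --genomeDir %s --readFilesIn %s %s --readFilesCommand zcat\n' \
        "$genome_dir" "$read1" "$read2"
}

star_mapping_cmd JAT_Genome/ SRR1560724_1_val_1.fq.zip SRR1560724_2_val_2.fq.zip
```

Copying the printed line into the terminal runs the actual mapping. Without the annotation flags, STAR skips the "inserting junctions into the genome indices" step at which the original run was killed.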

krunal2406 commented 4 years ago

Hi Alex, I am surprised, because when I run the same script at the office it does not work, but when I try it at home it works. It may sound like nonsense, but I just want to know whether an internet firewall could interfere, because I used the same laptop in both places and tried 3 times; every time it worked at home but not at the office. Thanks

alexdobin commented 4 years ago

Hi @krunal2406

this is hard to explain... STAR does not use any network connectivity. Maybe you run different apps at work and at home, so it uses less RAM in one case, which allows STAR to complete the job?

Cheers Alex
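One way to test the RAM hypothesis above on a Linux machine is to compare available memory just before launching STAR in each location, and to look for an out-of-memory kill in the kernel log after a failed run. The sketch below assumes Linux (`/proc/meminfo`, `dmesg`); the helper `kb_to_gib` is hypothetical. Note that the "500 GB space" mentioned earlier is most likely disk space, which `df` reports separately from RAM.

```shell
#!/bin/sh
# Convert a kB figure (as reported in /proc/meminfo) to GiB, one decimal.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1048576 }'
}

# Linux only: MemAvailable estimates the RAM usable without swapping.
if [ -r /proc/meminfo ]; then
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    if [ -n "$avail_kb" ]; then
        echo "Available RAM: $(kb_to_gib "$avail_kb") GiB"
    fi
fi

# Disk space is a different resource from RAM; shown here for comparison.
df -h .

# After a run ends with "Killed", the kernel log may name the killed process.
# Reading it can require root, hence the fallback message.
dmesg 2>/dev/null | grep -i 'killed process' \
    || echo "No OOM-kill lines found (or kernel log not readable)"
```

If the kernel log shows STAR being killed at the office but not at home, other applications holding memory would be a plausible explanation.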

NikSengupta commented 4 years ago

Hello, I tried my best to go over the available solutions, but I am just so new to coding that I am unable to deduce my issue from the log.out file. My process gets killed at the 'sorting Suffix Array chunks and saving them to disk' step after a couple of minutes. I was initially attempting this on my own PC, and now I have moved onto a supercomputer shell command line. I am copying my code and attaching my log.out file. I would really appreciate any help you could provide.

Thanks! Nik

[shouvonik@owens-login04 ~]$ STAR --runThreadN 64 --runMode genomeGenerate --genomeDir /users/PAS0809/shouvonik --genomeFastaFiles GRCh38.primary_assembly.genome.fa --sjdbGTFfile gencode.v19.annotation.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 100 --genomeSAsparseD 3 --genomeSAindexNbases 12

SSH2_Log_out.docx

alexdobin commented 4 years ago

Hi Nik,

It seems you are using the hg38 genome assembly (GRCh38.primary_assembly.genome.fa), but your annotation file is hg19 (gencode.v19.annotation.gff3). Please try to use the PRImary files from GENCODE: https://www.gencodegenes.org/human/

Cheers Alex

NikSengupta commented 4 years ago

Hi Alex,

Thank you for your response. I tried as you had asked. I used PRImary files for both from GENCODE (GRCh38.primary_assembly.genome.fa and gencode.v34.primary_assembly.annotation.gff3) but I ended up in the same spot. Following your replies to other people, I tried to include --genomeSAsparseD 3 --genomeSAindexNbases 12, but it didn't help. I also tried to run the code with the gtf file without luck. I am not able to identify what I could be doing wrong.

SSH_Jun15_Log.out.docx

alexdobin commented 4 years ago

Hi Nik,

the run in the Log.out file does not seem to contain --genomeSAsparseD 3 --genomeSAindexNbases 12. Please send me the Log.out file for that run. How much RAM does your server have?

Also, please use the GTF file from GENCODE, not GFF3.

Cheers Alex
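A hedged sketch of a genome-generation command that follows the advice above, printed as a dry run rather than executed. The helper `star_genome_generate_cmd` and the output directory `star_index` are hypothetical; the GTF file name is assumed to be the GTF counterpart of the gff3 file named earlier in the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints a genome-generation command that uses a GTF file.
# star_genome_generate_cmd is an illustrative helper, not part of STAR.
star_genome_generate_cmd() {
    genome_dir=$1
    fasta=$2
    gtf=$3
    # With a GTF, STAR's default exon-parent tag (transcript_id) applies,
    # so the GFF3-specific "--sjdbGTFtagExonParentTranscript Parent" is dropped.
    printf 'STAR --runMode genomeGenerate --genomeDir %s --genomeFastaFiles %s --sjdbGTFfile %s --sjdbOverhang 100 --genomeSAsparseD 3 --genomeSAindexNbases 12\n' \
        "$genome_dir" "$fasta" "$gtf"
}

# star_index and the .gtf file name below are assumptions for illustration.
star_genome_generate_cmd star_index \
    GRCh38.primary_assembly.genome.fa \
    gencode.v34.primary_assembly.annotation.gtf
```

Printing the command first makes it easy to confirm that the flags expected in Log.out (here `--genomeSAsparseD 3 --genomeSAindexNbases 12`) are actually on the command line.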

NikSengupta commented 4 years ago

Hi Alex, I've made a little progress. I am not sure about the RAM on the server, but I learnt yesterday that the server kills jobs on the login node I was using after a short time. So I brought it back to my own computer to run. It ran overnight, but after about 6 hours nothing was changing, so I had to terminate it for other work. Attached is the log file from that run. I was using the gff3 file because the other program that I plan to use with the indexed BAM files recognizes only gff3. I am not sure I understand the differences there. Can I still use the gtf file for STAR and then run my next program (MAJIQ) with the gff3 file? I am definitely more hopeful of cracking this now. Thank you for your prompt replies and help!

Best, Nik

Log.out.docx

alexdobin commented 4 years ago

Hi Nik,

how much RAM does your computer have? With --genomeSAsparseD 3 --genomeSAindexNbases 12 you will need ~16GB. Also, you are using 2.5.4b, which is very old. Please switch to the recent STAR release, preferably the latest 2.7.5a

Cheers Alex
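Both questions above (how much RAM, which STAR version) can be answered from the command line before starting a long run. The following is a rough pre-flight sketch assuming Linux; `has_enough_ram` is a hypothetical helper, and the 16 GiB threshold is simply the ~16GB estimate quoted above.

```shell
#!/bin/sh
# Succeeds if total RAM (in kB) is at least the required number of GiB.
has_enough_ram() {
    total_kb=$1
    required_gib=$2
    [ "$total_kb" -ge $((required_gib * 1048576)) ]
}

# Estimate from the thread for --genomeSAsparseD 3 --genomeSAindexNbases 12.
required_gib=16

# Linux only: MemTotal is the physical RAM visible to the kernel.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    if has_enough_ram "$total_kb" "$required_gib"; then
        echo "RAM check passed: ${total_kb} kB total"
    else
        echo "RAM check failed: ${total_kb} kB total, need about ${required_gib} GiB"
    fi
fi

# Report the installed STAR version, if STAR is on the PATH.
if command -v STAR >/dev/null 2>&1; then
    STAR --version
else
    echo "STAR not found on PATH"
fi
```

The check is deliberately strict: a machine sold as "16 GB" usually reports slightly less than 16 GiB in MemTotal, so a failure here is a warning to treat as approximate, not proof that the run cannot finish.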