To build genome indexes with custom parameters, such as STAR's --limitGenomeGenerateRAM, you have to build the genome index(es) in a separate CirComPara2 run, with its own command.
Follow the instructions here and add the --limitGenomeGenerateRAM parameter to STAR_EXTRA_PARAMS.
For other options of the genome index generator script, check its help:
[path_to_circompara2_home]/src/utils/bash/make_indexes "-h"
With the CirComPara2 Docker container, you need to change the default entry point:
docker run -u `id -u` --rm -it -v $(pwd):$(pwd) -w $(pwd) --entrypoint /circompara2/src/utils/bash/make_indexes egaffo/circompara2:v0.1.2.1 '-h'
If you want just the STAR index, the command should look like this:
docker run -u `id -u` --rm -it -v $(pwd):$(pwd) -w $(pwd) --entrypoint /circompara2/src/utils/bash/make_indexes egaffo/circompara2:v0.1.2.1 'INDEXES="STAR" STAR_EXTRA_PARAMS="--limitGenomeGenerateRAM 160424593450"'
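If you drive the index build from a script (for example, to try different RAM limits), the same call can be assembled as an argument list. This is a minimal Python sketch, not part of CirComPara2: the helper name make_indexes_cmd is hypothetical, and it only reproduces the docker run options shown above. The point it illustrates is that everything after the image name reaches make_indexes as a single argument, which is what the single quotes achieve in the shell.

```python
import os
import shlex

IMAGE = "egaffo/circompara2:v0.1.2.1"
ENTRYPOINT = "/circompara2/src/utils/bash/make_indexes"


def make_indexes_cmd(ram_bytes: int, workdir: str, uid: int) -> list:
    """Hypothetical helper: argv for building only the STAR index in Docker."""
    # All make_indexes settings travel as ONE argument, like the
    # single-quoted string in the shell version of the command.
    settings = (
        'INDEXES="STAR" '
        f'STAR_EXTRA_PARAMS="--limitGenomeGenerateRAM {ram_bytes}"'
    )
    return [
        "docker", "run", "-u", str(uid), "--rm", "-it",
        "-v", f"{workdir}:{workdir}", "-w", workdir,
        "--entrypoint", ENTRYPOINT,
        IMAGE, settings,
    ]


if __name__ == "__main__":
    argv = make_indexes_cmd(160424593450, os.getcwd(), os.getuid())
    # Print the equivalent shell command so it can be checked before running.
    print(shlex.join(argv))
    # "-it" expects an interactive terminal; to run it from one:
    # import subprocess; subprocess.run(argv, check=True)
```

shlex.join quotes the settings string back into the same single-quoted form used in the shell command, which makes the printed line easy to compare.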
Then set the path of the precomputed index as the STAR_INDEX parameter when you run CirComPara2.
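For example, the relevant part of vars.py could look like the sketch below. The file names and the index directory are hypothetical placeholders; the only point is that STAR_INDEX points to the directory produced by the separate make_indexes run, so CirComPara2 does not try to generate it again.

```python
# vars.py (sketch): all paths below are hypothetical placeholders.
META = 'meta.csv'
GENOME_FASTA = '../annotation/genome.fa'      # same FASTA used to build the STAR index
ANNOTATION = '../annotation/annotation.gtf'   # annotation matching GENOME_FASTA
CPUS = '4'
STAR_INDEX = '../indexes/star'                # directory written by the make_indexes run
```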
Hi Egaffo, thanks for the quick reply. I am using Singularity to run the CirComPara2 image, and I will try your suggestion to see if it works. But I also wonder whether I could simply skip all STAR-related steps and methods by changing some parameter settings in vars.py. That would be much easier for me and would save RAM. A related question: do the commented lines in vars.py work at all? I tried uncommenting the CIRCRNA_METHODS line and deleting circexplorer2_star from it, but I still got the same error, so either the exclusion is not working or STAR is also called somewhere else. This is the vars.py I used when running the CirComPara2 test dataset:
META = 'meta.csv'
GENOME_FASTA = '../annotation/ref-transcripts.fa'
ANNOTATION = '../annotation/ref-transcripts.gtf'
CPUS = '4'
#
#
#
Commented lines in vars.py are skipped. STAR is used by DCC, CIRCexplorer2 (circexplorer2_star) and circRNA_finder, so you have to remove all three of these methods to avoid running STAR. You can set:
CIRCRNA_METHODS = 'ciri,findcirc,circexplorer2_segemehl,circexplorer2_bwa,circexplorer2_tophat'
However, keep in mind that Segemehl also needs a lot of RAM: about 60GB to load the whole human genome index (STAR needs about 32GB).
On a machine with less than 32GB of RAM, you could set:
CIRCRNA_METHODS = 'ciri,findcirc,circexplorer2_bwa,circexplorer2_tophat'
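The rule above can be written as a small filter over the comma-separated CIRCRNA_METHODS value. This is a hypothetical Python sketch, not a CirComPara2 feature: the RAM figures are the approximate human-genome values quoted here (STAR about 32GB, Segemehl about 60GB), and the identifiers 'dcc' and 'circrna_finder' for DCC and circRNA_finder are assumptions, so check them against the method names your CirComPara2 version accepts. Methods not listed are assumed to fit in the available RAM.

```python
# Approximate RAM (GB) to load a whole human genome index, as quoted in this thread.
STAR_RAM_GB = 32
SEGEMEHL_RAM_GB = 60

# Methods that run STAR: DCC, CIRCexplorer2 on STAR, circRNA_finder.
# ASSUMPTION: 'dcc' and 'circrna_finder' are the CIRCRNA_METHODS identifiers.
STAR_METHODS = {"dcc", "circexplorer2_star", "circrna_finder"}
SEGEMEHL_METHODS = {"circexplorer2_segemehl"}


def drop_methods(methods: str, excluded: set) -> str:
    """Remove the excluded methods, keeping the original order."""
    kept = [m.strip() for m in methods.split(",")]
    return ",".join(m for m in kept if m and m not in excluded)


def methods_for_ram(methods: str, ram_gb: float) -> str:
    """Hypothetical helper: drop methods whose index would not fit in ram_gb."""
    excluded = set()
    if ram_gb < STAR_RAM_GB:
        excluded |= STAR_METHODS
    if ram_gb < SEGEMEHL_RAM_GB:
        excluded |= SEGEMEHL_METHODS
    return drop_methods(methods, excluded)


if __name__ == "__main__":
    # Hypothetical full list, for illustration only.
    all_methods = ("ciri,findcirc,circexplorer2_star,circexplorer2_segemehl,"
                   "circexplorer2_bwa,circexplorer2_tophat,dcc,circrna_finder")
    print(drop_methods(all_methods, STAR_METHODS))  # no STAR at all
    print(methods_for_ram(all_methods, 16))         # machine with <32GB RAM
```

Between 32GB and 60GB the sketch keeps the STAR-based methods and drops only Segemehl, which follows from the two figures above; below 32GB it yields the same list suggested above.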
I have no experience with Singularity, but it should work similarly to Docker, since Docker images can be converted into Singularity images.
Hi Egaffo, thanks for the reply. I took the STAR index I had generated earlier for a previous RNA-seq pipeline and set it as the precomputed STAR index in vars.py, and this works now! However, I get another error when building the TopHat indexes:
Error: Couldn't build bowtie index with err = 1
scons: *** [samples/RS-03774719_525681_RS-03668269_S1/processings/circRNAs/tophat_out/accepted_hits.bam] Error 1
Do you know how to solve this error? Thanks!
I see you are not using the genome FASTA files from Ensembl or UCSC, but perhaps a custom genome ("ref-transcripts.fa") and annotation, which can cause the problem. Check that the files and their formats are consistent. Also, try searching for that error online.
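A quick consistency check can be scripted. The Python sketch below (hypothetical helper names, standard library only) counts the FASTA records and lists GTF sequence names that have no matching FASTA header. A genome FASTA has one record per chromosome or scaffold, while a transcript FASTA has one record per transcript, so a very large record count hints that the file is not a genome. It does not validate the sequences themselves.

```python
import sys
from typing import Iterable, List, Set, Tuple


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Sequence IDs: first whitespace-separated token of each '>' header."""
    names = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                names.append(header.split()[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Values of the first (seqname) column, skipping comments and blank lines."""
    seqnames = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        seqnames.add(line.rstrip("\n").split("\t", 1)[0])
    return seqnames


def check_pair(fasta_lines, gtf_lines) -> Tuple[int, List[str]]:
    """Return (number of FASTA records, GTF seqnames missing from the FASTA)."""
    names = fasta_names(fasta_lines)
    missing = gtf_seqnames(gtf_lines) - set(names)
    return len(names), sorted(missing)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python check_ref.py genome.fa annotation.gtf
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as gtf:
        n_records, missing = check_pair(fa, gtf)
    print(f"{n_records} FASTA records")
    print(f"{len(missing)} GTF seqnames not in FASTA: {missing[:10]}")
```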
Ah, I see the difference between my ref-transcripts.fa and the Homo_sapiens.GRCh38.dna.primary_assembly.fa I downloaded from Ensembl: previously we only focused on mRNA expression levels in human. Now everything works well and a reliable circRNA expression matrix is generated, thanks a lot!
Hi there, I am trying to apply CirComPara2 to detect circRNAs in a human RNA-seq dataset, but I ran into the following problem.
The step that terminated:
cd dbs/indexes/indexes/star/ref-transcripts && STAR --runMode genomeGenerate --runThreadN 1 --genomeFastaFiles /annotation/ref-transcripts.fa --genomeDir . && cd /home
Feb 15 01:28:06 ..... started STAR run
Feb 15 01:28:06 ... starting to generate Genome files
scons: building terminated because of errors.
The error message:
EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=31000000000is too small for your genome
SOLUTION: please specify --limitGenomeGenerateRAM not less than 144424593450 and make that much RAM available
Feb 15 01:29:04 ...... FATAL ERROR, exiting
scons: *** [dbs/indexes/indexes/star/ref-transcripts/chrLength.txt] Error 104
I guess this is because STAR needs a lot of RAM to generate the genome files, so I changed vars.py to give STAR more RAM. However, I still get the same error (again saying that limitGenomeGenerateRAM=31000000000is too small for your genome), so the STAR parameter I added to vars.py seems to have no effect. This is my vars.py:
META = 'meta.csv'
GENOME_FASTA = '../annotation/ref-transcripts.fa'
ANNOTATION = '../annotation/ref-transcripts.gtf'
CPUS = '1'
STAR_PARAMS = ['--limitGenomeGenerateRAM', '160424593450']
Could you please help me with this error? Thanks a lot!
Best,
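The numbers in the STAR error log in this post can be checked with a few lines of Python. In the sketch below (hypothetical helper names; the regular expressions only match the message text quoted in the log), the value requested in STAR_PARAMS, 160424593450 bytes, is above STAR's stated minimum of 144424593450 bytes, while the log still reports 31000000000, which is STAR's default limit. Together with the logged STAR command line, which has no --limitGenomeGenerateRAM option, this suggests the setting never reached the genome generation step, rather than being too small.

```python
import re

GIB = 1024 ** 3  # bytes per GiB


def star_ram_values(log_text: str) -> tuple:
    """Return (limit STAR used, minimum STAR asks for) in bytes; None if absent."""
    used = re.search(r"limitGenomeGenerateRAM=(\d+)", log_text)
    needed = re.search(r"--limitGenomeGenerateRAM not less than (\d+)", log_text)
    return (int(used.group(1)) if used else None,
            int(needed.group(1)) if needed else None)


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB, rounded to one decimal."""
    return round(n_bytes / GIB, 1)


# Error text copied from the log above (STAR prints no space before "is").
LOG = ("EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=31000000000"
       "is too small for your genome\n"
       "SOLUTION: please specify --limitGenomeGenerateRAM not less than "
       "144424593450 and make that much RAM available")

if __name__ == "__main__":
    used, needed = star_ram_values(LOG)
    requested = 160424593450  # value from STAR_PARAMS in vars.py
    print(f"STAR used {to_gib(used)} GiB, needs {to_gib(needed)} GiB")
    # -> STAR used 28.9 GiB, needs 134.5 GiB
    print(f"requested {to_gib(requested)} GiB, enough: {requested >= needed}")
    # -> requested 149.4 GiB, enough: True
```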