Hi Manish,
we have been having network problems at the lab for a few days, and our IT is working to resolve them. For now there is nowhere to download the files from. I think it's best to generate the indexes yourself, as I have not updated the generated genomes for a long time.
Cheers Alex
Thank you Alex. I had a 'minimum hardware requirement' issue since I am unable to access my HPCC - but I built the index on AWS for now.
Best, Manish
Is this genome/index still the preferred pre-built human STAR index?
If you were to build this from the most recent version on NCBI, GCA_000001405.28_GRCh38.p13, would you just use the following files:
With this command?
STAR --runThreadN 4 --runMode genomeGenerate --genomeSAindexNbases 12 --genomeDir ./ --genomeFastaFiles ${GENOME} --sjdbOverhang 99 --sjdbGTFfile ${GTF} --limitGenomeGenerateRAM 15000000000 --genomeSAsparseD 3 --limitIObufferSize 50000000 --limitSjdbInsertNsj 383200
Is no_alt_analysis_set preferred over the primary assembly?
The no_alt_analysis_set is the one most likely to be relevant for most aligners. It removes alternate alleles. Most aligners cannot yet use alternate alleles.
Edit: I got this error trying to reproduce the index command in [2]:
EXITING because of FATAL input ERROR: --limitIObufferSize requires 2 numbers since 2.7.9a.
SOLUTION: specify 2 numbers in --limitIObufferSize : size of input and output buffers in bytes.
Jan 16 01:59:57 ...... FATAL ERROR, exiting
I'm running this version:
STAR --version
2.7.10a
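The error above says that, since 2.7.9a, --limitIObufferSize takes two values: the input and output buffer sizes in bytes. Below is a minimal sketch of the earlier command adjusted accordingly. Using 50000000 for both buffers is an assumption for illustration, not something stated in this thread, and the default file names are placeholders.

```shell
#!/bin/sh
# Placeholder inputs (hypothetical file names); point these at your own files.
GENOME="${GENOME:-genome.fa}"
GTF="${GTF:-annotation.gtf}"

# Since STAR 2.7.9a, --limitIObufferSize needs two numbers:
# the input buffer size and the output buffer size, in bytes.
# Using 50000000 for both is an assumption made for this sketch.
build_index_cmd() {
    printf '%s ' \
        STAR --runThreadN 4 --runMode genomeGenerate \
        --genomeSAindexNbases 12 --genomeDir ./ \
        --genomeFastaFiles "$GENOME" \
        --sjdbOverhang 99 --sjdbGTFfile "$GTF" \
        --limitGenomeGenerateRAM 15000000000 --genomeSAsparseD 3 \
        --limitIObufferSize 50000000 50000000 \
        --limitSjdbInsertNsj 383200
    printf '\n'
}

# Print the assembled command for review before running it by hand.
build_index_cmd
```

The function only prints the command, so it can be inspected and pasted into a terminal once the paths are correct.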
Hi Josh,
the pre-built indexes are not supported at the moment. It's best to build an index with the current STAR version and current annotations.
no_alt_analysis_set is indeed the right FASTA to use. I recommend using "PRImary" FASTA and GTF from GENCODE: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/GRCm39.primary_assembly.genome.fa.gz https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.primary_assembly.annotation.gtf.gz
Cheers Alex
Thank you for the links out. I'll find the human versions and get those running today:
Do you recommend any critical parameters to adjust besides --sjdbOverhang (read length minus 1)?
Edit: I'm using 151 bp long reads and this is the command I ended up using (current GENCODE version as of this post).
wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/latest_release/GRCh38.primary_assembly.genome.fa.gz
wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/latest_release/gencode.v39.primary_assembly.annotation.gtf.gz
gzip -d *.gz
GENOME=GRCh38.primary_assembly.genome.fa
GTF=gencode.v39.primary_assembly.annotation.gtf
STAR --runThreadN 24 --runMode genomeGenerate --genomeSAindexNbases 12 --genomeDir . --genomeFastaFiles ${GENOME} --sjdbOverhang 150 --sjdbGTFfile ${GTF}
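The --sjdbOverhang value in this command follows the "read length minus 1" rule mentioned earlier in the post. A small sketch of that arithmetic follows; the helper name is made up for illustration.

```shell
# Hypothetical helper: derive --sjdbOverhang from the read length
# using the "read length minus 1" rule.
sjdb_overhang() {
    read_length="$1"
    echo $((read_length - 1))
}

READ_LENGTH=151   # read length used in this post
echo "--sjdbOverhang $(sjdb_overhang "$READ_LENGTH")"
```

For 151 bp reads this prints `--sjdbOverhang 150`, which matches the command above.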
Hi Josh,
your command looks good. There are no critical parameters, but here are some you may want to consider (from ENCODE):
--outFilterType BySJout //reduces the number of "spurious" junctions
--outFilterMultimapNmax 20 //max number of multiple alignments allowed for a read: if exceeded, the read is considered unmapped
--alignSJoverhangMin 8 //min overhang for unannotated junctions
--alignSJDBoverhangMin 1 //min overhang for annotated junctions
--outFilterMismatchNmax 999 //max number of mismatches per pair (absolute)
--outFilterMismatchNoverLmax 0.06 //max number of mismatches per pair relative to read length: for 2x100b, max number of mismatches is 0.06*200=12 for the paired read
--alignIntronMin 20 //min intron
--alignIntronMax 1000000 //max intron
--alignMatesGapMax 1000000 //max genomic distance between pairs
Cheers Alex
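As a sketch, the ENCODE options listed above could be combined into a mapping command like the one below. The index directory, FASTQ file names, thread count, and the use of --readFilesCommand zcat for gzipped reads are assumptions for illustration and are not part of the recommendation above. The small helper reproduces the 0.06*200=12 example with integer arithmetic; rounding down for other lengths is also an assumption.

```shell
#!/bin/sh
# Hypothetical inputs; replace with your own index directory and reads.
INDEX_DIR="${INDEX_DIR:-./star_index}"
READ1="${READ1:-sample_R1.fastq.gz}"
READ2="${READ2:-sample_R2.fastq.gz}"

# Assemble a mapping command that uses the ENCODE options listed above.
build_map_cmd() {
    printf '%s ' \
        STAR --runThreadN 24 --genomeDir "$INDEX_DIR" \
        --readFilesIn "$READ1" "$READ2" --readFilesCommand zcat \
        --outFilterType BySJout \
        --outFilterMultimapNmax 20 \
        --alignSJoverhangMin 8 \
        --alignSJDBoverhangMin 1 \
        --outFilterMismatchNmax 999 \
        --outFilterMismatchNoverLmax 0.06 \
        --alignIntronMin 20 \
        --alignIntronMax 1000000 \
        --alignMatesGapMax 1000000
    printf '\n'
}

# Mismatch cap implied by --outFilterMismatchNoverLmax 0.06 for a given
# total read length (both mates together); integer division rounds down.
max_mismatches() {
    echo $(( $1 * 6 / 100 ))
}

build_map_cmd
echo "2x100 reads: at most $(max_mismatches 200) mismatches per pair"
```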
I would like to download the prebuilt human genome index, but I am not sure how to do this or what each of the files is. Could someone please explain to me how to download it from this website? https://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/Human/GRCh38_Ensembl99_sparseD3_sjdbOverhang99/
IIRC you need most of (or the entire) directory. The index is a directory containing the genome files STAR needs to run, so when you run STAR you provide the path to the directory you've downloaded. That directory is the genome index you use as a reference.
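To make this concrete: a generated index directory contains files such as Genome, SA, SAindex and genomeParameters.txt, and the directory path is what --genomeDir expects. Here is a minimal sketch of a sanity check to run before mapping. The function name and the choice of files to check are assumptions, and the list is not exhaustive.

```shell
# Hypothetical helper: check that a directory looks like a STAR index
# by testing for a few of the files that index generation writes.
check_star_index() {
    dir="$1"
    for f in Genome SA SAindex genomeParameters.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Usage (the paths are placeholders):
# check_star_index /path/to/index && \
#     STAR --genomeDir /path/to/index --readFilesIn reads.fastq
```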
Thank you! I am a bit confused with the download - should I use wget and the whole path?
Hi @BubuAalbu
presently I am not making premade indexes available. Please generate the index from the proper FASTA and GTF files.
I'm constantly getting a "Gateway time out" error when I try to access http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/
Is there anyone else having the same problem?
Is there any other online repository where I can download pre-built STAR indices from?