STAR software essentials #20

Open gmgitx opened 5 years ago

gmgitx commented 5 years ago

20

gmgitx commented 5 years ago

From the official manual, the basic workflow has two steps:

  1. Generating genome index files (see Section 2. Generating genome indexes). In this step the user supplies the reference genome sequences (FASTA files) and annotations (GTF file), from which STAR generates genome indexes that are used in the 2nd (mapping) step. The genome indexes are saved to disk and need only be generated once for each genome/annotation combination. A limited collection of STAR genomes is available from http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/; however, it is strongly recommended that users generate their own genome indexes with the most up-to-date assemblies and annotations.

     Basic parameters for this step:

     --runThreadN NumberOfThreads  # number of threads
     --runMode genomeGenerate  # which mode to run
     --genomeDir /path/to/genomeDir  # where the output files go; use an entirely clean directory
     --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 ...  # one or more reference genome sequences
     --sjdbGTFfile /path/to/annotations.gtf  # annotations
     --sjdbOverhang ReadLength-1  # default is 100; once you follow the manual's "Ideally" advice, you will definitely go against its bolded text

     There are also some advanced options, e.g. Section 2.2.1 "Which chromosomes/scaffolds/patches to include?" ... covering the various cases for annotations and genomes. Here I used the first 5 parameters.

  2. Mapping reads to the genome (see Section 3. Running mapping jobs). In this step the user supplies the genome files generated in the 1st step, as well as the RNA-seq reads (sequences) in the form of FASTA or FASTQ files. STAR maps the reads to the genome and writes several output files, such as alignments (SAM/BAM), mapping summary statistics, splice junctions, unmapped reads, signal (wiggle) tracks, etc. Output files are described in Section 4. Output files. Mapping is controlled by a variety of input parameters (options) that are described in brief in Section 3. Running mapping jobs, and in more detail in Section 13. Description of all options.

     The STAR command line has the following format:

     STAR --option1-name option1-value(s) --option2-name option2-value(s) ...

     If an option can accept multiple values, they are separated by spaces, and in a few cases by commas.

     Basic parameters for this step:

     --runThreadN NumberOfThreads  # number of threads
     --genomeDir /path/to/genomeDir  # the index from step 1
     --readFilesIn /path/to/read1 [/path/to/read2]  # sequencing data; accepts paired-end reads, compressed files, and multiple samples mapped together

     There are also some advanced options.
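
To make the two steps concrete, here is a minimal Python sketch that only assembles the two STAR command lines from the basic parameters listed above. It does not run STAR. The function names, the default thread count, and every path in the example are hypothetical placeholders, not values from the manual or from my own run.

```python
import shlex


def genome_generate_cmd(genome_dir, fasta_files, gtf_file, read_length, threads=4):
    """Build the argument list for step 1 (genome index generation)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,              # should be a clean directory
        "--genomeFastaFiles", *fasta_files,     # multiple values are space-separated
        "--sjdbGTFfile", gtf_file,
        "--sjdbOverhang", str(read_length - 1), # ReadLength-1, as in the parameter list
    ]


def mapping_cmd(genome_dir, read1, read2=None, threads=4):
    """Build the argument list for step 2 (mapping reads to the genome)."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,              # the index produced by step 1
        "--readFilesIn", read1,
    ]
    if read2 is not None:                       # second mate for paired-end data
        cmd.append(read2)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    step1 = genome_generate_cmd(
        "/path/to/genomeDir",
        ["/path/to/genome/fasta1", "/path/to/genome/fasta2"],
        "/path/to/annotations.gtf",
        read_length=101,
    )
    step2 = mapping_cmd("/path/to/genomeDir", "/path/to/read1", "/path/to/read2")
    print(shlex.join(step1))
    print(shlex.join(step2))
```

Printing the commands first makes it easy to check them before spending time on an index build. To actually execute one, the list can be passed to `subprocess.run(cmd, check=True)`, assuming the `STAR` executable is on the `PATH`.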