alexdobin / STAR

RNA-seq aligner
MIT License

Why do I get such small output files from STARsolo? #2125

Open gmnmnm opened 2 months ago

gmnmnm commented 2 months ago

I ran STARsolo with the code below on Google Colab Pro, using a human kidney transplant rejection biopsy sample (scRNA-seq, 10x Chromium 5' v3.1). The output was a 700 MB BAM file, plus barcodes.tsv, features.tsv, and matrix.mtx files of only 1 to 5 KB each. The features file was empty except for "missing features".

I was planning to find DEGs by cell type with Seurat afterwards, but this output isn't usable. Did I make a mistake when running STARsolo, or is there a way to use the BAM file for further single-cell analysis?
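To put numbers on "only 1 to 5 KB", the size line of matrix.mtx and the line counts of the two .tsv files can be read directly. This is a minimal sketch, assuming the usual 10x-style Matrix Market layout (rows are features, columns are barcodes); the helper names `read_mtx_shape` and `count_lines` are made up for illustration, and the actual output directory depends on the run.

```python
def read_mtx_shape(path):
    """Return (n_rows, n_cols, n_nonzero) from a Matrix Market coordinate file."""
    with open(path) as handle:
        for line in handle:
            # Lines starting with '%' are the banner and comments; skip them.
            if line.startswith("%") or not line.strip():
                continue
            # The first remaining line is the size line: rows, columns, entries.
            rows, cols, nonzero = (int(x) for x in line.split()[:3])
            return rows, cols, nonzero
    raise ValueError(f"no size line found in {path}")


def count_lines(path):
    """Count lines in a text file such as barcodes.tsv or features.tsv."""
    with open(path) as handle:
        return sum(1 for _ in handle)
```

Calling `read_mtx_shape` on the matrix.mtx file and `count_lines` on features.tsv and barcodes.tsv shows whether the feature count is near zero, which would point at the annotation rather than at the reads.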

!wget https://github.com/alexdobin/STAR/archive/2.7.11b.tar.gz
!tar -xzf 2.7.11b.tar.gz
!cd STAR-2.7.11b

%%bash
cd /content/STAR-2.7.11b/source
make STAR

%%bash
sudo apt-get update
sudo apt-get install g++
sudo apt-get install make

!wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz
!wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz

%%bash
sudo apt install unzip

%%bash
gzip -d /content/GRCh38_latest_genomic.fna.gz
gzip -d /content/GRCh38_latest_genomic.gff.gz

%%bash
/content/STAR-2.7.11b/source/STAR \
  --runThreadN 12 \
  --runMode genomeGenerate \
  --genomeDir /content/STAR-2.7.11b/genome \
  --genomeFastaFiles /content/GRCh38_latest_genomic.fna \
  --sjdbGTFfile /content/GRCh38_latest_genomic.gff \
  --sjdbOverhang 100
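The annotation passed to `--sjdbGTFfile` here is a `.gff` file. As far as I understand the STAR manual, that option expects GTF, and GFF3 input needs at least `--sjdbGTFtagExonParentTranscript Parent`; whether that explains the nearly empty features file is not confirmed in this thread. The sketch below only guesses the format from the attribute column (GTF uses `key "value";`, GFF3 uses `key=value`). The function name `guess_annotation_format` is hypothetical.

```python
import re

# GTF attributes look like: gene_id "G1"; transcript_id "T1";
GTF_ATTR = re.compile(r'^\s*\S+\s+"[^"]*"\s*;')
# GFF3 attributes look like: ID=exon-1;Parent=rna-1
GFF3_ATTR = re.compile(r'^\s*[^\s=;]+=[^;]*')


def guess_annotation_format(lines, max_records=1000):
    """Guess 'gtf', 'gff3', or 'unknown' from the 9th column of annotation lines."""
    votes = {"gtf": 0, "gff3": 0}
    seen = 0
    for line in lines:
        # Skip header/comment lines and blanks.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = fields[8]
        if GTF_ATTR.match(attrs):
            votes["gtf"] += 1
        elif GFF3_ATTR.match(attrs):
            votes["gff3"] += 1
        seen += 1
        if seen >= max_records:
            break
    if votes["gtf"] == votes["gff3"]:
        return "unknown"
    return max(votes, key=votes.get)
```

Passing an open file handle for the decompressed annotation to `guess_annotation_format` is enough, since only the first `max_records` feature lines are read.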

!wget "ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR100/ERR10040318/sc5rEXT217_hg19_S1_L001_R2_001.fastq.gz"
!wget "ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR100/ERR10040318/sc5rEXT217_hg19_S1_L001_R1_001.fastq.gz"

%%bash
curl -o cellranger-8.0.0.tar.gz "https://cf.10xgenomics.com/releases/cell-exp/cellranger-8.0.0.tar.gz?Expires=1713989254&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&Signature=ITjFA8SQmAmUz1yrPO4NPWwfF-fCEI72f8uyf2UCJpxk~dsQsVZ9YXcc42aIIDY9jNo4LgAHMwuejpS7ZNOX0581sHfSHR4zEanVL1L38DtzCkOVd~F83VIZyrZm-qh7toMS4Fe9GbA4YWbmVVX9sRkHhzuWZuXKNrCyGFbhwsaPYDc9reWKu9dZ4HRbodoGSd9BTilOR13SMbwjgRHdJNJsvfHgCV2Px76bW8LP~wcpEsac51mdCOsonGGc-cdRg1dcs91bQjANIA-32eBxOArH4~-l33Cbx7RqG-nCvsgbFSgYOATRzQJeDjRi5-doHAZLQ-0B-4e9AeMOMYxD8A__"

%%bash
tar -zxvf /content/cellranger-8.0.0.tar.gz

%%bash
gzip -d /content/cellranger-8.0.0/lib/python/cellranger/barcodes/3M-5pgex-jan-2023.txt.gz

%%bash
/content/STAR-2.7.11b/source/STAR \
  --runThreadN 4 \
  --genomeDir /content/STAR-2.7.11b/genome \
  --readFilesIn /content/sc5rEXT217_hg19_S1_L001_R2_001.fastq.gz /content/sc5rEXT217_hg19_S1_L001_R1_001.fastq.gz \
  --readFilesCommand zcat \
  --soloType CB_UMI_Simple \
  --soloCBwhitelist /content/cellranger-8.0.0/lib/python/cellranger/barcodes/3M-5pgex-jan-2023.txt \
  --soloUMIlen 12 \
  --outSAMtype BAM SortedByCoordinate
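Since the command sets `--soloUMIlen 12` and leaves the cell barcode length at what I believe is STAR's default of 16, the barcode read would need to carry 28 bases of barcode plus UMI. A quick look at the read lengths in each FASTQ can show which file is the barcode read and whether its length matches. This is a rough sketch under those assumptions; `read_length_counts` and `barcode_read_candidates` are illustrative names, not part of STAR.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_gz, max_reads=10000):
    """Tally sequence lengths for the first max_reads records of a gzipped FASTQ."""
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        # FASTQ records are 4 lines each; the sequence is the 2nd line.
        for i, line in enumerate(islice(handle, 4 * max_reads)):
            if i % 4 == 1:
                counts[len(line.strip())] += 1
    return counts


def barcode_read_candidates(counts_by_file, cb_len=16, umi_len=12):
    """Return the files whose most common read length equals cb_len + umi_len."""
    expected = cb_len + umi_len
    return [
        name
        for name, counts in counts_by_file.items()
        if counts and counts.most_common(1)[0][0] == expected
    ]
```

Running `read_length_counts` on the R1 and R2 files and passing the results to `barcode_read_candidates` should single out the barcode read. If neither file matches, the chemistry settings (barcode length, UMI length, or whitelist) are worth rechecking against the library kit.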

xiaobearxiaobear commented 1 week ago

I have a similar question. Have you solved it?