alexdobin / STAR

RNA-seq aligner
MIT License

Only one sample is Mapped #2207

Open Gabriela-PH opened 2 months ago

Gabriela-PH commented 2 months ago

Hi, I'm encountering an issue while mapping some samples using STAR. The process begins normally and successfully maps the first sample, but then the subsequent samples are ignored. When I attempt to run the samples individually, I receive "KILLED" or "FATAL ERROR" messages after the first sample is mapped.

I've already tried reinstalling STAR, regenerating the gene index, and remapping, but the problem persists. Could you please provide any insights or suggestions to resolve this?

Thank you for your help!!

**Scripts**

Gene index script:

```bash
#!/bin/bash
#SBATCH -J rna_sequencing
#SBATCH --nodes=4
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --account=...
#SBATCH --partition=normal_q
#SBATCH --mail-user=...
#SBATCH --mail-type=ALL
#SBATCH --output=output.log
#SBATCH --error=error.log

# Set paths
genomeDir="/p/genomeDir"
genomeFastaFiles="/p/refGenome/ncbi_dataset/data/GCF_002263795.3/GCF_002263795.3_ARS-UCD2.0_genomic.fna"
rawDataDir="/p/rawdata/trimmed"
bamFilesDir="/p/bamFiles"
gffFile="/p/refGenome/ncbi_dataset/data/GCF_002263795.3/genomic.gff"
results="/p/results"

# STAR indexing
# --genomeChrBinNbits is a STAR option, not an sbatch directive, so it goes on the STAR command line
STAR --runThreadN 1 --runMode genomeGenerate \
    --genomeDir "$genomeDir" \
    --genomeFastaFiles "$genomeFastaFiles" \
    --sjdbGTFfile "$gffFile" \
    --genomeChrBinNbits 14
```
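`--genomeChrBinNbits` is a STAR `genomeGenerate` parameter. For genomes with a large number of reference sequences (many small scaffolds), the STAR manual suggests scaling it down to min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))), and scaling `--genomeSAindexNbases` to min(14, log2(GenomeLength)/2 - 1) for small genomes. The helper below is a hypothetical sketch that applies those two formulas to a FASTA file; the function name and the default read length of 150 are assumptions, and the results are rounded down.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest_star_index_params FASTA [READ_LENGTH]
# Applies the STAR manual's scaling formulas (rounded down):
#   genomeChrBinNbits   = min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength)))
#   genomeSAindexNbases = min(14, log2(GenomeLength) / 2 - 1)
# READ_LENGTH defaults to 150, an assumption; use your actual read length.
suggest_star_index_params() {
    local fasta=$1 read_length=${2:-150}
    awk -v rl="$read_length" '
        /^>/ { nref++; next }          # each header line starts a new reference
        { glen += length($0) }         # add sequence-line lengths (newline excluded)
        END {
            if (nref == 0) { print "no FASTA headers found" > "/dev/stderr"; exit 1 }
            per_ref = glen / nref
            if (per_ref < rl) per_ref = rl
            bin = int(log(per_ref) / log(2))
            if (bin > 18) bin = 18
            sa = int(log(glen) / log(2) / 2 - 1)
            if (sa > 14) sa = 14
            # %.0f avoids integer overflow in some awks for multi-Gbp genomes
            printf "genomeLength=%.0f references=%d genomeChrBinNbits=%d genomeSAindexNbases=%d\n", glen, nref, bin, sa
        }' "$fasta"
}
```

For example, `suggest_star_index_params "$genomeFastaFiles" 150` prints the suggested values, which would then be passed to `STAR --runMode genomeGenerate`.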


Mapping script

```bash
#!/bin/bash
#SBATCH -J rna_sequencing
#SBATCH --nodes=4
#SBATCH --cpus-per-task=4
#SBATCH --time=48:00:00
#SBATCH --account=...
#SBATCH --partition=normal_q
#SBATCH --mail-user=...
#SBATCH --mail-type=ALL
#SBATCH --output=output.log
#SBATCH --error=error.log

# Set paths
genomeDir="/.../genomeDir"
genomeFastaFiles="/p.../ncbi_dataset/data/GCF_002263795.3/GCF_002263795.3_ARS-UCD2.0_genomic.fna"
rawDataDir="/p/rawdata/trimmed"
bamFilesDir="/p/bamFiles"
gffFile="/p/refGenome/ncbi_dataset/data/GCF_002263795.3/genomic.gff"
results="/presults"

# Gene mapping
for sample in H_1 H_2 H_3 H_4 H_5 H_6 H_7 P_1 P_2 P_3 P_4 P_5 P_6 P_7 P_8; do
    STAR --genomeDir "$genomeDir/" \
        --readFilesIn "$rawDataDir/${sample}_1_paired.fq" "$rawDataDir/${sample}_2_paired.fq" \
        --outFileNamePrefix "$bamFilesDir/${sample}" \
        --outSAMtype BAM SortedByCoordinate
done
```
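A more defensive version of the loop can make it clearer which sample fails and why. The sketch below uses hypothetical helpers `map_sample` and `map_all` plus an assumed `DRY_RUN` switch that prints the STAR command instead of running it. It checks that both read files exist and are non-empty before calling STAR, passes `SLURM_CPUS_PER_TASK` (set by Slurm when `--cpus-per-task` is requested; falls back to 1) to `--runThreadN`, and continues to the next sample when one fails, listing the failures at the end. It assumes read 2 follows the `_2_paired.fq` naming that mirrors read 1.

```bash
#!/usr/bin/env bash
# Hypothetical helpers map_sample / map_all: a defensive variant of the loop above.
# Expects genomeDir, rawDataDir and bamFilesDir to be set as in the script.
# DRY_RUN=1 (an assumed switch) prints the STAR command instead of running it.
map_sample() {
    local sample=$1 f
    local r1="$rawDataDir/${sample}_1_paired.fq"
    local r2="$rawDataDir/${sample}_2_paired.fq"   # assumed to mirror read 1 naming
    for f in "$r1" "$r2"; do
        if [[ ! -s $f ]]; then                      # missing or zero-byte file
            echo "[$sample] missing or empty read file: $f" >&2
            return 1
        fi
    done
    local cmd=(STAR --runThreadN "${SLURM_CPUS_PER_TASK:-1}"
        --genomeDir "$genomeDir"
        --readFilesIn "$r1" "$r2"
        --outFileNamePrefix "$bamFilesDir/${sample}_"
        --outSAMtype BAM SortedByCoordinate)
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

map_all() {
    local sample failed=()
    for sample in "$@"; do
        # Record the failure and move on instead of stopping the whole batch.
        map_sample "$sample" || failed+=("$sample")
    done
    if (( ${#failed[@]} > 0 )); then
        echo "failed samples: ${failed[*]}" >&2
        return 1
    fi
}
```

Usage would be `map_all H_1 H_2 H_3 H_4 H_5 H_6 H_7 P_1 P_2 P_3 P_4 P_5 P_6 P_7 P_8`; the trailing `_` in the prefix only separates the sample name from STAR's output suffixes.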

alexfriman commented 2 months ago

Maybe the problem is in the second sample. Did you try rearranging the sample order?

Gabriela-PH commented 2 months ago

I've tried shifting the samples and even running them one by one, but I still get this error:

```
[p@maping]$ ./maping14.sh
STAR --genomeDir /projects/rawdata/trimmed/HS_14_1_paired.fq /projects/rawdata/trimmed/HS_14_2_paired.fq --outFileNamePrefix /projects/bamFiles/HS14 --outSAMtype BAM SortedByCoordinate
STAR version: 2.7.9a_2021-06-25 compiled: 2021-06-25T15:53:52-04:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
Sep 05 18:09:00 ..... started STAR run
Sep 05 18:09:00 ..... loading genome
Sep 05 18:10:25 ..... started mapping
./maping14.sh: line 24: 205824 Killed STAR --genomeDir "$genomeDir/" --readFilesIn "$rawDataDir/${sample}_1_paired.fq" "$rawDataDir/${sample}_2paired.fq" --outFileNamePrefix "$bamFilesDir/${sample}" --outSAMtype BAM SortedByCoordinate
```

alexfriman commented 2 months ago

I'm not sure at what point the process is killed. How long did it run after "Sep 05 18:10:25 ..... started mapping"?
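When bash reports `Killed` for a command, the process received SIGKILL from outside; STAR stopping on its own reports a FATAL ERROR message instead. By shell convention a command terminated by signal N returns exit status 128 + N, so SIGKILL (9) shows up as 137, which is typically what the Linux out-of-memory killer or a cluster's memory enforcement sends. A minimal sketch, using a hypothetical helper `explain_exit_status`, for logging that status together with the time of death:

```bash
#!/usr/bin/env bash
# Hypothetical helper: explain_exit_status STATUS
# By shell convention, a command killed by signal N exits with status 128 + N,
# so "Killed" (SIGKILL = 9) shows up as 137.
explain_exit_status() {
    local status=$1
    if (( status == 0 )); then
        echo "success"
    elif (( status > 128 )); then
        local sig=$(( status - 128 ))
        echo "terminated by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with error status $status"
    fi
}
```

For example, `STAR ...; status=$?; date; explain_exit_status "$status"` records when the process died, which can then be compared with the last `..... started mapping` timestamp.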

From the looks of it, the HPC system may be killing the task. How much memory does the process use, and what is the memory limit?
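A `Killed` right after `started mapping`, with no STAR error message, is consistent with the process exceeding a memory limit. Note that `#SBATCH` lines are only read when a script is submitted with `sbatch`; running it as `./maping14.sh` executes it directly in the current shell, where those resource requests do not apply. For jobs that did go through Slurm, `sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS,ReqMem` reports peak memory and the requested limit. The sketch below (hypothetical `check_star_memory`; assumes Linux with GNU coreutils and `/proc/meminfo`) compares the size of the index files, which STAR loads into RAM, with the memory the node reports as available. It is only a lower bound: BAM sorting and buffers need memory on top.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: check_star_memory GENOME_DIR [AVAILABLE_KIB]
# Assumes Linux with GNU coreutils (stat -c) and /proc/meminfo.
# STAR loads the Genome, SA and SAindex files into RAM, so their total size is
# a lower bound on what mapping needs; BAM sorting and buffers come on top.
check_star_memory() {
    local genome_dir=$1 avail_kb=${2:-} need_bytes=0 f
    for f in Genome SA SAindex; do
        if [[ ! -f $genome_dir/$f ]]; then
            echo "missing index file: $genome_dir/$f" >&2
            return 2
        fi
        need_bytes=$(( need_bytes + $(stat -c %s "$genome_dir/$f") ))
    done
    local need_kb=$(( need_bytes / 1024 ))
    if [[ -z $avail_kb ]]; then
        # MemAvailable is node-wide; a job's cgroup/Slurm limit may be lower.
        avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    fi
    echo "index files: ${need_kb} KiB; available: ${avail_kb} KiB"
    (( need_kb < avail_kb ))   # exit status 0 only if the index fits
}
```

Running `check_star_memory "$genomeDir"` on the node where mapping actually runs gives a quick sanity check before starting STAR.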

I would try modifying maping14.sh, changing `--outSAMtype BAM SortedByCoordinate` to `--outSAMtype BAM Unsorted`, and sorting the BAM later with samtools.
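The suggestion above can be sketched as follows. STAR's coordinate sorting needs additional RAM on top of the loaded genome (capped by `--limitBAMsortRAM`), so with `--outSAMtype BAM Unsorted` STAR only writes `<prefix>Aligned.out.bam`, and `samtools sort` does the sorting as a separate step with its own memory budget. The helper name `map_unsorted_then_sort`, the `DRY_RUN` switch, the `_2_paired.fq` naming and the 1G per-thread sort memory are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map_unsorted_then_sort SAMPLE
# Expects genomeDir, rawDataDir and bamFilesDir to be set as in the mapping script.
# DRY_RUN=1 (an assumed switch) prints each command instead of running it.
map_unsorted_then_sort() {
    local sample=$1 threads=${SLURM_CPUS_PER_TASK:-1}
    local prefix="$bamFilesDir/${sample}_"
    local run=()
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        run=(echo)   # prefix every command with echo
    fi
    # Step 1: map without sorting; STAR writes ${prefix}Aligned.out.bam.
    "${run[@]}" STAR --runThreadN "$threads" \
        --genomeDir "$genomeDir" \
        --readFilesIn "$rawDataDir/${sample}_1_paired.fq" "$rawDataDir/${sample}_2_paired.fq" \
        --outFileNamePrefix "$prefix" \
        --outSAMtype BAM Unsorted || return 1
    # Step 2: sort with samtools; -m is memory per thread (1G is an assumption),
    # so peak sort memory is roughly threads x 1G.
    "${run[@]}" samtools sort -@ "$threads" -m 1G \
        -o "${prefix}sorted.bam" "${prefix}Aligned.out.bam" || return 1
    # Step 3: index the sorted BAM for downstream tools.
    "${run[@]}" samtools index "${prefix}sorted.bam"
}
```

Splitting the steps also means a failed sort can be rerun from `Aligned.out.bam` without remapping.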