alexdobin / STAR

RNA-seq aligner
MIT License

Running multiple jobs simultaneously? #1820

Open dhuisman opened 1 year ago

dhuisman commented 1 year ago

Hi Alex,

I'm a first-time user trying to run two Seq-Well samples through STARsolo to generate matrix files. I'm using the HCC, where the time limit for a job is 7 days. As currently set up, the job runs the control sample and then the knockout sample, and writes both samples into one output BAM file. This would take 8+ days to finish, so it will time out before producing anything useful. I see two options. Is there a way to run both samples simultaneously, with separate control.bam and knockout.bam output files? Alternatively, I've tried setting up two separate jobs, each running one sample, but if I use readFilesManifest with only one sample listed, I get an error saying it cannot open read1, and if I use readFilesIn, I get a segmentation fault.

Do you have any advice?

Thank you, Dianna
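
For the "two separate jobs" route, one common pattern on Slurm clusters is a job array in which each array task handles one sample and writes under its own output prefix. The following is only a sketch under assumptions: the sample names, FASTQ file names, and output prefixes are hypothetical, and the STAR call is printed rather than executed so the snippet can be tried outside the cluster.

```shell
#!/bin/sh
#SBATCH --array=1-2
#SBATCH --job-name=STARsolo

# Map an array task ID to a sample name (hypothetical names).
sample_for_task() {
  case "$1" in
    1) echo control ;;
    2) echo knockout ;;
    *) echo "unexpected array task: $1" >&2; return 1 ;;
  esac
}

# Slurm sets SLURM_ARRAY_TASK_ID for each array task;
# default to 1 so the script can be dry-run outside Slurm.
SAMPLE=$(sample_for_task "${SLURM_ARRAY_TASK_ID:-1}") || exit 1

# Each task uses its own prefix, so the two BAM files never collide.
OUT_PREFIX="${SAMPLE}_"

# Dry run: print the STAR call instead of executing it.
echo "STAR --readFilesIn ${SAMPLE}_R1.fastq ${SAMPLE}_R2.fastq --outFileNamePrefix ${OUT_PREFIX}"
```

With prefixes like these, STAR would name the sorted BAM files control_Aligned.sortedByCoord.out.bam and knockout_Aligned.sortedByCoord.out.bam. Each array task is scheduled as its own job, so each gets its own time limit.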

alexdobin commented 1 year ago

Hi Dianna,

how many reads are in each sample, and how many threads are you using? Also please post your command line.

dhuisman commented 1 year ago

Hi Alex,

My command line is as follows:

#!/bin/sh
#SBATCH --time 120:00:00
#SBATCH --partition=batch
#SBATCH --mem=80Gb
#SBATCH --array=1
#SBATCH --job-name=STARsolo
#SBATCH --error=/work/lewis/dhuisman/30-805857467/00_fastq/array1.%A.err
#SBATCH --output=/work/lewis/dhuisman/30-805857467/00_fastq/array1.%A.out

module load star/2.7

mkdir Array1
cd /work/lewis/dhuisman/30-805857467/00_fastq/

STAR --soloType SmartSeq \
--soloCBstart 67 \
--soloCBlen 12 \
--soloUMIstart 79 \
--soloUMIlen 8 \
--soloUMIdedup Exact \
--soloBarcodeReadLength 1 \
--readFilesManifest /work/lewis/dhuisman/30-805857467/00_fastq/H82NTCsamplesheet \
--genomeDir /work/lewis/dhuisman/30-805857467/00_fastq/genome_index/ \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix /work/lewis/dhuisman/30-805857467/00_fastq/Array1/NTC

As mentioned, I have also tested replacing --readFilesManifest with --readFilesIn:

--readFilesIn RESUB-NTC-H82-SAMPLE2_R1_001.fastq RESUB-NTC-H82-SAMPLE2_R2_001.fastq

My control sample is slightly over 1 billion reads and my knockout sample is slightly under 1 billion reads.

As for the threads question, I will ask around and get back to you. Maybe that is the issue.

Thank you,
Dianna Huisman, BA
PhD Candidate, Lewis Lab
Cancer Research Graduate Program
Eppley Cancer Institute
University of Nebraska Medical Center

dhuisman commented 1 year ago

I found on the HCC website that if you do not specify the number of cores or threads in your submission, the job defaults to using one core.

alexdobin commented 1 year ago

Hi @dhuisman

in addition to requesting threads from the cluster, you also need to specify the same number of threads for STAR, e.g. --runThreadN 10.
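
One way to keep the two numbers in sync is to request the cores with --cpus-per-task and pass the matching Slurm environment variable to STAR. This is a minimal sketch, assuming a Slurm cluster; the value 10 comes from the example above, and the command is only printed here, not run.

```shell
#!/bin/sh
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 thread if the variable is unset (e.g. outside a job).
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Reuse the same value for STAR so the thread count always
# matches what was requested from the scheduler.
STAR_CMD="STAR --runThreadN $THREADS"

# Dry run: print the start of the command; the remaining STAR options are omitted.
echo "$STAR_CMD"
```

Changing the #SBATCH line is then the only edit needed when requesting more or fewer cores.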

dhuisman commented 1 year ago

Hi Alex,

I tried this in combination with --readFilesManifest and with --readFilesIn. See the scripts and errors below. Thank you very much for your help!

#!/bin/sh
#SBATCH --time 120:00:00
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10
#SBATCH --mem=80Gb
#SBATCH --array=1
#SBATCH --job-name=STARsolo
#SBATCH --error=/work/lewis/dhuisman/30-805857467/00_fastq/thread.%A.err
#SBATCH --output=/work/lewis/dhuisman/30-805857467/00_fastq/thread.%A.out

module load star/2.7

mkdir THREAD
cd /work/lewis/dhuisman/30-805857467/00_fastq/

STAR --soloType SmartSeq \
--runThreadN 10 \
--soloCBstart 67 \
--soloCBlen 12 \
--soloUMIstart 79 \
--soloUMIlen 8 \
--soloUMIdedup Exact \
--soloBarcodeReadLength 1 \
--readFilesManifest /work/lewis/dhuisman/30-805857467/00_fastq/threadincreasesamplesheet \
--genomeDir /work/lewis/dhuisman/30-805857467/00_fastq/genome_index/ \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix /work/lewis/dhuisman/30-805857467/00_fastq/THREAD/NTC

threadincreasesamplesheet:

RESUB-NTC-H82-SAMPLE2_R1_001.fastq RESUB-NTC-H82-SAMPLE2_R2_001.fastq NTC

EXITING because of fatal input ERROR: could not open readFilesIn=Read1

Apr 12 15:08:16 ...... FATAL ERROR, exiting
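
One thing that may be worth checking with a manifest error like this (an assumption; nothing in this thread confirms it as the cause) is the column separator. As far as I understand the STAR manual, the manifest for paired-end reads has three tab-separated columns: read 1 file, read 2 file, and a read-group label, and spaces are treated as part of a file name. The sketch below writes a manifest with explicit tabs and counts the fields per line; the file names are hypothetical stand-ins.

```shell
# Hypothetical file names standing in for the real FASTQ paths.
MANIFEST=$(mktemp)

# '\t' in the printf format guarantees literal tab characters between
# the three columns: read1 file, read2 file, read-group label.
printf '%s\t%s\t%s\n' sample_R1.fastq sample_R2.fastq NTC > "$MANIFEST"

# Count tab-separated fields on each line; a well-formed
# paired-end manifest should report 3 for every line.
FIELDS=$(awk -F '\t' '{ print NF }' "$MANIFEST")
echo "fields per line: $FIELDS"

rm -f "$MANIFEST"
```

Running the same awk command on an existing sample sheet shows quickly whether it was saved with spaces instead of tabs.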

#!/bin/sh
#SBATCH --time 120:00:00
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10
#SBATCH --mem=80Gb
#SBATCH --array=1
#SBATCH --job-name=STARsolo
#SBATCH --error=/work/lewis/dhuisman/30-805857467/00_fastq/thread.%A.err
#SBATCH --output=/work/lewis/dhuisman/30-805857467/00_fastq/thread.%A.out

module load star/2.7

mkdir THREAD
cd /work/lewis/dhuisman/30-805857467/00_fastq/

STAR --soloType SmartSeq \
--runThreadN 10 \
--soloCBstart 67 \
--soloCBlen 12 \
--soloUMIstart 79 \
--soloUMIlen 8 \
--soloUMIdedup Exact \
--soloBarcodeReadLength 1 \
--readFilesIn RESUB-NTC-H82-SAMPLE2_R1_001.fastq RESUB-NTC-H82-SAMPLE2_R2_001.fastq \
--genomeDir /work/lewis/dhuisman/30-805857467/00_fastq/genome_index/ \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix /work/lewis/dhuisman/30-805857467/00_fastq/THREAD/NTC

/var/spool/slurmd/job2275711/slurm_script: line 27: 765465 Segmentation fault (core dumped)

alexdobin commented 1 year ago

Hi Dianna,

please send me the Log.out file of the failed run.

dhuisman commented 1 year ago

Hi Alex,

I've attached all the run files associated with a run using the --readFilesIn command.

Thanks again for all your help!

alexdobin commented 1 year ago

Hi Dianna,

the files did not get attached; you need to attach them through the GitHub website, since attachments do not work via email reply.

dhuisman commented 1 year ago

Hi Alex,

Sorry about that. I've attached them here. The .bam file was empty so I left that off.

NTCthreadincreaseLog.out.txt

NTCthreadincreaseLog.progress.out.txt

thread.2301832.out.txt

thread.2301832.err.txt

Thanks again!

alexdobin commented 1 year ago

Hi Dianna,

does the single-cell protocol contain cell barcodes and UMIs? If so, you need to use the --soloType CB_UMI_Simple option. The SmartSeq option is for plate-based protocols that have a separate FASTQ file for each cell.
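
For illustration, the switch might look roughly like the sketch below. It is not a verified command for this data set: the barcode and UMI positions are simply copied from the script posted earlier in the thread, the FASTQ names, genome path, and output prefix are hypothetical placeholders, and --soloCBwhitelist None is an assumption for a protocol without a fixed barcode whitelist. As far as I know, with CB_UMI_Simple the cDNA read goes first in --readFilesIn and the barcode read second. The command is printed, not executed.

```shell
# Hypothetical placeholders: replace with the real cDNA and barcode FASTQ files.
CDNA_FASTQ=cdna_read.fastq
BARCODE_FASTQ=barcode_read.fastq
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Barcode/UMI geometry copied from the thread; whitelist "None" is an assumption.
# Order in --readFilesIn: cDNA read first, barcode read second.
STAR_CMD="STAR --soloType CB_UMI_Simple \
--soloCBwhitelist None \
--soloCBstart 67 --soloCBlen 12 \
--soloUMIstart 79 --soloUMIlen 8 \
--runThreadN $THREADS \
--readFilesIn $CDNA_FASTQ $BARCODE_FASTQ \
--genomeDir genome_index/ \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix NTC_"

# Dry run: print the command for review instead of executing it.
echo "$STAR_CMD"
```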

dhuisman commented 1 year ago

Hi Alex,

Yes, we used the Seq-Well protocol, so there are cell barcodes and UMIs. I'll try changing the --soloType option and see if that solves it.

Thanks!