yingsun-ucsd closed this issue 5 months ago
I get the same kind of error when building the RSEM index:
$ docker run --rm -v /nfs/lab/ysun/RNA-seqPipeline4GTExConsortium/references:/data -t broadinstitute/gtex_rnaseq:V10 \
> /bin/bash -c "rsem-prepare-reference \
> /nfs/lab/GTEx/references/Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta \
> /nfs/lab/ysun/RNA-seqPipeline4GTExConsortium/references/rsem_reference/rsem_reference \
> --gtf /nfs/lab/GTEx/GENCODE/gencode.v39.GRCh38.annotation.gtf \
> --num-threads 4"
rsem-extract-reference-transcripts /nfs/lab/ysun/RNA-seqPipeline4GTExConsortium/references/rsem_reference/rsem_reference 0 /nfs/lab/GTEx/GENCODE/gencode.v39.GRCh38.annotation.gtf None 0 /nfs/lab/GTEx/references/Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta
Cannot open /nfs/lab/GTEx/GENCODE/gencode.v39.GRCh38.annotation.gtf! It may not exist.
"rsem-extract-reference-transcripts /nfs/lab/ysun/RNA-seqPipeline4GTExConsortium/references/rsem_reference/rsem_reference 0 /nfs/lab/GTEx/GENCODE/gencode.v39.GRCh38.annotation.gtf None 0 /nfs/lab/GTEx/references/Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta" failed! Plase check if you provide correct parameters/options for the pipeline!
You need to use the paths as they appear inside the Docker container. You're mapping the input directory to /data, so /nfs/lab/GTEx/references/Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta should be /data/Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta, and likewise for the other paths.
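To make the mapping concrete, here is a minimal sketch of how a host path translates to a container path under `-v HOST_DIR:/data`. The helper name `to_container_path` is hypothetical and for illustration only; it is not part of Docker or the GTEx pipeline. It rewrites only the path prefix and does not check that the file exists.

```shell
# Hypothetical helper (not part of Docker or the pipeline):
# translate a host path into the path seen inside the container,
# given the two sides of a `-v HOST_DIR:CONTAINER_DIR` bind mount.
to_container_path() {
    local host_dir="${1%/}" container_dir="${2%/}" path="$3"
    case "$path" in
        "$host_dir")
            printf '%s\n' "$container_dir" ;;
        "$host_dir"/*)
            # Swap the host prefix for the container prefix.
            printf '%s\n' "$container_dir${path#"$host_dir"}" ;;
        *)
            # Paths outside the mounted directory are not visible in the container.
            printf 'not visible in container: %s\n' "$path" >&2
            return 1 ;;
    esac
}

host=/nfs/lab/ysun/RNA-seqPipeline4GTExConsortium/references
to_container_path "$host" /data "$host/gencode.v39.GRCh38.annotation.gtf"
to_container_path "$host" /data /nfs/lab/GTEx/GENCODE/gencode.v39.GRCh38.annotation.gtf || true
```

The first call prints /data/gencode.v39.GRCh38.annotation.gtf. The second reports that the path is not visible, because only the directory named in `-v` is mounted and /nfs/lab/GTEx/... lies outside it.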
Thank you so much for your help, @francois-a!
$ pwd
/nfs/lab/ysun/RNA-seqPipeline4GTExConsortium/references
$ ls
Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta
gencode.v39.GRCh38.annotation.gtf
Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta.fai
rsem_reference
star_index_oh75
In this case, I mapped "/nfs/lab/ysun/RNA-seqPipeline4GTExConsortium/references" to "/data" and then ran the following, but I still got errors. Am I misunderstanding something here? Thanks.
$ docker run --rm -v /nfs/lab/ysun/RNA-seqPipeline4GTExConsortium/references:/data -t broadinstitute/gtex_rnaseq:V10 \
> /bin/bash -c "STAR \
> --runMode genomeGenerate \
> --genomeDir /data/star_index_oh75 \
> --genomeFastaFiles /data/Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta \
> --sjdbGTFfile /data/gencode.v39.GRCh38.annotation.gtf \
> --sjdbOverhang 75 \
> --runThreadN 4"
STAR --runMode genomeGenerate --genomeDir /data/star_index_oh75 --genomeFastaFiles /data/Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta --sjdbGTFfile /data/gencode.v39.GRCh38.annotation.gtf --sjdbOverhang 75 --runThreadN 4
STAR version: 2.7.11b compiled: 2024-01-25T16:12:02-05:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 27 16:57:07 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /data/star_index_oh75/Log.out. Will keep ./Log.out
Jun 27 16:57:07 ... starting to generate Genome files
EXITING because of INPUT ERROR: could not open genomeFastaFile: /data/Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta
Jun 27 16:57:07 ...... FATAL ERROR, exiting
It looks like I have a problem with the folder mapping. For example:
$ pwd
/nfs/lab/ysun/Pankbase/GSE79469/fastq
$ ls
star_index_oh75 SRR1299319_1.fastq.gz SRR1299319_2.fastq.gz
$ docker run --rm -v /nfs/lab/ysun/Pankbase/GSE79469/fastq:/data -t broadinstitute/gtex_rnaseq:V10 \
> /bin/bash -c "/src/run_STAR.py \
> /data/star_index_oh75 \
> /data/SRR1299319_1.fastq.gz \
> /data/SRR1299319_2.fastq.gz \
> SRR1299319 \
> --threads 4 \
> --output_dir /tmp/star_out && mv /tmp/star_out /data/star_out"
STAR --runMode alignReads --runThreadN 4 --genomeDir /data/star_index_oh75 --twopassMode Basic --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.1 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --outFilterType BySJout --outFilterScoreMinOverLread 0.33 --outFilterMatchNmin 0 --outFilterMatchNminOverLread 0.33 --limitSjdbInsertNsj 1200000 --readFilesIn /data/SRR1299319_1.fastq.gz /data/SRR1299319_2.fastq.gz --readFilesCommand zcat --outFileNamePrefix /tmp/star_out/SRR1299319. --outSAMstrandField intronMotif --outFilterIntronMotifs None --alignSoftClipAtReferenceEnds Yes --quantMode TranscriptomeSAM GeneCounts --outSAMtype BAM Unsorted --outSAMunmapped Within --genomeLoad NoSharedMemory --quantTranscriptomeSAMoutput BanSingleEnd_BanIndels_ExtendSoftclip --winAnchorMultimapNmax 50 --chimSegmentMin 15 --chimJunctionOverhangMin 15 --chimOutType Junctions WithinBAM SoftClip --chimMainSegmentMultNmax 1 --chimOutJunctionFormat 0 --outSAMattributes NH HI AS nM NM ch --outSAMattrRGline ID:rg1 SM:sm1
STAR version: 2.7.11b compiled: 2024-01-25T16:12:02-05:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 27 18:41:34 ..... started STAR run
Jun 27 18:41:34 ..... loading genome
EXITING because of FATAL ERROR: could not open genome file /data/star_index_oh75//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
Jun 27 18:41:34 ...... FATAL ERROR, exiting
Traceback (most recent call last):
  File "/src/run_STAR.py", line 124, in <module>
    subprocess.check_call(cmd, shell=True, executable='/bin/bash')
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'STAR --runMode alignReads --runThreadN 4 --genomeDir /data/star_index_oh75 --twopassMode Basic --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.1 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --outFilterType BySJout --outFilterScoreMinOverLread 0.33 --outFilterMatchNmin 0 --outFilterMatchNminOverLread 0.33 --limitSjdbInsertNsj 1200000 --readFilesIn /data/SRR1299319_1.fastq.gz /data/SRR1299319_2.fastq.gz --readFilesCommand zcat --outFileNamePrefix /tmp/star_out/SRR1299319. --outSAMstrandField intronMotif --outFilterIntronMotifs None --alignSoftClipAtReferenceEnds Yes --quantMode TranscriptomeSAM GeneCounts --outSAMtype BAM Unsorted --outSAMunmapped Within --genomeLoad NoSharedMemory --quantTranscriptomeSAMoutput BanSingleEnd_BanIndels_ExtendSoftclip --winAnchorMultimapNmax 50 --chimSegmentMin 15 --chimJunctionOverhangMin 15 --chimOutType Junctions WithinBAM SoftClip --chimMainSegmentMultNmax 1 --chimOutJunctionFormat 0 --outSAMattributes NH HI AS nM NM ch --outSAMattrRGline ID:rg1 SM:sm1' returned non-zero exit status 105.
I am new to docker and really need some help to understand what's going on here. Thanks!
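STAR's message suggests checking that the files are present and readable. A minimal sketch of such a check follows, intended to be run inside the container (for example through `/bin/bash -c`) so that it sees the same paths and the same user as STAR does. The function name `check_readable` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: report whether each path exists and is readable
# by the user running this shell. Returns non-zero if any path fails.
check_readable() {
    local status=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            status=1
        elif [ ! -r "$p" ]; then
            echo "not readable by uid $(id -u): $p"
            status=1
        else
            echo "ok: $p"
        fi
    done
    return "$status"
}

check_readable /data/star_index_oh75/genomeParameters.txt /data/SRR1299319_1.fastq.gz || true
```

A "missing" result points to a path or mount problem, while "not readable" points to a permission problem for the user the container runs as.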
docker run -it --rm --user XXXX:XXXX -v /nfs/lab/ysun/RNA-seqPipeline4GTExConsortium:/data --workdir /data -t broadinstitute/gtex_rnaseq:V10 \
Fixed it.
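For readers hitting the same errors: the command above differs from the earlier attempts by adding `--user` and `--workdir` and by mounting the parent directory. One plausible reading is that the container's default user could not read or write the NFS-mounted directory. This is an assumption, since the thread does not say so and the IDs are redacted as XXXX:XXXX. A minimal sketch of building such a `docker run` prefix from the current user's numeric IDs follows; the function name `docker_run_as_me` is hypothetical, and it only prints the command rather than running it.

```shell
# Hypothetical helper: print (do not execute) a `docker run` prefix that
# runs the container as the invoking user, so files on the bind mount are
# accessed with the same UID/GID as on the host.
# Assumption: XXXX:XXXX in the thread stands for numeric user:group IDs.
docker_run_as_me() {
    local host_dir="$1" image="$2"
    printf 'docker run --rm --user %s:%s -v %s:/data --workdir /data -t %s\n' \
        "$(id -u)" "$(id -g)" "$host_dir" "$image"
}

docker_run_as_me /nfs/lab/ysun/RNA-seqPipeline4GTExConsortium broadinstitute/gtex_rnaseq:V10
```

With this mount, the reference files would appear under /data/references/ inside the container, so the paths passed to STAR or RSEM would need to change accordingly.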
I am building the indexes by following this, but got an error.
However, the error did not make sense, because when I ran the command from the docker run directly on the server, it worked.
I am very new to docker and don't understand why the "docker run" did not work. Any help will be highly appreciated.