Closed: tamerbio closed this issue 2 years ago
phiX is usually not needed; I included it to remove the few phiX reads that get through the Illumina auto filtering.
So set the phiX argument to FALSE and rerun. Let me know if that works :)
Hi Roleren, I set the phiX argument to FALSE. It worked until the fastp step just before the alignment:
STAR.align.folder(conf["fastq Ribo-seq"], conf["bam Ribo-seq"], index,
paired.end = FALSE,
steps = "co-ge", # (trim needed: adapters found, then genome)
adapter.sequence = "auto", # Adapters are auto detected
trim.front = 0, min.length = 20, multiQC = T)
Unfortunately, I am getting the following error message:
downloading fastp, this will be done only once!
trying URL 'https://github.com/OpenGene/fastp/archive/master.zip'
Content type 'application/zip' length 187080 bytes (182 KB)
==================================================
downloaded 182 KB
On mac OS, must build fastp, since no precompiled binaries exists
This will only be done once
c++ -c src/adaptertrimmer.cpp -o obj/adaptertrimmer.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/basecorrector.cpp -o obj/basecorrector.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/duplicate.cpp -o obj/duplicate.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/evaluator.cpp -o obj/evaluator.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/fastareader.cpp -o obj/fastareader.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/fastqreader.cpp -o obj/fastqreader.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/filter.cpp -o obj/filter.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/filterresult.cpp -o obj/filterresult.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/htmlreporter.cpp -o obj/htmlreporter.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/jsonreporter.cpp -o obj/jsonreporter.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/main.cpp -o obj/main.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/nucleotidetree.cpp -o obj/nucleotidetree.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/options.cpp -o obj/options.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/overlapanalysis.cpp -o obj/overlapanalysis.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/peprocessor.cpp -o obj/peprocessor.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/polyx.cpp -o obj/polyx.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/processor.cpp -o obj/processor.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/read.cpp -o obj/read.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/readpool.cpp -o obj/readpool.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/seprocessor.cpp -o obj/seprocessor.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/sequence.cpp -o obj/sequence.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/stats.cpp -o obj/stats.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/threadconfig.cpp -o obj/threadconfig.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/umiprocessor.cpp -o obj/umiprocessor.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/unittest.cpp -o obj/unittest.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/writer.cpp -o obj/writer.o -std=c++11 -pthread -g -O3 -I./inc
c++ -c src/writerthread.cpp -o obj/writerthread.o -std=c++11 -pthread -g -O3 -I./inc
c++ ./obj/adaptertrimmer.o ./obj/basecorrector.o ./obj/duplicate.o ./obj/evaluator.o ./obj/fastareader.o ./obj/fastqreader.o ./obj/filter.o ./obj/filterresult.o ./obj/htmlreporter.o ./obj/jsonreporter.o ./obj/main.o ./obj/nucleotidetree.o ./obj/options.o ./obj/overlapanalysis.o ./obj/peprocessor.o ./obj/polyx.o ./obj/processor.o ./obj/read.o ./obj/readpool.o ./obj/seprocessor.o ./obj/sequence.o ./obj/stats.o ./obj/threadconfig.o ./obj/umiprocessor.o ./obj/unittest.o ./obj/writer.o ./obj/writerthread.o -o fastp -lisal -ldeflate -lpthread
ld: library not found for -lisal
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [fastp] Error 1
chmod: /Users/tali/bin/fastp-master/fastp: No such file or directory
Is there any fix for this issue on macOS? Thanks very much. Best, Tamer
Hm, I will look for a fix tomorrow and let you know.
Hi Roleren, I solved the problem. It was a fastp installation problem. Best
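For anyone hitting the same build failure: the link line in the log asks for -lisal, -ldeflate and -lpthread, and the linker only reports the first library it cannot find. The thread does not say how the installation was fixed, so the following is only a minimal sketch for reading such a log. The function name missing_libs is made up for illustration, and the only thing it assumes is the "ld: library not found for -lNAME" message format shown above.

```shell
# Hypothetical helper (not part of ORFik or fastp): list the libraries that
# the linker reported as missing, based on the message format seen in the log.
missing_libs() {
  # Reads a build log on stdin and prints each missing library name once.
  sed -n 's/^ld: library not found for -l\([A-Za-z0-9_+.-]*\).*$/\1/p' | sort -u
}

# Example with the line from the log above:
printf '%s\n' 'ld: library not found for -lisal' | missing_libs
```

Each name printed is a library that has to be installed, or made visible to the linker, before the fastp build is retried.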
Now the next error :(
I ran the script again with all required tools installed, but I am getting this error in the alignment step:
> STAR.align.folder(conf["fastq Ribo-seq"], conf["bam Ribo-seq"], index, multiQC = F)
Using STAR at location: /Users/tali/bin/STAR-2.7.4a/bin/MacOSX_x86_64/STAR
Using fastp at location: ~/bin/fastp-master/fastp
[1] "Starting time: 2022-03-09 10:55:02"
[1] "Full system call:"
[1] "/Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/RNA_Align_pipeline_folder.sh -f ~/Bio_data/raw_data/Ribo-seq/Mouse_Heart/ -o ~/Bio_data/processed_data/Ribo-seq/Mouse_Heart/ -p no -l 20 -T 3 -g /Volumes/Work/work/Genomes/Bl6/STAR_index/ -s tr-ge -a auto -t 0 -M 10 -A Local -m 18 -i n -K no -S /Users/tali/bin/STAR-2.7.4a/bin/MacOSX_x86_64/STAR -P ~/bin/fastp-master/fastp -I /Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/RNA_Align_pipeline.sh -C /Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/cleanup_folders.sh"
##############################################
Arguments for folder run are the following:
-f input folder: /Users/tali/Bio_data/raw_data/Ribo-seq/Mouse_Heart/
-o output folder: /Users/tali/Bio_data/processed_data/Ribo-seq/Mouse_Heart/
-p paired end: no
-l minimum length of reads: 20
-T max mismatches of reads: 3
-g genome dir for all STAR indices: /Volumes/Work/work/Genomes/Bl6/STAR_index/
-s steps to do: tr-ge
-a adapter sequence: auto
-t trim front number: 0
-m max multimap: 10
-A alignment type: Local
-m maxCPU: 18
-i subfolders: n
-K Keep contamination reads: no
-S STAR location: /Users/tali/bin/STAR-2.7.4a/bin/MacOSX_x86_64/STAR
-P fastp location: /Users/tali/bin/fastp-master/fastp
-I align_single location: /Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/RNA_Align_pipeline.sh
-C cleaning location: /Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/cleanup_folders.sh
Total number of files are:
2
expr: syntax error
Current step:
Single end mode for file: E15.5_Heart_Ribo_rep1.fastq.gz
File 1 / 2
-o output folder: /Users/tali/Bio_data/processed_data/Ribo-seq/Mouse_Heart/
-f input file: /Users/tali/Bio_data/raw_data/Ribo-seq/Mouse_Heart//E15.5_Heart_Ribo_rep1.fastq.gz
-a adapter sequence: auto
-q quality filtering: disable
-s steps to do: tr-ge
-r resume (r or new n): -l
Error: the given STAR index dir does not exist!
I checked the directory stored in the index variable:
> index
[1] "/Volumes/Work/work/Genomes/Bl6/STAR_index/"
I still don't understand this error message. Can you help? Thanks very much.
Hm, can you print all files in your index folder? It might have failed during index creation. The most important are the SA files, which should be quite large.
Yes, the index is absolutely fine. I checked three times. I even created an index with STAR directly and compared it with the ORFik one, and they are identical.
What are the folder names inside the index dir?
genomeDir
s
That looks correct. I will check the source code for bugs in the index lookup. You have the newest ORFik, right?
yes,
> packageVersion("ORFik")
[1] ‘1.14.7’
OK, I checked through the code. This error can only occur if it can find the index dir but not the genomeDir inside the index dir.
That means that, for the STAR system call, this path: /Volumes/Work/work/Genomes/Bl6/STAR_index/genomeDir does not exist. Any idea why that might be?
The check is bash code from the RNA_Align_pipeline bash script in the inst folder:
gen_dir=index_dir   # i.e. the index dir passed with -g
usedGenome=$gen_dir/genomeDir
if [ ! -d $usedGenome ]; then
    if [ $steps == "tr" ]; then
        echo "Running trim only mode"
    else
        echo "Error: the given STAR index dir does not exist!"
        exit 1
    fi
fi
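As a side note, the same check can be written so that it also says which level of the path is missing, which makes reports like this one easier to narrow down. This is a minimal sketch and not the code in ORFik: the function name check_genome_dir and its messages are hypothetical. It quotes every variable and uses the POSIX "=" comparison, so empty values or paths with spaces do not change what the test sees.

```shell
# Hypothetical helper; names and messages are illustrative, not taken from ORFik.
# Usage: check_genome_dir <index_dir> <steps>
check_genome_dir() {
  index_dir=$1
  steps=$2
  # Strip one trailing slash so the joined path has no double slash.
  used_genome="${index_dir%/}/genomeDir"

  if [ -d "$used_genome" ]; then
    echo "ok: $used_genome"
    return 0
  fi
  if [ "$steps" = "tr" ]; then
    # Trimming alone does not need a genome index.
    echo "Running trim only mode"
    return 0
  fi
  # Report which part of the path is missing.
  if [ ! -d "$index_dir" ]; then
    echo "Error: index dir not found: $index_dir" >&2
  else
    echo "Error: no genomeDir inside: $index_dir" >&2
  fi
  return 1
}
```

Called as check_genome_dir "/Volumes/Work/work/Genomes/Bl6/STAR_index/" "tr-ge", it would print the resolved genomeDir path on success, or say whether the index dir itself or only the genomeDir inside it could not be found.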
yes, the directory /Volumes/Work/work/Genomes/Bl6/STAR_index/genomeDir
does exist:
$ ls /Volumes/Work/work/Genomes/Bl6/STAR_index/genomeDir
Genome SAindex chrNameLength.txt exonInfo.tab sjdbInfo.txt transcriptInfo.tab
Log.out chrLength.txt chrStart.txt geneInfo.tab sjdbList.fromGTF.out.tab
SA chrName.txt exonGeTrInfo.tab genomeParameters.txt sjdbList.out.tab
But there is another error message before the index directory error:
expr: syntax error
Is it also a Mac-specific error?
I think I found the problem. In the alignment arguments, there is this option: -r resume (r or new n): -l
This argument should be either r or n, not -l. How can I change it?
Thanks very much.
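One way output like "-r resume (r or new n): -l" can arise, offered as an assumption and not as a confirmed diagnosis of the ORFik scripts: if a flag's value is empty and is passed unquoted, the shell drops it, and the parser then takes the next flag as the value. The toy parser below is hypothetical (parse_args and its two options are invented for this illustration) and only demonstrates that effect.

```shell
# Hypothetical parser, not ORFik's actual code.
parse_args() {
  resume="n"
  min_length=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -r) resume="$2"; shift 2 ;;      # takes whatever follows -r as its value
      -l) min_length="$2"; shift 2 ;;
      *)  shift ;;                     # skip anything unrecognised
    esac
  done
  printf 'resume=%s min_length=%s\n' "$resume" "$min_length"
}

empty=""
parse_args -r $empty -l 20     # unquoted: the empty value vanishes, -r swallows "-l"
parse_args -r "$empty" -l 20   # quoted: the empty value is kept, -l is parsed normally
```

The first call reports resume=-l and loses the minimum length, which matches the shape of the log line above; the second call keeps both options apart.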
We have since run the analysis on Ubuntu and it was fine. I would still like to know why I got that error message on macOS.
Glad to hear it. It might be a non-POSIX call in the bash code that leads to something going wrong, so I will double check that the bash code is all Mac compatible.
Yeah, I think I figured it out. The bash call for checking a valid directory structure was actually not POSIX, so on Mac it fails when it should not. A bit strange.
I will fix this and push later.
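For context on the "expr: syntax error" line: expr is a frequent source of Linux versus macOS differences, because keywords such as length, substr, index and match are not defined by POSIX, and an empty unquoted operand makes expr fail on any system. Which call was actually at fault in ORFik is not stated in this thread, so the following is only a sketch of portable replacements; the function names are made up for illustration.

```shell
# Illustrative POSIX-only replacements for common expr uses (not ORFik's code).

str_length() {
  # Parameter expansion instead of: expr length "$1"
  printf '%s\n' "${#1}"
}

next_index() {
  # Arithmetic expansion instead of: expr $1 + 1
  # The :-0 default keeps the expression valid when no value is given.
  printf '%s\n' "$(( ${1:-0} + 1 ))"
}

str_length "genomeDir"
next_index 1
```

Both forms are built into the shell, so they behave the same under bash on Ubuntu and under the shells shipped with macOS.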
Did the rest of your analysis go well?
If you need any tips on making result plots or tables, let me know.
I will close this issue a week after the next push, if there is nothing else.
Hi, I am trying to build the genome for Mus musculus with the getGenomeAndAnnotation function:
but it shows an error message saying that no phiX174 genome was found:
Any idea what went wrong? Many thanks.