Roleren / ORFik

getGenomeAndAnnotation error #120

Closed tamerbio closed 2 years ago

tamerbio commented 2 years ago

Hi, I am trying to build the genome for Mus musculus with the getGenomeAndAnnotation function:

annotation <- getGenomeAndAnnotation(organism = organism, genome = T, GTF = T, phix = F, 
                                     ncRNA = TRUE, tRNA = TRUE, rRNA = TRUE, output.dir = conf["ref"],
                                     assembly_type = "primary_assembly")

but it fails with an error message saying that no phiX174 genome was found:

Downloading phix genome
Starting genome retrieval of 'Escherichia virus phiX174' from refseq ...

Unfortunately no genome file could be found for organism 'Escherichia virus phiX174'. Thus, the download of this organism has been omitted. Have you tried to specify 'reference = FALSE' ?
Error in decompressFile.default(filename = filename, ..., ext = ext, FUN = FUN) : 
  No such file: Not available

Any idea what went wrong? Many thanks.

Roleren commented 2 years ago

PhiX is usually not needed; I included it to remove the few PhiX reads that get through the Illumina auto-filtering.

So set the phix argument to FALSE and rerun. Let me know if that works :)

tamerbio commented 2 years ago

Hi Roleren, I set the phix argument to FALSE. It worked until the fastp step just before the alignment:

STAR.align.folder(conf["fastq Ribo-seq"], conf["bam Ribo-seq"], index,
                  paired.end = FALSE,
                  steps = "co-ge", # (trim needed: adapters found, then genome)
                  adapter.sequence = "auto", # Adapters are auto detected
                  trim.front = 0, min.length = 20, multiQC = T)

Unfortunately, I am getting the following error message:

downloading fastp, this will be done only once!
trying URL 'https://github.com/OpenGene/fastp/archive/master.zip'
Content type 'application/zip' length 187080 bytes (182 KB)
==================================================
downloaded 182 KB

On mac OS, must build fastp, since no precompiled binaries exists
This will only be done once
c++ -c src/adaptertrimmer.cpp -o obj/adaptertrimmer.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/basecorrector.cpp -o obj/basecorrector.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/duplicate.cpp -o obj/duplicate.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/evaluator.cpp -o obj/evaluator.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/fastareader.cpp -o obj/fastareader.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/fastqreader.cpp -o obj/fastqreader.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/filter.cpp -o obj/filter.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/filterresult.cpp -o obj/filterresult.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/htmlreporter.cpp -o obj/htmlreporter.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/jsonreporter.cpp -o obj/jsonreporter.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/main.cpp -o obj/main.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/nucleotidetree.cpp -o obj/nucleotidetree.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/options.cpp -o obj/options.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/overlapanalysis.cpp -o obj/overlapanalysis.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/peprocessor.cpp -o obj/peprocessor.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/polyx.cpp -o obj/polyx.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/processor.cpp -o obj/processor.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/read.cpp -o obj/read.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/readpool.cpp -o obj/readpool.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/seprocessor.cpp -o obj/seprocessor.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/sequence.cpp -o obj/sequence.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/stats.cpp -o obj/stats.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/threadconfig.cpp -o obj/threadconfig.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/umiprocessor.cpp -o obj/umiprocessor.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/unittest.cpp -o obj/unittest.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/writer.cpp -o obj/writer.o -std=c++11 -pthread -g -O3 -I./inc  
c++ -c src/writerthread.cpp -o obj/writerthread.o -std=c++11 -pthread -g -O3 -I./inc  
c++ ./obj/adaptertrimmer.o ./obj/basecorrector.o ./obj/duplicate.o ./obj/evaluator.o ./obj/fastareader.o ./obj/fastqreader.o ./obj/filter.o ./obj/filterresult.o ./obj/htmlreporter.o ./obj/jsonreporter.o ./obj/main.o ./obj/nucleotidetree.o ./obj/options.o ./obj/overlapanalysis.o ./obj/peprocessor.o ./obj/polyx.o ./obj/processor.o ./obj/read.o ./obj/readpool.o ./obj/seprocessor.o ./obj/sequence.o ./obj/stats.o ./obj/threadconfig.o ./obj/umiprocessor.o ./obj/unittest.o ./obj/writer.o ./obj/writerthread.o -o fastp  -lisal -ldeflate -lpthread 
ld: library not found for -lisal
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [fastp] Error 1
chmod: /Users/tali/bin/fastp-master/fastp: No such file or directory

Is there any fix for this issue on macOS? Thanks very much. Best, Tamer

Roleren commented 2 years ago

Hm, I will look for a fix tomorrow and let you know.

tamerbio commented 2 years ago

Hi Roleren, I solved the problem. It was a fastp installation problem. Best

tamerbio commented 2 years ago

Now the next error :( I ran the script again with all required tools installed, but I am getting this error in the alignment step:

> STAR.align.folder(conf["fastq Ribo-seq"], conf["bam Ribo-seq"], index, multiQC = F)

Using STAR at location: /Users/tali/bin/STAR-2.7.4a/bin/MacOSX_x86_64/STAR
Using fastp at location: ~/bin/fastp-master/fastp
[1] "Starting time: 2022-03-09 10:55:02"
[1] "Full system call:"
[1] "/Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/RNA_Align_pipeline_folder.sh -f ~/Bio_data/raw_data/Ribo-seq/Mouse_Heart/ -o ~/Bio_data/processed_data/Ribo-seq/Mouse_Heart/ -p no -l 20 -T 3 -g /Volumes/Work/work/Genomes/Bl6/STAR_index/ -s tr-ge  -a auto -t 0 -M 10  -A Local -m 18 -i n -K no -S /Users/tali/bin/STAR-2.7.4a/bin/MacOSX_x86_64/STAR -P ~/bin/fastp-master/fastp -I /Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/RNA_Align_pipeline.sh -C /Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/cleanup_folders.sh"
##############################################

Arguments for folder run are the following:
-f input folder: /Users/tali/Bio_data/raw_data/Ribo-seq/Mouse_Heart/
-o output folder: /Users/tali/Bio_data/processed_data/Ribo-seq/Mouse_Heart/
-p paired end: no
-l minimum length of reads: 20
-T max mismatches of reads: 3
-g genome dir for all STAR indices: /Volumes/Work/work/Genomes/Bl6/STAR_index/
-s steps to do: tr-ge
-a adapter sequence: auto
-t trim front number: 0
-m max multimap: 10
-A alignment type: Local
-m maxCPU: 18
-i subfolders: n
-K Keep contamination reads: no
-S STAR location: /Users/tali/bin/STAR-2.7.4a/bin/MacOSX_x86_64/STAR
-P fastp location: /Users/tali/bin/fastp-master/fastp
-I align_single location: /Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/RNA_Align_pipeline.sh
-C cleaning location: /Library/Frameworks/R.framework/Versions/4.1/Resources/library/ORFik/STAR_Aligner/cleanup_folders.sh

Total number of files are:
2
expr: syntax error
Current step:

Single end mode for file: E15.5_Heart_Ribo_rep1.fastq.gz
File  1 /        2
-o output folder: /Users/tali/Bio_data/processed_data/Ribo-seq/Mouse_Heart/
-f input file: /Users/tali/Bio_data/raw_data/Ribo-seq/Mouse_Heart//E15.5_Heart_Ribo_rep1.fastq.gz
-a adapter sequence: auto
-q quality filtering: disable
-s steps to do: tr-ge
-r resume (r or new n): -l

Error: the given STAR index dir does not exist!

I checked the path stored in the STAR index variable:

> index
[1] "/Volumes/Work/work/Genomes/Bl6/STAR_index/"

I still don't understand this error message. Can you help? Thanks very much.

Roleren commented 2 years ago

Hm, can you print all files in your index folder? Index creation might have failed. Most important are the SA files, which should be quite large.
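
One way to run the check suggested above is sketched here in POSIX shell. The helper name `check_star_index` is hypothetical and not part of ORFik or STAR. The file list is an assumption taken from the index listing that appears later in this thread, not a complete specification of a STAR index.

```shell
# Hypothetical helper: report core STAR index files that are missing or empty.
# The file names below are assumed from the directory listing in this thread.
check_star_index() {
  dir=$1
  rc=0
  for f in Genome SA SAindex genomeParameters.txt chrName.txt; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_star_index` on the folder that holds the index files, followed by `ls -lh` on the same folder, shows whether the SA files are present and how large they are.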

tamerbio commented 2 years ago

Yes, the index is absolutely fine. I checked three times. I even created an index with STAR directly and compared it with the ORFik one, and they are identical.

Roleren commented 2 years ago

What are the folder names inside the index dir?

tamerbio commented 2 years ago

genomeDir

Screenshot 2022-03-11 at 13 39 33

Roleren commented 2 years ago

That looks correct. I will check the source code for bugs in the index lookup. You have the newest ORFik, right?

tamerbio commented 2 years ago

yes,

> packageVersion("ORFik")
[1] ‘1.14.7’

Roleren commented 2 years ago

OK, I checked through the code. This error can only occur if the script finds the index dir but not the genomeDir folder inside it.

That means that, for the STAR system call, the path /Volumes/Work/work/Genomes/Bl6/STAR_index/genomeDir does not exist. Any idea why that might be?

The check is bash code from the RNA_Align_pipeline bash script in the inst folder:

gen_dir=index_dir
usedGenome=$gen_dir/genomeDir
if [ ! -d $usedGenome ]; then
    if [ $steps == "tr" ]; then
        echo "Running trim only mode"
    else
        echo "Error: the given STAR index dir does not exist!"
        exit 1
    fi
fi

tamerbio commented 2 years ago

Yes, the directory /Volumes/Work/work/Genomes/Bl6/STAR_index/genomeDir does exist:

$ ls /Volumes/Work/work/Genomes/Bl6/STAR_index/genomeDir
Genome              SAindex             chrNameLength.txt       exonInfo.tab            sjdbInfo.txt            transcriptInfo.tab
Log.out             chrLength.txt           chrStart.txt            geneInfo.tab            sjdbList.fromGTF.out.tab
SA              chrName.txt         exonGeTrInfo.tab        genomeParameters.txt        sjdbList.out.tab

tamerbio commented 2 years ago

But there is another error message before the index directory error: expr: syntax error. Is that also a Mac-specific error?
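
The thread does not identify which `expr` call failed, so the following is only a hedged sketch of one way to avoid `expr` in a file-counting step. It assumes the count comes from `wc -l`, which prints left-padded numbers on BSD/macOS (consistent with the `File  1 /        2` line in the log above). The helper names `count_files` and `next_index` are hypothetical and not taken from the ORFik scripts.

```shell
# Hypothetical helpers; not the code used in RNA_Align_pipeline_folder.sh.

# Print the number of entries in a directory as a clean integer.
count_files() {
  n=$(ls "$1" | wc -l)
  # POSIX arithmetic expansion: no external expr call, and the
  # whitespace padding added by BSD wc is dropped in the result.
  echo $(( $n + 0 ))
}

# Increment a counter without expr.
next_index() {
  echo $(( $1 + 1 ))
}
```

Arithmetic expansion is part of POSIX sh, so it behaves the same way in bash on Linux and in the shell that ships with macOS.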

tamerbio commented 2 years ago

I think I found the problem. In the alignment arguments there is an option, -r resume (r or new n), that is set to -l. This argument should be either r or n, not -l. How can I change it? Thanks very much.
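
One possible mechanism for `-r` ending up with the value `-l` is sketched below; the thread does not confirm that this is what happened. With POSIX `getopts`, an option that requires a value takes the next argument as that value, even if it looks like another flag. So if the value for `-r` is empty and unquoted, the following `-l` is consumed instead. The function `parse_args` and its two options are a hypothetical reduction, not the actual option parsing of the ORFik scripts.

```shell
# Hypothetical reduction of an option parser with two value-taking flags.
parse_args() {
  OPTIND=1            # reset so the function can be called repeatedly
  resume=""
  min_length=""
  while getopts "r:l:" opt; do
    case "$opt" in
      r) resume=$OPTARG ;;
      l) min_length=$OPTARG ;;
      *) return 1 ;;
    esac
  done
  echo "resume=$resume min_length=$min_length"
}

# Intended call:               parse_args -r n -l 20
# Call with the -r value lost: parse_args -r -l 20
```

In the second call `resume` receives `-l`, and parsing stops at `20`, so `min_length` stays empty. Quoting every value where the script is invoked (for example `-r "$resume"`) keeps an empty value from shifting the remaining arguments.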

tamerbio commented 2 years ago

We ran the analysis on Ubuntu and it was fine. I would still like to know why I got that error message on macOS.

Roleren commented 2 years ago

Glad to hear it. A non-POSIX call in the bash code might be what goes wrong, so I will double check that the bash code is fully Mac compatible.

Roleren commented 2 years ago

Yeah, I think I figured it out. The bash call for checking valid directory structure was actually not POSIX, so it fails on Mac when it really should not. A bit strange.

Will fix this and push later.
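
For illustration, a POSIX-style version of such a directory check could look like the sketch below. It is not the fix that was pushed to ORFik. The function name `check_genome_dir` is hypothetical, and the messages follow the snippet quoted earlier in this thread. The two changes are assumptions about what a portable version needs: every expansion is quoted, and string comparison uses `=` rather than `==`, which POSIX `test` does not define.

```shell
# Hypothetical POSIX-style rewrite of the index directory check.
check_genome_dir() {
  index_dir=$1
  steps=$2
  # Strip one trailing slash so the joined path has no double slash.
  used_genome="${index_dir%/}/genomeDir"
  if [ ! -d "$used_genome" ]; then
    if [ "$steps" = "tr" ]; then
      echo "Running trim only mode"
    else
      echo "Error: the given STAR index dir does not exist!"
      return 1
    fi
  fi
}
```

A call such as `check_genome_dir "$index_dir" "$steps" || exit 1` would stop the pipeline only when alignment was requested and genomeDir is missing.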

Did the rest of your analysis go well?

If you need any tips on making result plots or tables, let me know.

I will close this issue a week after the next push, if there is nothing else.