Oshlack / JAFFA

JAFFA is a multi-step pipeline that takes either raw RNA-Seq reads, or pre-assembled transcripts, then searches for gene fusions
https://github.com/Oshlack/JAFFA/wiki

request to point to reference directory outside of install directory #64

Closed anoronh4 closed 3 years ago

anoronh4 commented 3 years ago

It is easier and cleaner to point to a reference genome directory outside of the tool install directory, especially for those who need to switch seamlessly between hg19/hg38/mm10. Another reason to add this enhancement is that I am unable to copy or link reference files into the JAFFA install directory when I am using the JAFFA Docker image as a Singularity container, e.g.:

$ ln -s /ref_store/GRCh37/jaffa_gencode/*tab /opt/JAFFA/
ln: failed to create symbolic link '/opt/JAFFA/hg19_genCode19.tab': Read-only file system

nadiadavidson commented 3 years ago

Hi, if you have all the reference files in another directory, you can set this with "-p refBase=" when you run the command, like:

$ bpipe run -p refBase=/ref_store/GRCh37/jaffa_gencode/ <JAFFA/L.groovy file> <fastq.gz/fasta files>

anoronh4 commented 3 years ago

That was helpful. However, I did get the following error:

Checking for required data files...
jaffa_gencode/hg19_genCode19.fa
jaffa_gencode/hg19_genCode19.tab
/opt/JAFFA/known_fusions.txt
CAN'T FIND jaffa_gencode/hg19.fa...
PLEASE DOWNLOAD and/or FIX PATH... STOPPING NOW

After running "mv hg19_gencode.fa hg19.fa", it errored with the same message, except this time it said CAN'T FIND jaffa_gencode/hg19_gencode.fa..., so I had to make one a symlink.

I also noticed that known_fusions.txt was looked for in the installation path rather than in refBase, so it seems like JAFFA might be proceeding without it. Can that file be passed as a bpipe parameter as well?

anoronh4 commented 3 years ago

Oh, I see it now in JAFFA_stages.groovy: the known fusions can be passed as knownTable. I still don't know why it gave me errors for either fasta, though.

anoronh4 commented 3 years ago

Looks like fastaBase should be the base directory of the full assembly fasta and transBase should be the base directory of the transcriptome fasta. Can you confirm that's correct?

nadiadavidson commented 3 years ago

That's correct. hg19.fa should be the reference genome, whereas hg19_gencode.fa is the reference transcriptome. Both are in fasta format. So hg19.fa will need to be downloaded separately (e.g. from UCSC), and there are some instructions on JAFFA's wiki about installation that describe how you can do that. Then you can point to the directory containing it with fastaBase.
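
For anyone landing here later, the pieces of this thread can be sketched as a small pre-flight script. It checks that the files named in the error log above exist in the expected directories, then prints (but does not run) a bpipe command that passes refBase, fastaBase and knownTable together. This is only a sketch under assumptions: the default directories are hypothetical, keeping known_fusions.txt under refBase is a choice made here for illustration, and whether these three parameters are sufficient for a given JAFFA version is not confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical locations; override via environment variables to match your layout.
REF_BASE="${REF_BASE:-/ref_store/GRCh37/jaffa_gencode}"     # transcriptome .fa and .tab
FASTA_BASE="${FASTA_BASE:-/ref_store/GRCh37/genome}"        # assumed directory holding hg19.fa
KNOWN_TABLE="${KNOWN_TABLE:-$REF_BASE/known_fusions.txt}"   # assumed location of the known fusions table

# check_refs REF_DIR FASTA_DIR KNOWN_TABLE
# Prints one "missing: <path>" line per absent file; returns non-zero if any are absent.
# File names are the ones shown in the error log in this thread.
check_refs() {
    missing=0
    for f in "$1/hg19_genCode19.fa" "$1/hg19_genCode19.tab" "$2/hg19.fa" "$3"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# jaffa_cmd REF_DIR FASTA_DIR KNOWN_TABLE [pipeline file and input files...]
# Only prints the command line so it can be inspected before running it.
jaffa_cmd() {
    ref=$1; fasta=$2; known=$3
    shift 3
    echo "bpipe run -p refBase=$ref -p fastaBase=$fasta -p knownTable=$known $*"
}

# Arguments given to this script (the .groovy file and the read files) are passed through.
if check_refs "$REF_BASE" "$FASTA_BASE" "$KNOWN_TABLE"; then
    jaffa_cmd "$REF_BASE" "$FASTA_BASE" "$KNOWN_TABLE" "$@"
fi
```

Printing the command rather than executing it keeps the check usable inside a read-only container, where nothing under /opt/JAFFA can be modified anyway.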