bioinfo-pf-curie / nf-VIF

A Nextflow-based Virus Insertion Finder

Question on bowtie2_split indexes #1

Open ipstone opened 3 years ago

ipstone commented 3 years ago

Hello,

Thank you for the nice work and documentation on the nf-VIF pipeline. When I run the test profile as shown below, three indexes are reported missing.

Thanks Isaac

nextflow run main.nf -profile test
N E X T F L O W  ~  version 20.01.0
Launching `main.nf` [nice_mclean] - revision: ca730a9265
No such file: /data/annotations/pipelines/Human/hg19/indexes/bowtie2
No such file: /data/annotations/pipelines/HPV/complete/indexes/bowtie2_split
No such file: /data/annotations/pipelines/HPV/complete/indexes/bowtie2

nservant commented 3 years ago

Hi,

Indeed, you need 3 indexes:

The genome bowtie2 index
A bowtie2 index with all HPV strains
A bowtie2 index per HPV strain

However, you do not have to provide any of them yourself: the pipeline is expected to build missing indexes for you, provided the fasta file is available.

I think it does not try to build them here because their paths are defined in conf/genomes.config. Could you try simply removing the index paths from this file?

    'HPV' {
      fasta         = "${baseDir}/assets/HPV_REF_PaVE_65.fa"
      bowtie2       = ""
      bowtie2Split  = ""
      genes         = "${baseDir}/assets/HPV_genes.tsv"
      ctrlCapture   = "${baseDir}/assets/ctrl_capture.fasta"
    }

Thanks

ipstone commented 3 years ago

Thanks. I removed the index paths as follows:

    'HPV' {
      fasta         = "${baseDir}/assets/HPVs.fa"
      /*bowtie2       = "${params.igenomes_base}/HPV/complete/indexes/bowtie2/HPVs"*/
      /*bowtie2_split = "${params.igenomes_base}/HPV/complete/indexes/bowtie2_split/"*/
      blatdb        = "${params.igenomes_base}/Human/hg19/genome/hg19.2bit"
      genes         = "${baseDir}/assets/HPV_genes.tsv"
      ctrl_capture  = "${baseDir}/assets/ctrl_capture.fasta"
    }

Running nextflow run main.nf -profile test then gives the following error:

nextflow run main.nf -profile test
N E X T F L O W  ~  version 19.10.0
Launching `main.nf` [nice_cray] - revision: ca730a9265
No HPV genome specified!

Does this mean the indexes are still needed? I also tried setting bowtie2="" and bowtie2_split="" as in your example, with the same result.

Another point to confirm:

Indeed, you need 3 indexes;

The genome bowtie2 index
A bowtie2 index with all HPV strains
A bowtie2 index per HPV strain

Regarding 'A bowtie2 index per HPV strain': how should this be prepared? We are testing multiple strains, so how do we provide one bowtie2 index per strain? Will that result in multiple index files?

Thanks Isaac
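
One possible way to prepare a per-strain index by hand, sketched here under assumptions rather than taken from the pipeline: split the multi-strain FASTA into one file per record, then run bowtie2-build on each file. The helper names (split_fasta, build_indexes) and the output layout are hypothetical; the directory layout nf-VIF expects for bowtie2_split is not stated in this thread. Each bowtie2 index is itself a set of six .bt2 files sharing a prefix, so N strains give N prefixes and 6 x N files.

```python
import re
import shutil
import subprocess
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write one FASTA file per record; return the written paths in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                if handle:
                    handle.close()
                # Use the first token of the header as the strain name and
                # replace characters that are awkward in file names.
                # Note: records with identical names would overwrite each other.
                tokens = line[1:].split()
                name = tokens[0] if tokens else "unnamed"
                safe = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
                path = out_dir / f"{safe}.fa"
                handle = open(path, "w")
                written.append(path)
            if handle:
                handle.write(line)
    if handle:
        handle.close()
    return written


def build_indexes(fasta_files, index_dir):
    """Run bowtie2-build once per strain; return the index prefixes."""
    if shutil.which("bowtie2-build") is None:
        raise RuntimeError("bowtie2-build not found on PATH")
    index_dir = Path(index_dir)
    index_dir.mkdir(parents=True, exist_ok=True)
    prefixes = []
    for fasta in fasta_files:
        prefix = index_dir / fasta.stem
        # Command form: bowtie2-build <reference_in> <bt2_base>
        subprocess.run(["bowtie2-build", str(fasta), str(prefix)], check=True)
        prefixes.append(prefix)
    return prefixes
```

A call such as build_indexes(split_fasta("assets/HPVs.fa", "split"), "bowtie2_split") would then produce one index prefix per strain; both directory names here are illustrative only.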

nservant commented 3 years ago

That's weird. Usually, if you remove the indexes, it should use the fasta file to build them. The message "No HPV genome specified!" is raised when neither indexes nor a fasta file are detected.

By the way, which Nextflow version are you using?
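
The expected behaviour described in this thread can be summarised as a small decision rule. The following is only a sketch of that rule as stated in the comments, not the pipeline's actual code; resolve_reference and its return values are hypothetical names.

```python
def resolve_reference(index_path, fasta_path):
    """Return (action, path) following the rule described in the thread."""
    # A configured index path takes precedence, even if the files are missing;
    # that case surfaces as "No such file" rather than triggering a build.
    if index_path:
        return ("use_index", index_path)
    # With no index configured, the fasta file is used to build one.
    if fasta_path:
        return ("build_from_fasta", fasta_path)
    # Neither an index nor a fasta file is set.
    raise ValueError("No HPV genome specified!")
```

Under this rule, an empty index path together with a valid fasta entry should lead to a build, which is why the error reported above is unexpected.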