MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

Locating bowtie indexes #62

Closed by c-guzman 6 years ago

c-guzman commented 7 years ago

I can't find any documentation on where ShortStack looks for bowtie indexes. Could you please add this to the documentation, and also let me know where it looks? A parameter for specifying the index directory would be great!

Thanks!

MikeAxtell commented 7 years ago

Hi Carlos, thanks for the question. ShortStack looks for bowtie indices at the same location as the genome fasta file specified with --genomefile. For instance, given a genomefile at /foo/bar/genome.fa , ShortStack will expect the indices to be /foo/bar/genome.1.ebwt, /foo/bar/genome.2.ebwt, /foo/bar/genome.3.ebwt, /foo/bar/genome.4.ebwt, /foo/bar/genome.rev.1.ebwt, and /foo/bar/genome.rev.2.ebwt. If one or more of the six files are not found, ShortStack will attempt to create them by calling bowtie-build on the --genomefile.
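The lookup rule described above can be sketched in a few lines. This is an illustrative Python sketch, not ShortStack's actual Perl code: the function names `expected_index_paths` and `missing_indices` are hypothetical, and it assumes the index base name is the genome file path with its final extension removed, as in the `/foo/bar/genome.fa` example.

```python
import os
from pathlib import Path

# Suffixes of the six bowtie index files named in the comment above.
EBWT_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
)


def expected_index_paths(genomefile):
    """Predict where the six index files should sit for a given genome file.

    Assumption: the base name is the genome path minus its last extension,
    e.g. /foo/bar/genome.fa -> /foo/bar/genome.
    """
    base = Path(genomefile).with_suffix("")
    return [Path(str(base) + suffix) for suffix in EBWT_SUFFIXES]


def missing_indices(genomefile):
    """Return the predicted index paths that are not readable."""
    return [p for p in expected_index_paths(genomefile)
            if not os.access(p, os.R_OK)]
```

If `missing_indices` returns a non-empty list, that corresponds to the case where ShortStack would fall back to calling bowtie-build on the genome file.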

Yes, I will add this to the README at the next release, thanks for the suggestion.

I don't think I'm interested in adding another option for this though. It's easy enough to keep the genome .fasta file in the same place as the bowtie indices, and it's the way that bowtie-build makes indices as well; it places them in the same directory as the reference genome fasta file.

Regards, Mike

c-guzman commented 7 years ago

Ahh, this makes sense. In my case, I am running the ShortStack alignment as part of a Nextflow script. However, because Nextflow stages input files as symlinks, ShortStack will never find the .ebwt indexes. An added parameter would let me specify the genome index directory manually so that the symlinks can be made.

Thanks though!

MikeAxtell commented 7 years ago

Interesting problem; I've never used Nextflow. When a ShortStack alignment to genome /foo/bar/genome.fa is called under Nextflow, are you saying that /foo/bar/genome.1.ebwt and the other index files are made as symlinks pointing somewhere else? Lines 1261-1287 show how ShortStack currently looks for the indices: it predicts the exact file paths and then tests each one with Perl's -r filetest operator. Maybe -r returns false for symlinks in your case? If so, perhaps there's a fix for your use case without the need to add a new option. I worry about option creep.

I might also be misunderstanding your use case; if so, sorry.
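On the symlink question, a quick experiment helps separate two cases: a symlink whose target exists and a dangling one. The sketch below uses Python's `os.access` as a stand-in for a readability test that follows symlinks; it is an analogy for illustration, not the Perl code in ShortStack, and the file names are made up. It assumes a POSIX system where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def symlink_readability_demo():
    """Compare readability of a real file, a valid symlink, and a dangling one."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)

        # A real (empty) index file.
        target = tmp / "genome.1.ebwt"
        target.write_bytes(b"")

        # A symlink pointing at the real file.
        good_link = tmp / "good_link.ebwt"
        good_link.symlink_to(target)

        # A symlink pointing at a file that does not exist.
        dangling_link = tmp / "dangling_link.ebwt"
        dangling_link.symlink_to(tmp / "does_not_exist.ebwt")

        # os.access follows symlinks, so the result reflects the target.
        return {
            "target": os.access(target, os.R_OK),
            "good_link": os.access(good_link, os.R_OK),
            "dangling_link": os.access(dangling_link, os.R_OK),
        }
```

A check of this kind passes for a symlink with a readable target and fails only when the link is dangling or the file is absent, so an index reached through a valid symlink should not be rejected for being a symlink.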

c-guzman commented 7 years ago

Thanks for taking the time to help out, I'm quite new to Nextflow so let me try to make this more understandable.

When Nextflow calls ShortStack, it will create a symlink to the hg38.fa file, but it does not create any symlinks to any other files in the directory.

So in this case my hg38 directory contains all six .ebwt files and the hg38.fa file. However, when Nextflow calls ShortStack, it creates a new directory and symlinks only hg38.fa into it. ShortStack therefore sees nothing but hg38.fa in the new directory and decides it should build the indexes.

Hope this better explains my use case.
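One possible workaround, sketched here under assumptions rather than as a tested recipe, is to symlink the existing index files next to the staged FASTA before ShortStack runs. The helper name `link_indices_next_to` is hypothetical, and the sketch assumes the index files share the FASTA's base name (hg38.1.ebwt and so on) and live in a known directory.

```python
import os
from pathlib import Path

# The six bowtie index suffixes ShortStack is described as expecting.
EBWT_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
)


def link_indices_next_to(staged_fasta, index_dir):
    """Symlink index files from index_dir into the staged FASTA's directory.

    Returns the list of symlinks created. Index files missing from
    index_dir, or already present beside the FASTA, are skipped.
    """
    staged = Path(staged_fasta)
    base = staged.with_suffix("").name  # hg38.fa -> hg38
    created = []
    for suffix in EBWT_SUFFIXES:
        src = Path(index_dir) / (base + suffix)
        dst = staged.parent / (base + suffix)
        # lexists also catches dangling symlinks left at the destination.
        if src.exists() and not os.path.lexists(dst):
            dst.symlink_to(src.resolve())
            created.append(dst)
    return created
```

Called with the staged hg38.fa and the original hg38 directory, this would leave all six index files visible beside the FASTA, so the index check should succeed and no rebuild should be triggered.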

MikeAxtell commented 7 years ago

I get it now I think, thanks. Do you want to send me an example Nextflow script so I can reproduce the use case here?

MikeAxtell commented 6 years ago

Hi Carlos. I just noticed this issue is still open. Did you find a resolution, or did you abandon it?

Cheers, Mike