Oshlack / Clinker

Gene Fusion Visualiser
MIT License

File name globbing weirdness #3

Closed: lachlansimpson closed this issue 6 years ago

lachlansimpson commented 7 years ago

Seeing this error:

====================================== Stage star_genome_gen =======================================

================================= Stage star_align (23877-1100771) =================================
ERROR: stage star_align failed: Unable to locate one or more specified inputs from pipeline with the following extension(s):

(.*)_R1.fastq.gz 

========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

---------------------------------- star_align  ( 23877-1100771 )  ----------------------------------

Unable to locate one or more specified inputs from pipeline with the following extension(s):

(.*)_R1.fastq.gz

This happens because our file naming doesn't fully follow that pattern; in particular, ours have the pattern:

(.*)R1.fastq.gz

for example:

(.*)_R1_001.fastq.gz

How might we change the pattern in the file call to just the following?

(.*).fastq.gz
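A quick way to see the mismatch is to test the patterns directly. The sketch below is Python, not Bpipe: it assumes the pattern is treated as a regular expression that must match the whole file name (re.fullmatch stands in for Bpipe's own matching), and the file names are made up to follow the naming scheme above.

```python
import re

# Hypothetical file names following the reporter's naming scheme.
files = ["X_R1_001.fastq.gz", "X_R2_001.fastq.gz"]


def matches(pattern: str, name: str) -> bool:
    # Assumption: the whole file name must match the pattern, so
    # re.fullmatch stands in for Bpipe's own matching here.
    return re.fullmatch(pattern, name) is not None


# The default pattern needs "_R1" directly before ".fastq.gz",
# so the "_001" suffix makes it fail.
default_r1 = [f for f in files if matches(r"(.*)_R1.fastq.gz", f)]

# A bare "(.*).fastq.gz" matches both reads, so it can no longer
# tell R1 apart from R2.
generic = [f for f in files if matches(r"(.*).fastq.gz", f)]

print(default_r1)  # []
print(generic)     # ['X_R1_001.fastq.gz', 'X_R2_001.fastq.gz']
```

Since the generic pattern matches both reads, it would not by itself let the pipeline keep R1 and R2 apart.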

breons commented 7 years ago

Hi again,

So line 190 of clinker.pipe has the following:

transform("(.*)_R1.fastq.gz","(.*)_R2.fastq.gz") to ("Aligned.sortedByCoord.out.bam") {

Basically, it separates the two conventionally named FASTQ files by their _R1 and _R2 identifiers and then uses each one specifically in the alignment. If your files follow something like:

X_R1_001.fastq.gz and X_R2_001.fastq.gz or X_R1.fastq.gz and X_R2.fastq.gz

Then I would think you could just change the above to something like:

transform("(.*)R1(.*).fastq.gz","(.*)R2(.*).fastq.gz") to ("Aligned.sortedByCoord.out.bam") {
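As a rough check of the suggested change, the Python sketch below applies both patterns to hypothetical files in each naming scheme and pairs each R1 file with the R2 file whose captured parts agree. It again assumes whole-name regular-expression matching, and the pairing helper (pair_reads, a hypothetical name) is illustrative only; it is not how Bpipe groups inputs internally.

```python
import re

# Patterns suggested above for the star_align stage.
R1_PATTERN = r"(.*)R1(.*).fastq.gz"
R2_PATTERN = r"(.*)R2(.*).fastq.gz"

# Hypothetical inputs covering both naming schemes mentioned above.
files = [
    "X_R1_001.fastq.gz", "X_R2_001.fastq.gz",
    "Y_R1.fastq.gz", "Y_R2.fastq.gz",
]


def pair_reads(names):
    """Pair each R1 file with the R2 file that has the same captured groups."""
    # Index R2 files by their captured (prefix, suffix) groups.
    r2_by_key = {}
    for name in names:
        m = re.fullmatch(R2_PATTERN, name)
        if m:
            r2_by_key[m.groups()] = name
    # Look up each R1 file's groups among the R2 files.
    pairs = []
    for name in names:
        m = re.fullmatch(R1_PATTERN, name)
        if m and m.groups() in r2_by_key:
            pairs.append((name, r2_by_key[m.groups()]))
    return pairs


for r1, r2 in pair_reads(files):
    print(r1, r2)
```

One caveat with these patterns: if a sample name itself contains "R1" or "R2" (say SR1_R2_001.fastq.gz), that file also matches the R1 pattern, so it is worth checking the patterns against your real file names.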

Let me know how you go!

breons commented 7 years ago

I should also mention that you will need to do this for each stage:

index_bams - line 222
prepare_plot - line 246
plot_fusion - line 271

If you just do a global replace of:

"(.*)_R1.fastq.gz","(.*)_R2.fastq.gz"

with whatever you like, you should get 4 changes (line 190 plus the three stages above).
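If you prefer to script the global replace rather than do it in an editor, a minimal Python sketch follows. It assumes every stage spells the pair exactly as on line 190 (no space after the comma), uses the replacement patterns suggested earlier in the thread, and assumes in the commented usage that clinker.pipe is in the current directory.

```python
from typing import Tuple

# Pattern pair as it appears on line 190 of clinker.pipe.
OLD = '"(.*)_R1.fastq.gz","(.*)_R2.fastq.gz"'
# Replacement suggested in this thread for _R1_001-style names.
NEW = '"(.*)R1(.*).fastq.gz","(.*)R2(.*).fastq.gz"'


def replace_patterns(text: str) -> Tuple[str, int]:
    """Replace every literal occurrence of OLD and report how many were found."""
    count = text.count(OLD)
    return text.replace(OLD, NEW), count


# Hypothetical usage (assumes clinker.pipe is in the current directory):
# with open("clinker.pipe") as fh:
#     updated, n = replace_patterns(fh.read())
# print(n)  # the thread expects 4
# with open("clinker.pipe", "w") as fh:
#     fh.write(updated)
```

Counting before replacing gives a quick sanity check against the 4 changes mentioned above; a different count suggests one of the stages spells the pattern differently.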