UCSC-Treehouse / pipelines

Makefiles to run dockerized pipelines used in Treehouse on a single sample
Apache License 2.0

Improve Treeshop's ability to recognize R1/R2 naming conventions #18

Status: Open. klearned opened this issue 6 years ago.

klearned commented 6 years ago


Background: The Makefile currently contains two regex lines to recognize which primary files are R1 and which are R2:

R1 = $(shell find samples -iregex ".+1[^0-9]*$$" | head -1)
R2 = $(shell find samples -iregex ".+2[^0-9]*$$" | head -1)
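To see why these patterns can fail, here is a rough Python approximation of them. `find -iregex` matches the entire path case-insensitively, so `re.fullmatch` with `re.IGNORECASE` stands in for it (Python's `re` is not identical to find's default Emacs-style regex, but these simple patterns behave the same way). The `$$` in the Makefile is make's escape for a literal `$`. The file names are hypothetical, and the Illumina-style `_R1_001.fastq.gz` names are an assumption, since the issue does not show the names that broke the rule.

```python
import re

# Approximation of the Makefile's find -iregex patterns: whole-path,
# case-insensitive match. `$$` in the Makefile is a literal `$` here.
R1_OLD = re.compile(r".+1[^0-9]*$", re.IGNORECASE)
R2_OLD = re.compile(r".+2[^0-9]*$", re.IGNORECASE)

def matches(pattern, paths):
    """Return the paths the pattern accepts, in input order."""
    return [p for p in paths if pattern.fullmatch(p)]

# Hypothetical file names, for illustration only.
simple = ["samples/tumor_1.fastq.gz", "samples/tumor_2.fastq.gz"]
illumina = ["samples/tumor_S1_L001_R1_001.fastq.gz",
            "samples/tumor_S1_L001_R2_001.fastq.gz"]

print(matches(R1_OLD, simple))    # ['samples/tumor_1.fastq.gz']
print(matches(R2_OLD, simple))    # ['samples/tumor_2.fastq.gz']
print(matches(R1_OLD, illumina))  # both files: the last digit is the 1 in "001"
print(matches(R2_OLD, illumina))  # []
```

The rule effectively keys on the last digit in the path. With Illumina-style names that digit is the 1 in `001` for both files, so the R1 pattern accepts both (and `head -1` keeps whichever `find` lists first) while the R2 pattern accepts neither.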

However, these regexes no longer recognize the naming convention of the files we have been receiving recently, so we have to edit them by hand in the Makefile to the following:

R1 = $(shell find samples -iregex ".+R1[^0-9]+.+" | head -1)
R2 = $(shell find samples -iregex ".+R2[^0-9]+.+" | head -1)
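A rough Python approximation of the hand-edited patterns shows the trade-off. `re.fullmatch` with `re.IGNORECASE` stands in for `find -iregex`, which matches the whole path case-insensitively. The file names are hypothetical; Illumina-style `_R1_001.fastq.gz` names are assumed as the "new" convention, since the issue does not show the actual names.

```python
import re

# Approximation of the hand-edited find -iregex patterns.
R1_NEW = re.compile(r".+R1[^0-9]+.+", re.IGNORECASE)
R2_NEW = re.compile(r".+R2[^0-9]+.+", re.IGNORECASE)

def matches(pattern, paths):
    """Return the paths the pattern accepts, in input order."""
    return [p for p in paths if pattern.fullmatch(p)]

# Hypothetical file names, for illustration only.
illumina = ["samples/tumor_S1_L001_R1_001.fastq.gz",
            "samples/tumor_S1_L001_R2_001.fastq.gz"]
simple = ["samples/tumor_1.fastq.gz", "samples/tumor_2.fastq.gz"]

print(matches(R1_NEW, illumina))  # only the _R1_ file
print(matches(R2_NEW, illumina))  # only the _R2_ file
print(matches(R1_NEW, simple))    # []: no "R1" token in these names
# Case-insensitive matching also lets an unrelated "r1" count as R1:
print(R1_NEW.fullmatch("samples/tumor1.fastq.gz") is not None)  # True
```

The edited patterns separate Illumina-style pairs, but they no longer match plain `_1`/`_2` names, and a sample name that happens to contain `r1` or `r2` can be misread.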

Solution suggested by Ellen: the fabfile should use a more sophisticated detection mechanism than a regex and then pass its result to the Makefile.
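One possible non-regex mechanism, offered only as a sketch and not as what the issue specifies: instead of matching each name against a fixed pattern, compare the two primary file names and accept them as a pair only if they differ at exactly one character, `1` in one and `2` in the other. It assumes the sample has exactly two primary files; `split_pair` is a hypothetical helper name, and anything it cannot resolve is reported as ambiguous (`None`).

```python
from typing import List, Optional, Tuple

def split_pair(paths: List[str]) -> Optional[Tuple[str, str]]:
    """Return (r1, r2) if the two paths differ at exactly one position,
    where one has '1' and the other '2'; otherwise None (ambiguous)."""
    if len(paths) != 2:
        return None
    a, b = paths
    if len(a) != len(b):
        return None
    # Positions where the two names disagree.
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    if {a[i], b[i]} != {"1", "2"}:
        return None
    return (a, b) if a[i] == "1" else (b, a)

# Hypothetical names: works for both conventions, in either input order.
print(split_pair(["samples/x_R2_001.fastq.gz", "samples/x_R1_001.fastq.gz"]))
print(split_pair(["samples/t_1.fq", "samples/t_2.fq"]))
print(split_pair(["samples/tumor1.fq", "samples/normal2.fq"]))  # None
```

Because it looks only at where the two names differ, it does not care whether the read number appears as `_1`, `_R1_`, or `.1.`, and it refuses to guess when the names differ in more than that one character.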

rcurrie commented 6 years ago

Suggest we keep the default Makefile regex and override it from the fabfile. We've been down this rabbit hole many times and there is no one-size-fits-all solution, so the Makefile should handle the common case (likely R1/R2) well, with fab trying to disambiguate.
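A minimal sketch of the override, assuming GNU make: variables assigned on the make command line take precedence over ordinary assignments in the Makefile, so the fabfile can pass `R1=` and `R2=` when it has resolved the pair and pass nothing when it hasn't, leaving the Makefile's default regexes in charge. `make_command` is a hypothetical helper; how the fabfile actually runs the command is not shown here.

```python
import shlex
from typing import List, Optional

def make_command(r1: Optional[str] = None, r2: Optional[str] = None) -> List[str]:
    """Build a make invocation. Only when both reads are resolved are
    R1/R2 passed as command-line variables, which override the Makefile's
    own R1/R2 assignments; otherwise the Makefile defaults apply."""
    cmd = ["make"]
    if r1 and r2:
        cmd += [f"R1={r1}", f"R2={r2}"]
    return cmd

# Disambiguated by the fabfile (hypothetical paths):
print(shlex.join(make_command("samples/x_R1_001.fastq.gz",
                              "samples/x_R2_001.fastq.gz")))
# make R1=samples/x_R1_001.fastq.gz R2=samples/x_R2_001.fastq.gz

# Ambiguous: fall back to the Makefile's regexes.
print(shlex.join(make_command()))  # make
```

Passing both variables or neither keeps the Makefile's default behavior intact for the common case and avoids mixing a fab-chosen R1 with a regex-chosen R2.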