bhattlab / bhattlab_workflows

Computational workflows for metagenomics tasks, by the Bhatt lab
http://www.bhattlab.com

Preprocessing breaks on sample names that are only numbers #21

Closed by bfremin 4 years ago

bfremin commented 5 years ago

Specific scenario: I had 41 samples. 40 of them were named like this: 1_1.fq.gz, 1_2.fq.gz, 2_1.fq.gz, 2_2.fq.gz, ...

One of them was named: aerobicmixture_1.fq.gz, aerobicmixture_2.fq.gz

The pipeline only worked for aerobicmixture. The other 40 samples were never submitted for trimming: the pipeline kept running but did not submit their jobs.

If I rename the first 40 files to AD1_1.fq.gz, AD1_2.fq.gz, AD2_1.fq.gz, AD2_2.fq.gz, and so on, it seems to work fine for all of them.

jribado commented 5 years ago

Strange, since the pipeline trims filenames at 1.{extension}, not just at the number.
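
For illustration, here is a minimal sketch of deriving sample names by stripping the read suffix, as the comment above describes. The function name `sample_name` and the pattern `READ_SUFFIX` are assumptions made for this example, not the pipeline's actual code. With this approach, numeric-only names such as `1` survive intact, which suggests the suffix stripping alone would not explain the failure.

```python
import re

# Hypothetical pattern (an assumption, not the pipeline's own regex):
# strip a trailing "_1.<ext>" or "_2.<ext>" read suffix.
READ_SUFFIX = re.compile(r"_[12]\.(fq|fastq)(\.gz)?$")


def sample_name(filename: str) -> str:
    """Return the filename with the paired-end read suffix removed."""
    return READ_SUFFIX.sub("", filename)


files = ["1_1.fq.gz", "1_2.fq.gz", "2_1.fq.gz", "aerobicmixture_1.fq.gz"]

# Collapse forward and reverse reads into one entry per sample.
samples = sorted({sample_name(f) for f in files})
print(samples)
```

Note that the extracted names are strings (`"1"`, not `1`), so any later step that treats them as integers would need to be checked separately.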

tamburinif commented 5 years ago

I think it has to do with how the regex is written. I’ll try to fix it later.
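
One way to test the regex hypothesis is to run a candidate pattern over the input filenames and list those it fails to match. The sketch below does that. Both patterns and the helper `unmatched_files` are hypothetical and chosen only to show the kind of bug that could drop numeric-only names. They are not taken from the pipeline.

```python
import re


def unmatched_files(pattern: str, filenames: list) -> list:
    """Return the filenames that the pattern does not fully match."""
    compiled = re.compile(pattern)
    return [f for f in filenames if compiled.fullmatch(f) is None]


files = ["1_1.fq.gz", "1_2.fq.gz", "AD1_1.fq.gz", "aerobicmixture_1.fq.gz"]

# Hypothetical, deliberately restrictive pattern: the sample name must
# start with a letter, so digit-only names are silently skipped.
restrictive = r"([A-Za-z]\w*)_[12]\.fq\.gz"

# Hypothetical permissive pattern: any non-empty sample name is accepted.
permissive = r"(.+)_[12]\.fq\.gz"

print(unmatched_files(restrictive, files))
print(unmatched_files(permissive, files))
```

If the pipeline's real pattern leaves the numeric files in the unmatched list, that would be consistent with the behaviour reported above, where only aerobicmixture was processed.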