Closed: rmonti closed this issue 7 years ago
Hi Remo.
When parsing the file paths of read files, ShortStack treats everything from the first '.' to the end of the path as a suffix (e.g. the file extension). The remaining 'basename' of each input file is kept as its read group (and used for other purposes).
Yes, the hypothetical situation you describe would cause issues. I never thought of it because I personally am strict about using '.' in file names only for file extensions / suffixes that describe the file format, not for distinguishing names. But that is style, and obviously style varies between users.
I'll mark this as a bug and include a fix in the next release.
On Mon, Jan 16, 2017 at 1:25 PM, rmonti notifications@github.com wrote:
Hi Mike,
I was wondering how ShortStack determines the names of the read groups (RG) for the output alignment files.
I did not specify any prefixes, and submitted a file with this path and name:
../fastq/BHXPU.11045.1.190494.TTAGGC.filter-SMRNA.fastq.gz
It somehow guessed that it should name the BAM file BHXPU.bam and then merge it into merged_alignments.bam with RG=BHXPU, which is essentially what I wanted.
Does it just take the file name up to the first period as the read group?
I could imagine cases where this would yield an error, e.g. if I submit a bunch of files that looked like this:
./reads.1.fastq ./reads.2.fastq
and so on...
So how are the names determined?
best,
Remo
-- Michael J. Axtell, Ph.D. Professor of Biology Penn State University http://sites.psu.edu/axtell
To clarify, when parsing read file names, everything after the last forward slash / but before the first period . is considered the file's 'base name'.
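The rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not ShortStack's actual code (ShortStack is written in Perl); the function names `base_name` and `find_collisions` are hypothetical and chosen for this example.

```python
from collections import defaultdict


def base_name(path: str) -> str:
    """Return the text after the last '/' and before the first '.'.

    Hypothetical helper that mimics the rule described in this thread.
    """
    file_name = path.rsplit("/", 1)[-1]  # drop any leading directories
    return file_name.split(".", 1)[0]    # keep only what precedes the first '.'


def find_collisions(paths):
    """Group input paths by base name and report names shared by several files."""
    groups = defaultdict(list)
    for path in paths:
        groups[base_name(path)].append(path)
    return {name: files for name, files in groups.items() if len(files) > 1}


# The file from the original report yields the read group 'BHXPU'.
print(base_name("../fastq/BHXPU.11045.1.190494.TTAGGC.filter-SMRNA.fastq.gz"))

# The hypothetical inputs from the report both map to 'reads', so they collide.
print(find_collisions(["./reads.1.fastq", "./reads.2.fastq"]))
```

Running a check like `find_collisions` on the input list before alignment is one way to spot files that would end up with the same read group under this rule.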
Fixed in release 3.7