Closed Ackia closed 7 months ago
Hi Oskar,
The good news is that Hostile already supports unpaired short read input. Simply specify --aligner bowtie2 for unpaired short reads, as by default unpaired reads are assumed to be long reads.
For example:
hostile clean --fastq1 tests/data/human_1_1.fastq.gz --aligner bowtie2
Within the stderr generated by Hostile, you will see Mode: short read (Bowtie2), as opposed to Mode: paired short read (Bowtie2) when processing paired data, confirming correct behaviour.
The bad news is that you may not consider points 3-5 of your 'Acceptance Criteria' to be satisfied:
- While the command line usage section of the README mentions that --aligner can be chosen, this could be better documented. I will consider this.
- While unpaired short read functionality in Hostile is unit tested, its performance has not been formally benchmarked, as it is a less common application than either paired short read or long read decontamination, and I lack the time to benchmark everything.
- Hostile guesses whether the input contains long or short reads based on whether --fastq2 is set. This is a simple heuristic but opens the possibility of violating user assumptions. However, the Mode section of the stderr generated by Hostile removes any doubt as to which mode is being used in operation.
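For readers following along, the mode selection described in this comment can be sketched roughly as below. This is only an illustration of the stated heuristic, not Hostile's actual code: the function name infer_mode, its parameters, and the returned labels are hypothetical, and the "long read" label in particular is assumed rather than quoted from Hostile's output.

```python
from typing import Optional


def infer_mode(fastq2: Optional[str] = None, aligner: Optional[str] = None) -> str:
    """Hypothetical sketch of the heuristic described above (not Hostile's code)."""
    if fastq2 is not None:
        # A second FASTQ file implies paired short reads.
        return "paired short read"
    if aligner == "bowtie2":
        # Unpaired input with Bowtie2 explicitly requested: unpaired short reads.
        return "short read"
    # Unpaired input with no aligner specified is assumed to be long reads.
    return "long read"


# Paired input, unpaired input with --aligner bowtie2, and unpaired input by default.
print(infer_mode(fastq2="reads_2.fastq.gz"))
print(infer_mode(aligner="bowtie2"))
print(infer_mode())
```

The third call shows why unpaired short reads need --aligner bowtie2: without it, the heuristic falls through to the long read assumption.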
Great!
When it comes to 4, it can, of course, be skipped for a majority of users. Consider it optional.
With 5, it seems like it should be ok as is, given a bit of improvement on documentation.
And as you already said, 1 and 2 are done, and 3 is just a simple documentation issue.
Great tool! Real useful and very efficient!
Ok great, I will think about how to better document this feature in the next release. Thanks for your feedback, and I'm very glad you are finding the tool useful.
Bede
Hi Oskar, I added a usage example for unpaired short read data to the readme (https://github.com/bede/hostile/commit/f7891a35e9a919056aab37332f091fad1019abc0), which also explains the default behaviour. Unless I hear more from you I will close this issue. Thanks for raising this issue.
Released and mentioned in release notes
I want to be able to utilize single-end short-read data in addition to paired-end data
Acceptance Criteria:
By implementing this user story, users can utilize single-end short-read data alongside paired-end data in their genomic analysis workflows, enhancing the tool's utility and accessibility for diverse research needs.