bede / hostile

Precise host read removal
MIT License

Incorporating Single-End Short Read Data Support #34

Closed: Ackia closed this issue 7 months ago

Ackia commented 7 months ago

I want to be able to use single-end short-read data in addition to paired-end data.

Acceptance Criteria:

  1. As a user, I should be able to specify single-end short-read data as input to the Bowtie2 tool.
  2. The tool should correctly process and align single-end reads using Bowtie2.
  3. The tool's documentation should reflect the newly added support for single-end short-read data.
  4. The tool's performance with single-end data should be evaluated and compared to its performance with paired-end data.
  5. The tool should provide appropriate warnings or errors if incompatible data types are provided as input.

By implementing this user story, users can utilize single-end short-read data alongside paired-end data in their genomic analysis workflows, enhancing the tool's utility and accessibility for diverse research needs.

bede commented 7 months ago

Hi Oskar,

The good news is that Hostile already supports unpaired short read input. Simply specify --aligner bowtie2 for unpaired short reads, because unpaired reads are assumed to be long reads by default.

For example:

hostile clean --fastq1 tests/data/human_1_1.fastq.gz --aligner bowtie2

Within the stderr generated by Hostile, you will see Mode: short read (Bowtie2), as opposed to Mode: paired short read (Bowtie2) when processing paired data, confirming correct behaviour.
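
To check the mode programmatically rather than by eye, the captured stderr text can be searched for the Mode line. The following is a minimal sketch, not part of Hostile: the helper name extract_mode is hypothetical, and it assumes the log line contains the literal text "Mode: " followed by the mode label, possibly after a logging prefix whose exact format is not shown in this thread.

```python
import re
from typing import Optional

# Assumption: the mode is reported on a line containing "Mode: <label>",
# optionally preceded by a logging prefix (timestamp, level, etc.).
MODE_PATTERN = re.compile(r"Mode:\s*(.+?)\s*$")


def extract_mode(stderr_text: str) -> Optional[str]:
    """Return the label after 'Mode:' from the first matching line, or None."""
    for line in stderr_text.splitlines():
        match = MODE_PATTERN.search(line)
        if match:
            return match.group(1)
    return None
```

One way to obtain stderr_text is to run the command with Python's subprocess.run(..., capture_output=True, text=True) and pass the resulting stderr attribute to this helper, then compare the result against "short read (Bowtie2)".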

The bad news is that you may not consider points 3-5 of your 'Acceptance Criteria' to be satisfied:

  3. While the command line usage section of the README mentions that --aligner can be chosen, this could be better documented. I will consider this.
  4. While unpaired short read functionality in Hostile is unit tested, its performance has not been formally benchmarked, as it is a less common application than either paired short read or long read decontamination, and I lack the time to benchmark everything.
  5. Hostile guesses whether the input contains long or short reads based on whether --fastq2 is set. This is a simple heuristic but opens the possibility of violating user assumptions. However, the Mode section of the stderr generated by Hostile removes any doubt as to which mode is being used.
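
The heuristic described in the last point can be sketched as a small decision function. This is an illustration of the behaviour described in this thread, not Hostile's actual code: the function name guess_mode and its parameters are hypothetical, only the three cases discussed here are modelled, and the label Hostile prints for long read mode is not quoted in the thread, so a plain "long read" placeholder is used.

```python
from typing import Optional


def guess_mode(fastq2: Optional[str] = None, aligner: Optional[str] = None) -> str:
    """Illustrative model of the mode choice described in the thread."""
    if fastq2 is not None:
        # A second FASTQ file implies paired short reads.
        # Other aligner choices for paired input are not discussed here
        # and are deliberately not modelled.
        return "paired short read (Bowtie2)"
    if aligner == "bowtie2":
        # Unpaired input is treated as short reads only when requested.
        return "short read (Bowtie2)"
    # Default for unpaired input: assumed to be long reads.
    # Placeholder label; the exact text Hostile prints is an assumption.
    return "long read"
```

Under these assumptions, guess_mode(aligner="bowtie2") gives the unpaired short read mode, while guess_mode() with no arguments falls through to the long read default, which is the case that can surprise users with unpaired short reads.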

Ackia commented 7 months ago

Great!

Regarding 4, it can of course be skipped for the majority of users. Consider it optional.

For 5, it seems fine as is, given a bit of improvement to the documentation.

And as you say, 1 and 2 are done, and 3 is just a simple documentation issue.

Great tool! Really useful and very efficient!

bede commented 7 months ago

Ok great, I will think about how to better document this feature in the next release. Thanks for your feedback, and I'm very glad you are finding the tool useful.

Bede

bede commented 7 months ago

Hi Oskar, I added a usage example for unpaired short read data to the README (https://github.com/bede/hostile/commit/f7891a35e9a919056aab37332f091fad1019abc0), which also explains the default behaviour. Unless I hear more from you, I will close this issue. Thanks for raising it.

bede commented 7 months ago

Released and mentioned in the release notes.