RAHenriksen / NGSNGS

NGSNGS: Next generation simulator for next generation sequencing data

single output format #40

Closed by fgvieira 9 months ago

fgvieira commented 9 months ago

I'm not sure ngsngs needs to support so many output formats. It could output a single format, e.g. SAM (as bwa does), since SAM is easily converted to BAM/CRAM (samtools view) or FASTA/FASTQ (samtools fasta / samtools fastq).

This would simplify deploying ngsngs in workflows (as well as its code).
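For readers unfamiliar with the conversions mentioned above, here is a minimal sketch of the samtools invocations involved. The helper name `conversion_commands` and the file names `sim.sam` and `ref.fa` are hypothetical and not part of ngsngs. The function only builds argument lists and does not run samtools.

```python
import shlex
from pathlib import Path


def conversion_commands(sam_path, reference=None):
    """Return samtools argument lists for converting one SAM file.

    Nothing is executed here; each list can be passed to subprocess.run().
    """
    sam = Path(sam_path)
    commands = {
        # SAM -> BAM: -b selects BAM output, -o names the output file
        "bam": ["samtools", "view", "-b",
                "-o", str(sam.with_suffix(".bam")), str(sam)],
        # SAM -> FASTQ / FASTA: both write to standard output by default
        "fastq": ["samtools", "fastq", str(sam)],
        "fasta": ["samtools", "fasta", str(sam)],
    }
    if reference is not None:
        # SAM -> CRAM: -C selects CRAM output, -T gives the reference FASTA
        commands["cram"] = ["samtools", "view", "-C", "-T", str(reference),
                            "-o", str(sam.with_suffix(".cram")), str(sam)]
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    for fmt, argv in conversion_commands("sim.sam", reference="ref.fa").items():
        print(fmt, "->", shlex.join(argv))
```

Each conversion is a separate pass over the SAM file, which is the extra read/write step discussed in the reply below.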

RAHenriksen commented 9 months ago

Hi Filipe,

Thanks for your comment; I appreciate any feedback. I have decided not to remove the different output formats, since this functionality matches what other tools offer and the formats have already been described in the published article.

The reason for also supporting FASTA and FASTQ, rather than relying on samtools fasta/q, is that users can align the simulated reads directly, instead of first running samtools fasta/q on the generated SAM files and then aligning the sequences.

And yes, you're correct that samtools view can convert the files into other formats. However, since we're already using htslib, the idea was to remove that need, along with the additional input/output bottleneck that would arise when converting the simulated output between SAM/BAM/CRAM. In my experience, converting large enough .sam files can also be time-consuming. Our users can of course also provide the -t2 | --threads2 option, which will speed up the compression :-).

Hope this explains our thought process.

Please let me know if you have any other questions, suggestions, or comments.

Best, Rasmus