OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

Feature Request: UMI Join #529

Open trahsemaj opened 11 months ago

trahsemaj commented 11 months ago

Some tools that work with extracted UMIs expect a specific delimiter between umi1 and umi2 (assuming each read has its own UMI). For example, fgbio CopyUmiFromReadName (https://fulcrumgenomics.github.io/fgbio/tools/latest/CopyUmiFromReadName.html) expects something like $read_name:ACTG-CCGA. Currently it seems fastp can only output the $read_name:ACTG_CCGA format. The delimiter can be rewritten after the fastp run, but that gets complex when multiple compressed parts are being written.

The request is for a new CLI arg, --umijoin, that defines the character placed between the UMIs from read1 and read2. The default value would remain '_' but could be set to '-' or '+' for better downstream compatibility.
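For reference, the post-run workaround mentioned above could look roughly like the sketch below. It is only an illustration, not anything fastp provides: it assumes the UMI pair is the last colon-separated field of the read ID (before any space-separated comment) and is joined by '_'. The names rejoin_umi and rejoin_fastq are hypothetical helpers made up for this example.

```python
import re

# Assumption: the read ID ends with ":UMI1_UMI2", e.g. "@read:ACTG_CCGA".
_UMI_PAIR = re.compile(r":([ACGTN]+)_([ACGTN]+)$")


def rejoin_umi(header: str, joiner: str = "-") -> str:
    """Replace the '_' between two UMIs in a FASTQ header with `joiner`."""
    # Split off any comment after the first space so only the read ID is edited.
    read_id, sep, comment = header.partition(" ")
    read_id = _UMI_PAIR.sub(
        lambda m: f":{m.group(1)}{joiner}{m.group(2)}", read_id
    )
    return read_id + sep + comment


def rejoin_fastq(lines, joiner: str = "-"):
    """Yield FASTQ lines, rewriting only header lines (every 4th line)."""
    for i, line in enumerate(lines):
        yield rejoin_umi(line, joiner) if i % 4 == 0 else line
```

Each output part would need to be streamed through something like this separately (for gzipped parts, e.g. by opening them with gzip.open in text mode), which is the overhead a built-in option would avoid.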

xiechangxiao commented 1 month ago

+1. Can you add this feature? @sfchen