CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

Wrong UMI extracted using Regex #601

Closed: NordinZandhuis closed this issue 11 months ago

NordinZandhuis commented 11 months ago

Hi everyone,

I am trying to extract UMIs using the regex method. The first 10 bases of the read are the UMI.

I am using the following command (following the UMI-tools documentation):

umi_tools extract --extract-method=regex \
    --stdin=trimmed_output_SRR12925922.fastq.gz \
    --bc-pattern='^(?P<umi_1>.{10})' \
    --log=trimmed_output_SRR12925922_processed.log \
    --stdout=processed_trimmed_output_SRR12925922.fastq.gz
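For reference, here is a minimal Python sketch of what regex extraction does to a single read. It assumes the behaviour described in the UMI-tools documentation: the bases captured by the `umi_1` group are removed from the sequence and the quality string, and are appended to the read ID after an underscore (compare the `_ACAACNAGAC` suffix in the header below). `extract_umi` is a hypothetical helper written for illustration; it is not part of the UMI-tools API.

```python
import re

# Same pattern as in the command; UMI-tools expects regex groups to be named
# umi_1, umi_2, ... (plus optional cell_n / discard_n groups).
PATTERN = re.compile(r"^(?P<umi_1>.{10})")


def extract_umi(name: str, seq: str, qual: str) -> tuple:
    """Hypothetical sketch of regex-based UMI extraction on one FASTQ read.

    Assumption: the bases matched by the umi_1 group are cut out of the
    sequence and quality string and appended to the read ID with "_".
    """
    match = PATTERN.match(seq)
    if match is None:
        raise ValueError("read does not match the pattern")
    umi = match.group("umi_1")
    start, end = match.span("umi_1")
    # Remove the UMI bases from both the sequence and the quality string.
    new_seq = seq[:start] + seq[end:]
    new_qual = qual[:start] + qual[end:]
    # Append the UMI to the first whitespace-delimited token of the header,
    # keeping any description (e.g. "1/1") after it.
    read_id, _, rest = name.partition(" ")
    new_name = f"{read_id}_{umi}" + (f" {rest}" if rest else "")
    return new_name, new_seq, new_qual
```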

I get the following result for a given read:

@SRR12925922.1_ACAACNAGAC 1/1
TCGGAAGAGCACACGTCTGAACTCCAGTCACATTCAGAAATCTCGTATGCCGTCTTCTGCTTT
+
EEAEEEEEAEEEAEEAE//AEE/AE/E/AAEE6AEEAEE6/6/A/EEE//E/EE/E//E<AE/

The UMI sequence extracted by umi_tools doesn't seem to match the first 10 bases of the read.
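One way to check this is to look up the same read in the input file and compare its first 10 bases with the UMI appended to its read ID. A minimal sketch, assuming both files are gzip-compressed FASTQ and that the UMI is the text after the last underscore in the output read ID; `fastq_records` and `check_first_extracted_read` are hypothetical helpers, not UMI-tools functions.

```python
import gzip
from contextlib import closing
from typing import Iterator, Tuple


def fastq_records(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip("\n")
            yield header, seq, qual


def check_first_extracted_read(raw_path: str, extracted_path: str,
                               umi_len: int = 10) -> Tuple[str, str]:
    """Return (UMI from the output read ID, first bases of the same input read)."""
    with closing(fastq_records(extracted_path)) as records:
        first = next(records, None)
    if first is None:
        raise ValueError(f"{extracted_path} contains no reads")
    # Assumption: the UMI was appended to the read ID as "<id>_<UMI>".
    read_id, sep, umi = first[0].split()[0].rpartition("_")
    if not sep:
        raise ValueError("no '_<UMI>' suffix found in the output read ID")
    # Find the read with the same ID in the input file.
    with closing(fastq_records(raw_path)) as records:
        for header, seq, _ in records:
            if header.split()[0] == read_id:
                return umi, seq[:umi_len]
    raise KeyError(f"{read_id} not found in {raw_path}")
```

For example, `check_first_extracted_read("trimmed_output_SRR12925922.fastq.gz", "processed_trimmed_output_SRR12925922.fastq.gz")` returns the UMI and the first 10 bases of the matching input read; under the assumptions above, the two should be identical if extraction worked as intended.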

Does anyone have a suggestion as to what might be wrong?

Many thanks for any help!

Kind regards,

Nordin