StevenWingett / HiCUP

Hi-C data processing pipeline
GNU Lesser General Public License v3.0

Truncation at very start of sequence. #62

Open StephenRicher opened 2 years ago

StephenRicher commented 2 years ago

Hi,

Hope you're well. I just noticed this and wasn't sure whether it is expected behaviour. It appears that when a ligation sequence occurs at the very start of a read, the read is truncated to its first letter, but when the ligation sequence is in the middle of the read, it is truncated up to the cut site. I don't imagine this will have any impact, since such short reads probably wouldn't make it past the mapping stage, but I thought I'd post it anyway.

Hopefully the toy example explains what I mean. Each sequence in R1 contains a ligation sequence GATCGATC, but after truncation one ends in G, the other in GATC.
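For reference, here is a minimal Python sketch of the consistent behaviour, in which a read is always cut after the first cut site of the junction wherever the junction occurs. It is not HiCUP's code: the function name `truncate_at_junction`, the `keep` parameter, and the two example reads are hypothetical and chosen only for illustration. Only the ligation sequence GATCGATC comes from the example above.

```python
def truncate_at_junction(read: str, junction: str = "GATCGATC", keep: int = 4) -> str:
    """Cut the read after the first `keep` bases of the first junction found.

    Hypothetical helper for illustration only; not taken from HiCUP.
    """
    pos = read.find(junction)
    if pos == -1:
        # No ligation junction present: return the read unchanged.
        return read
    # Keep everything before the junction plus its first `keep` bases.
    return read[: pos + keep]


# Junction in the middle of the read (made-up sequence).
print(truncate_at_junction("AAAAGATCGATCTTTT"))  # AAAAGATC

# Junction at the very start of the read (made-up sequence).
print(truncate_at_junction("GATCGATCTTTT"))  # GATC
```

Under this rule both reads end in GATC, so a read truncated to a single G would indicate that the junction at position 0 is being handled differently.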

Thanks, Stephen

[Image: toy example of truncated R1 sequences]

StevenWingett commented 2 years ago

Hi Stephen,

I trust all is going well at Bath. You are correct that since in either case this will generate short reads it will not affect the results. But I'll check what is going on.

All the best, Steven