Closed jamesboot closed 1 year ago
Could you possibly share a few lines of the input BAMs?
Hi, thanks for your quick reply. I was about to post a few lines of the BAM input file when I realised the problem: there was some file corruption. Not sure how I missed it... I re-ran the deduplication on some corrected BAM files but I'm now getting a new error:
AssertionError: not all umis are the same length(!): 30 - 31
I can see in the output logs that the UMIs are now being detected, but it looks like they're not all the same length for some reason, so I need to revisit my trimming and how I add the UMIs to the header.
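For anyone hitting the same assertion, a quick way to confirm the mismatch is to tally UMI lengths straight from the read names. This is only a minimal sketch: it assumes the UMI is whatever follows the last `_` in the read name, and both the helper name `umi_length_counts` and the example read names are made up for illustration.

```python
from collections import Counter


def umi_length_counts(read_names, separator="_"):
    """Count UMI lengths, taking the UMI as the text after the last separator."""
    counts = Counter()
    for name in read_names:
        # A name without the separator carries no UMI; record it as length 0.
        umi = name.rsplit(separator, 1)[1] if separator in name else ""
        counts[len(umi)] += 1
    return counts


# Hypothetical read names, for illustration only.
names = ["read1_ACGTACGTAC", "read2_ACGTACGTACG", "read3_TTTTGGGGCC"]
lengths = umi_length_counts(names)
print(dict(lengths))
```

More than one key in the result means the UMIs written to the headers are not of uniform length, which is the condition the assertion complains about.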
Apologies for the inconvenience, I'll close this issue!
Hi there,
Thanks for developing the package. We have some single-end sequencing data in which the first 'n' bases of each read are the miRNA sequence, followed by a specified adapter sequence, then a UMI, and then junk bases to the end of the read. To try and process this I have first run cutadapt using the code below. The idea is to take the adapter and UMI sequence and add the matched sequence to the end of the read name using a '_' delimiter. I did this because the UMItools documentation says it expects a '_' delimiter?
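To illustrate the read layout described above (this is not the cutadapt command that was run), here is a minimal Python sketch of moving the UMI into the read name. The adapter sequence `ADAPTER`, the UMI length `UMI_LEN`, the helper name `move_umi_to_name`, and the example read are all hypothetical placeholders. Unlike the approach described above, it appends only the UMI rather than the whole adapter-plus-UMI match.

```python
ADAPTER = "AACTGTAGGC"  # hypothetical adapter, not the real one
UMI_LEN = 6             # hypothetical UMI length


def move_umi_to_name(name, seq, qual, adapter=ADAPTER, umi_len=UMI_LEN, separator="_"):
    """Return (name, seq, qual) with the UMI appended to the read ID.

    Returns None when the adapter is missing or the UMI is truncated,
    so every UMI that is kept has exactly umi_len bases.
    """
    start = seq.find(adapter)
    if start == -1:
        return None
    umi_start = start + len(adapter)
    umi = seq[umi_start:umi_start + umi_len]
    if len(umi) < umi_len:
        return None
    # Append the UMI to the read ID (the first whitespace-delimited token).
    read_id, _, rest = name.partition(" ")
    new_name = f"{read_id}{separator}{umi}" + (f" {rest}" if rest else "")
    # Keep only the insert upstream of the adapter; junk bases are dropped.
    return new_name, seq[:start], qual[:start]


# Made-up read: insert + adapter + UMI + junk bases.
insert = "TGAGGTAGTAGGTTGTATAGTT"
seq = insert + ADAPTER + "ACGTAC" + "GGGG"
result = move_umi_to_name("read1", seq, "I" * len(seq))
print(result)
```

Dropping reads with a missing adapter or a truncated UMI is a design choice in this sketch: it guarantees fixed-length UMIs in the headers, at the cost of discarding some reads.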
A quick peek inside the output fastq files suggests that the matched sequence has been added correctly. I've therefore next run STAR to align the reads to our reference genome, options below:
Finally, I've then tried running UMItools on the output.bam files:
However, I'm getting the following output, which looks like no UMIs are being detected?
Any help with how best to process these custom libraries and perform the UMI deduplication would be brilliant. Many thanks, James