Can Longbow segment libraries using alternatives to the 10X 5' and 3' sequences?

claumer commented 2 years ago

Hello,

I'm interested in potentially using MAS-Iso-Seq for a project (the increase in throughput is impressive!), but for my specific application it would be beneficial to use some alternative adapter structures to those given in the paper. In particular, I'd like to keep the 15 MAS sequences but change the 3' ends of the PCR primers so that, in place of the 10X 5' and 3' adapter sequences (AAGCAGTGGTATCAACGCAGAG and CTACACGACGCTCTTCCGATCT, respectively), they use the Illumina P5 and P7 sequences (AATGATACGGCGACCACCGAGATCTACAC and CAAGCAGAAGACGGCATACGAGAT).
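As an orientation aid, the minimal Python sketch below checks whether the P5 and P7 sequences quoted above occur in a read as given or as their reverse complement. It assumes exact matches on uppercase sequence; the helper names (`revcomp`, `find_primer`) and the toy read are hypothetical and not part of Longbow. On the top strand of a typical Illumina library, P7 appears as the reverse complement of the primer sequence, which is why both orientations are checked.

```python
# Sketch: locate the Illumina P5/P7 primer sequences quoted in this issue in a
# read, in either orientation. Exact, uppercase matching only (an assumption).

P5 = "AATGATACGGCGACCACCGAGATCTACAC"
P7 = "CAAGCAGAAGACGGCATACGAGAT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_primer(read: str, primer: str) -> list:
    """Return (strand, start) for every exact hit of primer or its reverse complement."""
    hits = []
    for strand, probe in (("+", primer), ("-", revcomp(primer))):
        start = read.find(probe)
        while start != -1:
            hits.append((strand, start))
            start = read.find(probe, start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical toy molecule: P5, a 12 bp insert, then P7 reverse-complemented.
    toy_read = P5 + "ACGTACGTACGT" + revcomp(P7)
    print(find_primer(toy_read, P5))  # [('+', 0)]
    print(find_primer(toy_read, P7))  # [('-', 41)]
```

Real reads carry sequencing errors, so an exact search like this is only a quick sanity check, not a substitute for Longbow's probabilistic annotation.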

In this way, if I understand things correctly, I should be able to use a MAS design to concatenate pre-indexed Illumina libraries and sequence them on long-read platforms. (This is a sensible thing to do in my specific context, I promise.)

However, it's unclear to me after a quick browse of the Longbow documentation whether this would be supported on the downstream analysis side. I guess I would need to make a custom LibraryModel.json? Is there any guidance you can give on how to do this?

Regards, Chris L

kvg commented 1 year ago

Hi Chris, thanks for your inquiry and the separate discussion we had over email. I'm repeating the contents of my email here so that other users may benefit.

Longbow does indeed support the specification of custom annotation models. There's also some new code that separates the array and cDNA models more cleanly, which would allow you to retain the MAS15 array model while replacing the other tags with the Illumina tags you've suggested.
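To illustrate the separation described above, here is a hypothetical Python sketch of how an array model (just the ordered adapters that delimit array elements) composes with a per-element model (the segments inside each element) into the expected layout of a full read. The labels and the `compose_layout` function are illustrative only; they are not Longbow's JSON schema or API, and the actual MAS15 adapter names and count should be taken from the Longbow documentation.

```python
# Hypothetical sketch of composing an "array" model with a per-element model.
# Labels are illustrative; this is not Longbow's model format or API.


def compose_layout(array_adapters: list, element_structure: list) -> list:
    """Interleave array adapters with one copy of the element structure between each pair."""
    layout = []
    for i, adapter in enumerate(array_adapters):
        layout.append(adapter)
        # Each adjacent pair of adapters brackets exactly one array element.
        if i < len(array_adapters) - 1:
            layout.extend(element_structure)
    return layout


if __name__ == "__main__":
    array_model = ["A", "B", "C"]  # hypothetical 3-adapter array (2 elements)
    tenx_element = ["5p", "CBC", "UMI", "cDNA", "3p"]
    illumina_element = ["P5", "insert", "P7_rc"]  # hypothetical swap from this issue
    print(compose_layout(array_model, tenx_element))
    print(compose_layout(array_model, illumina_element))
    # ['A', 'P5', 'insert', 'P7_rc', 'B', 'P5', 'insert', 'P7_rc', 'C']
```

Swapping the element structure leaves the array part untouched, which is the property that lets the array model be reused while the per-element tags change.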

I’ve written a short tutorial explaining how the custom model feature of Longbow works and some guidance on how to make/test your own models. You can find it at:

https://broadinstitute.github.io/longbow/custom_models.html

Hopefully that tutorial is easy to follow and to test out on a synthetic read (it includes an example, along with instructions for making such a read to test whether your model does the right thing). Since this document is very new, I'd appreciate any feedback you can give on its effectiveness.
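Along the lines of the synthetic-read test mentioned above, the following Python sketch builds a toy array read and records where each element should start and end, so that a model's segmentation can be compared against a known truth. The adapter sequences are random placeholders, not the real MAS adapters; the element layout (P5, random insert, reverse-complemented P7) follows the design proposed in this issue, and the function names are hypothetical rather than part of Longbow.

```python
import random

# Illumina primer sequences quoted in this issue.
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
P7 = "CAAGCAGAAGACGGCATACGAGAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def random_dna(n: int, rng: random.Random) -> str:
    """Uniformly random ACGT string of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def make_synthetic_read(adapters, insert_lengths, seed=0):
    """Build adapter, element, adapter, ... and return (read, element_spans).

    Each element is P5 + random insert + revcomp(P7); element_spans holds the
    (start, end) coordinates of each element, for checking a model's output.
    """
    if len(insert_lengths) != len(adapters) - 1:
        raise ValueError("need exactly one insert between each pair of adapters")
    rng = random.Random(seed)
    parts = [adapters[0]]
    spans = []
    pos = len(adapters[0])
    for adapter, n in zip(adapters[1:], insert_lengths):
        element = P5 + random_dna(n, rng) + revcomp(P7)
        spans.append((pos, pos + len(element)))
        parts.extend([element, adapter])
        pos += len(element) + len(adapter)
    return "".join(parts), spans


if __name__ == "__main__":
    # Placeholder 16-mers standing in for array adapters (NOT the real MAS sequences).
    placeholder_rng = random.Random(1)
    adapters = [random_dna(16, placeholder_rng) for _ in range(4)]
    read, spans = make_synthetic_read(adapters, [50, 60, 70])
    print(len(read), spans)  # 403 [(16, 119), (135, 248), (264, 387)]
```

The recorded spans give a ground truth to compare against whatever boundaries a custom model reports for the same read.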

Happy model-making!

cnk113 commented 1 year ago

Hello,

I was wondering whether it's possible to apply the HMM models to non-arrayed data. Specifically, I just want to be able to segment out the structure: 5p, CBC, UMI, cDNA, 3p, etc. I was thinking of using a constant sequence upstream of the 5p adapter and calling it "A", and then "B" for a sequence downstream of the 3p adapter. Would love to get your thoughts on this.
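As a rough illustration of the target structure only, the Python sketch below segments a single, error-free, non-arrayed read by exact string matching: it locates flank A, the 5p adapter, the 3p adapter, and flank B, then slices fixed-length CBC and UMI fields after the 5p adapter. This is a simplistic stand-in, not Longbow's HMM; real long reads contain indels and mismatches that exact matching cannot tolerate, which is what a probabilistic model handles. The 16 bp CBC and 12 bp UMI defaults (the 10x 3' v3 lengths), the `segment` function, and the flank sequences are all assumptions.

```python
from typing import Dict, Optional


def segment(read: str, flank_a: str, five_p: str, three_p: str, flank_b: str,
            cbc_len: int = 16, umi_len: int = 12) -> Optional[Dict[str, str]]:
    """Exact-match segmentation of one non-arrayed read; None if the layout is not found.

    Expected layout: A, 5p, CBC, UMI, cDNA, 3p, B. CBC/UMI lengths are assumptions.
    """
    a = read.find(flank_a)
    if a == -1:
        return None
    p5 = read.find(five_p, a + len(flank_a))
    b = read.rfind(flank_b)
    if p5 == -1 or b == -1:
        return None
    cbc_start = p5 + len(five_p)
    umi_start = cbc_start + cbc_len
    cdna_start = umi_start + umi_len
    # Last 3p hit lying entirely before flank B.
    p3 = read.rfind(three_p, 0, b)
    if p3 < cdna_start:  # also covers p3 == -1 (not found)
        return None
    return {
        "A": read[a:a + len(flank_a)],
        "5p": read[p5:cbc_start],
        "CBC": read[cbc_start:umi_start],
        "UMI": read[umi_start:cdna_start],
        "cDNA": read[cdna_start:p3],
        "3p": read[p3:p3 + len(three_p)],
        "B": read[b:b + len(flank_b)],
    }


if __name__ == "__main__":
    # Flanks are hypothetical; 5p/3p reuse the 10X sequences quoted earlier in
    # this issue, with read orientation simplified for illustration.
    A, B = "TTTTGGGGCCCC", "CCCCGGGGAAAA"
    five_p, three_p = "AAGCAGTGGTATCAACGCAGAG", "CTACACGACGCTCTTCCGATCT"
    read = A + five_p + "ACGT" * 4 + "TGCA" * 3 + "GATTACA" * 5 + three_p + B
    print(segment(read, A, five_p, three_p, B))
```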

Best, Chang

broadinstitute / longbow

Can Longbow segment libraries using alternatives to the 10X 5' and 3' sequences? #193