ZachLW closed this issue 3 months ago
Hi @ZachLW - sorry for the confusion. The two polyA options mean different things here.
In isoseq refine, --require-polyA means the FL read must have a polyA tail (a stretch of A's) at its 3' end, and that tail is then removed. So after isoseq refine, the reads (now called FLNC reads) have been stripped of the 5' / 3' cDNA primers and the polyA tail, leaving just the transcript insert itself.
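To make the first meaning concrete, here is a minimal Python sketch of "require a polyA tail, then strip it". It is only an illustration, not how isoseq refine is implemented: the function name `trim_polya_tail` and the `min_len` threshold of 20 are assumptions made for this example, and unlike a real tool it does not tolerate sequencing errors inside the tail.

```python
from typing import Tuple


def trim_polya_tail(seq: str, min_len: int = 20) -> Tuple[str, bool]:
    """Strip a trailing run of A's if it is at least `min_len` long.

    Returns (sequence, had_tail). `min_len` is a hypothetical threshold
    chosen for this sketch, not a documented tool setting.
    """
    stripped = seq.rstrip("Aa")          # drop every trailing A/a
    tail_len = len(seq) - len(stripped)  # how many A's were removed
    if tail_len >= min_len:
        return stripped, True            # tail found and removed
    return seq, False                    # no qualifying tail; read unchanged


# A read with a 25-nt tail passes and comes back without the tail.
with_tail = trim_polya_tail("ACGTACGT" + "A" * 25)
# A read ending in only three A's does not count as having a tail.
without_tail = trim_polya_tail("ACGTACGTAAA")
```

In this sketch a read that "passes" no longer carries its tail afterwards, which is why the tail itself cannot be what a later annotation step reports on.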
In pigeon classify, --poly-a is actually looking for the polyadenylation signal, a 6-mer in the genomic region just upstream of the 3' end of an mRNA transcript. This is the paper that best describes it: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC310884/
The common human 6-mer polyA list is here: https://downloads.pacbcloud.com/public/dataset/Kinnex-single-cell-RNA/REF-pigeon_ref_sets/Human_hg38_Gencode_v39/polyA.list.txt
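The motif lookup can be pictured with a short Python sketch. This is a rough illustration under stated assumptions, not pigeon's actual algorithm: the function name `find_polya_motif`, the idea of passing in a pre-extracted upstream window, and the "first listed motif wins" rule are all hypothetical. Only AATAAA and ATTAAA are used here, two widely known polyadenylation signals; the real list linked above contains more entries.

```python
from typing import Iterable, Optional


def find_polya_motif(upstream: str, motifs: Iterable[str]) -> Optional[str]:
    """Return the first listed 6-mer found in `upstream`, else None.

    `upstream` is assumed to be the genomic sequence (transcript strand)
    just 5' of the transcript's 3' end; how large that window should be
    is left to the caller in this sketch.
    """
    window = upstream.upper()
    for motif in motifs:
        if motif.upper() in window:
            return motif.upper()
    return None  # would correspond to an NA in the annotation


motifs = ["AATAAA", "ATTAAA"]  # illustrative subset, not the full list

hit = find_polya_motif("GCTGCAATAAAGTCTGAGTGGCTGC", motifs)
miss = find_polya_motif("GCTGCGGGCCCGTCTGAGTGG", motifs)
```

Under this reading, an NA in the polyA motif column would mean that none of the listed 6-mers was found upstream of the 3' end. It would say nothing about whether the original read had a polyA tail.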
Hope this helps!
Hi all,
Thank you very much for developing these helpful packages! At the 'isoseq refine' step I set the 'require-polyA' parameter, so I guess the refined molecules should have polyA tails. However, at the 'pigeon classify' step I added the '--poly-a polyA.list' parameter (polyA.list downloaded from the link provided at https://isoseq.how/), so there is a 'polyA motif' column in the transcript annotation file generated by pigeon. I found that there are many NAs in the polyA motif column, so I'm wondering whether this suggests that these transcripts don't have polyA tails, or that they have non-canonical polyA tails that are not included in the polyA list provided. If they don't have polyA tails, then how did these molecules pass the 'require-polyA' parameter in the 'isoseq refine' step? BTW, I'm also wondering whether the name of these molecules, full length tagged non-concatemer reads (FLTNC reads), means they are full-length transcripts. Thanks in advance for any help!
Kind regards, Zach