Closed by yjx1217 4 years ago
Hi @yjx1217,
There is no specific reason for the PB.X.Y ID format requirement other than ease of processing, and I don't want to go down the rabbit hole of supporting every ID format that is out there :-)
That said, sqanti_qc2.py has a hidden parameter that you can use to bypass the PBID requirement: use --force_id_ignore. This is largely untested (again, because I'm trying not to have to support everything under the sun...), so if you run into issues, please continue to use this ticket to report back.
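For anyone scripting this, here is a rough sketch of assembling a call that passes --force_id_ignore. Only the script name and the flag come from this thread. The positional order (isoforms, annotation, genome) and all file names are assumptions for illustration, so check the script's own help output before relying on it.

```python
import shlex


def build_sqanti_command(isoforms, annotation, genome, script="sqanti_qc2.py"):
    """Return an argument list that passes --force_id_ignore to the script.

    Assumption: the script takes isoforms, annotation and genome as
    positional arguments, in that order. Verify against its help output.
    """
    return ["python", script, "--force_id_ignore", isoforms, annotation, genome]


# Hypothetical file names, used only to show the shape of the command.
cmd = build_sqanti_command("nanopore_isoforms.fasta", "annotation.gtf", "genome.fasta")
command_line = " ".join(shlex.quote(part) for part in cmd)
print(command_line)
```

The list form can be handed to subprocess.run(cmd, check=True) as-is; the joined string is only for logging or pasting into a shell.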
Thanks, -Liz
Hi Liz,
Great! Good to know the hidden option! :-)
Best, Jia-Xing
Hello,
I am trying to adapt SQANTI2 to process transcripts from nanopore cDNA sequencing. In my first attempt, I directly applied SQANTI2 to my data and SQANTI2 complained about the transcript ID:
So I renamed the transcript IDs in my input GTF file to the format ON.X.Y, where X and Y are artificial numbers, and it seems to work fine.
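A minimal sketch of the kind of renaming described above, for reference. It assumes a standard GTF attribute column with gene_id and transcript_id fields. Using X for the gene and Y for the transcript within that gene is an assumption made here for illustration, not something SQANTI2 is known to require. The function names and example records are hypothetical.

```python
import re

# Match the attributes in GTF column 9; \b avoids hits such as ref_gene_id.
TRANSCRIPT_RE = re.compile(r'\btranscript_id "([^"]+)"')
GENE_RE = re.compile(r'\bgene_id "([^"]+)"')


def build_id_map(gtf_lines, prefix="ON"):
    """Map each original transcript ID to prefix.X.Y.

    X numbers genes in order of first appearance and Y numbers the
    transcripts within each gene (an assumed grouping, see above).
    """
    gene_index = {}      # gene_id -> X
    per_gene_count = {}  # gene_id -> transcripts seen so far
    id_map = {}          # original transcript_id -> new ID
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        t_match = TRANSCRIPT_RE.search(line)
        if t_match is None:
            continue
        tid = t_match.group(1)
        if tid in id_map:
            continue
        g_match = GENE_RE.search(line)
        # Fall back to the transcript ID when a record has no gene_id.
        gid = g_match.group(1) if g_match else tid
        if gid not in gene_index:
            gene_index[gid] = len(gene_index) + 1
            per_gene_count[gid] = 0
        per_gene_count[gid] += 1
        id_map[tid] = f"{prefix}.{gene_index[gid]}.{per_gene_count[gid]}"
    return id_map


def rename_lines(gtf_lines, id_map):
    """Return the GTF lines with transcript_id values replaced via id_map."""
    def swap(match):
        old = match.group(1)
        return f'transcript_id "{id_map.get(old, old)}"'
    return [TRANSCRIPT_RE.sub(swap, line) for line in gtf_lines]


# Hypothetical records: two transcripts of geneA, one of geneB.
EXAMPLE_GTF = [
    'chr1\tnanopore\ttranscript\t100\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "read_17";',
    'chr1\tnanopore\texon\t100\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "read_17";',
    'chr1\tnanopore\ttranscript\t150\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "read_42";',
    'chr2\tnanopore\ttranscript\t500\t800\t.\t-\t.\tgene_id "geneB"; transcript_id "read_99";',
]
id_map = build_id_map(EXAMPLE_GTF)
renamed = rename_lines(EXAMPLE_GTF, id_map)
```

Keeping id_map around makes it possible to translate the renamed IDs in the output files back to the original read or transcript names.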
Before trusting the results, I want to double-check the following points:
1) Is there any implicit meaning in the PB.X.Y naming convention? For example, if the same gene has two different transcripts, do I need to name them ON.1.1 and ON.1.2, so that the middle "1" implies that they belong to the same gene?
2) In the final output files (such as sqanti2 corrected gtf), I noticed that my specified transcript ID was further modified: the first letter "O" was trimmed off, and in some cases the final digit as well. Should I worry about this? Also note that when I renamed the transcript IDs in my input GTF file to the format ONT.X.Y, the first letter still got trimmed off.
In the same vein, despite these small problems, it seems feasible to adapt SQANTI2 with minimal changes so that it natively supports nanopore sequencing data, which would be very helpful. Is this also part of the developers' plan? :-)
Thanks in advance!
Best, Jia-Xing