Magdoll / SQANTI2

SQANTI2 is now replaced by SQANTI3. Please go to: https://github.com/ConesaLab/SQANTI3

About adapting SQANTI2 to process transcripts from nanopore cDNA sequencing #62

Closed yjx1217 closed 4 years ago

yjx1217 commented 4 years ago

Hello,

I am trying to adapt SQANTI2 to process transcripts from nanopore cDNA sequencing. In my first attempt, I directly applied SQANTI2 to my data and SQANTI2 complained about the transcript ID:

Cleaning up isoform IDs...
Invalid input IDs! Expected PB.X.Y or PB.X.Y|xxxxx or PBfusion.X format but saw 97e7be63-e596-4865-8391-b7d8b1d8c79c|14 instead. Abort!

So I renamed the transcript IDs in my input GTF file to the format ON.X.Y, where X and Y are artificial numbers, and it seems to work fine.
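The renaming step described above could be sketched roughly as follows. This is a minimal illustration, not part of SQANTI2: the function name `rename_gtf_ids`, the default `PB` prefix (chosen only because the error message names that format), and the choice to let X index the `gene_id` and Y index the transcript within that gene are all assumptions of this sketch.

```python
import re

# Match the transcript_id and gene_id attributes in the GTF attribute column.
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def rename_gtf_ids(lines, prefix="PB"):
    """Return (new_lines, mapping) with transcript IDs rewritten as prefix.X.Y.

    X is a running index per gene_id and Y a running index per transcript
    within that gene. This grouping is an assumption of the sketch, not a
    documented SQANTI2 requirement.
    """
    gene_index = {}  # gene_id -> X
    tx_count = {}    # gene_id -> number of transcripts seen so far
    mapping = {}     # old transcript_id -> new ID
    out = []
    for line in lines:
        tx = TRANSCRIPT_RE.search(line)
        if line.startswith("#") or tx is None:
            out.append(line)  # comments and records without a transcript_id
            continue
        old_id = tx.group(1)
        if old_id not in mapping:
            gene = GENE_RE.search(line)
            gene_id = gene.group(1) if gene else old_id
            x = gene_index.setdefault(gene_id, len(gene_index) + 1)
            tx_count[gene_id] = tx_count.get(gene_id, 0) + 1
            mapping[old_id] = f"{prefix}.{x}.{tx_count[gene_id]}"
        new_attr = f'transcript_id "{mapping[old_id]}"'
        out.append(line[:tx.start()] + new_attr + line[tx.end():])
    return out, mapping
```

Keeping the returned `mapping` makes it possible to trace the renamed IDs in the output files back to the original nanopore read IDs.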

Before trusting the results, I want to double-check the following points:

1) Is there any implicit meaning in the PB.X.Y naming convention? For example, if the same gene has two different transcripts, do I need to name them ON.1.1 and ON.1.2 so that the middle "1" implies that they belong to the same gene?

2) In the final output files (such as the SQANTI2 corrected GTF), I noticed that my specified transcript ID was further modified, with the first letter "O" trimmed off and in some cases also the final digit. Should I worry about this? Also note that when I renamed the transcript IDs in my input GTF file to the format ONT.X.Y, the first letter still got trimmed off.

In the same vein, despite these small problems, it seems feasible to adapt SQANTI2 with minimal changes so that it natively supports nanopore sequencing data, which would be very helpful. Is this also part of the developers' plan? :-)

Thanks in advance!

Best, Jia-Xing

Magdoll commented 4 years ago

Hi @yjx1217, there is no specific reason for the PB.X.Y ID format requirement other than ease of processing, and I don't want to go down the rabbit hole of supporting every ID format that is out there :-)

That said, sqanti_qc2.py has a hidden parameter that you can use to bypass the PBID requirement: --force_id_ignore. This is largely untested (again, because I'm trying not to have to support everything under the sun...), so if you run into issues, please continue to use this ticket to report back.
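For anyone deciding whether the bypass is needed, a quick pre-check of the input IDs might look like the sketch below. The regular expression is only a guess transcribed from the error message quoted earlier in this thread (PB.X.Y, PB.X.Y|xxxxx, or PBfusion.X, assuming X and Y are integers); it is not taken from the SQANTI2 source, and the helper name `ids_needing_bypass` is hypothetical.

```python
import re

# Assumed pattern, derived from the wording of the error message only:
# PB.X.Y, PB.X.Y|xxxxx, or PBfusion.X, with X and Y as integers.
PB_ID_RE = re.compile(r"^(PB\.\d+\.\d+(\|.+)?|PBfusion\.\d+)$")


def ids_needing_bypass(ids):
    """Return the IDs that do not match the assumed PB-style formats."""
    return [i for i in ids if PB_ID_RE.match(i) is None]
```

If this returns an empty list, the IDs already look PB-style under the assumed pattern; otherwise they would need renaming or the --force_id_ignore option mentioned above.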

Thanks, -Liz

yjx1217 commented 4 years ago

Hi Liz,

Great! Good to know the hidden option! :-)

Best, Jia-Xing