Closed spvensko closed 1 month ago
Actually, it appears the contents of the novel
mode are the contents of the refgene
reference with the header from the quantify
mode:
[spvensko@c6145-2-8 d93dc5a2a20de44d772037ba700622]$ head *refgene*
AC090197.1 chr8 + 23336170 23366125 ENST00000519692.1,ENST00000517420.1,
AC090197.1 chr8 + 23336170 23336206 0,1,
AC090197.1 chr8 + 23336207 23337457 1,1,
AC090197.1 chr8 + 23341025 23341149 1,1,
AC090197.1 chr8 + 23341150 23341536 0,1,
AC090197.1 chr8 + 23363125 23366125 1,0,
LINC02774 chr1 + 243917401 244047317 ENST00000440494.1,ENST00000652928.1,
LINC02774 chr1 + 243917401 243917449 1,0,
LINC02774 chr1 + 243917450 243917526 1,1,
LINC02774 chr1 + 243978409 243978519 1,1,
[spvensko@c6145-2-8 d93dc5a2a20de44d772037ba700622]$ head *novel*
GeneName IsoformName ReadPerGene_corrected RelativeAbundance infor_ratio
AC090197.1 chr8 + 23336170 23366125 ENST00000519692.1,ENST00000517420.1,
AC090197.1 chr8 + 23336170 23336206 0,1,
AC090197.1 chr8 + 23336207 23337457 1,1,
AC090197.1 chr8 + 23341025 23341149 1,1,
AC090197.1 chr8 + 23341150 23341536 0,1,
AC090197.1 chr8 + 23363125 23366125 1,0,
LINC02774 chr1 + 243917401 244047317 ENST00000440494.1,ENST00000652928.1,
LINC02774 chr1 + 243917401 243917449 1,0,
LINC02774 chr1 + 243917450 243917526 1,1,
The tool is still running, so perhaps the file will be further modified prior to completion?
Here's how I'm running novel
mode:
liqa -task novel -refgene gencode.v37.annotation.with.hervs.gtf.refgene -bam HCC1395-Pt01-ar-711.sorted.bam -out HCC1395-Pt01-ar-711.liqa_novel_isos -num_cover 20 -num_support_read 10
Yes, this is the correct interpretation of the results, and will address the issue with the header from issue #42 - thanks for pointing that out! The output should be different from the refgene file based on coverage/read parameters that you have provided. Are you still having an issue with that?
I'm trying to better understand the output from the
novel
mode. The description from the usage states:Example output:
Let's consider the
AC090197.1
portion specifically -- it's clear the first line is a "header" line for the gene which lists the gene, chromosome, strand, the start position of all possible splice variants, the end position of all possible splice variants, and the canonical transcripts that are observed with these splice variants.So, for
AC090197.1 chr8 + 23336170 23336206 0,1,
, I believe it's stating that there is an included region atchr8:23336170-23336206
that is observed inENST00000517420.1
, but notENST00000519692.1
.My questions: