BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

"Superset isoform" definition in flair filter_collapsed_isoforms.py #244

Open mariachiaragrieco opened 1 year ago

mariachiaragrieco commented 1 year ago

Hi,

I'm re-running, with the newest FLAIR version, some analyses that I had previously done with version 1.5, and I noticed that the number of reconstructed isoforms decreases.

With this in mind, I have a question about the definition of "superset isoform" used in the flair collapse step (specifically, in the filter_collapsed_isoforms.py subprogram).

As explained in #210, I understood that:

What is still not clear to me is what is meant by 'superset isoform'. Could you enlighten me?

Thank you very much.

Kindly,

Mariachiara

Jeltje commented 1 year ago

The superset isoform is the longest isoform of a set. Say numbers represent exons; then 2-3-4-5, 3-4, and 3-4-5 are all part of the same set, and the superset isoform here is 2-3-4-5. 2-3-5 is not part of that set, because it cannot be merged with the isoforms above.
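The grouping described above can be illustrated with a small toy sketch. It assumes isoforms are plain tuples of exon IDs, as in the example, and treats "can be merged" as "appears as a contiguous run of exons inside a longer isoform". This is a simplified reading of the explanation, not FLAIR's actual code (filter_collapsed_isoforms.py works on real transcript models); the function names `is_contained` and `superset_of` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Toy model (assumption): an isoform is a tuple of exon IDs in transcript order.
Isoform = Tuple[int, ...]


def is_contained(small: Sequence[int], big: Sequence[int]) -> bool:
    """True if `small` appears as a contiguous run of exons inside `big`.

    Contiguity matters: 2-3-5 is NOT contained in 2-3-4-5, because it skips
    exon 4 and so has a 3->5 junction that 2-3-4-5 lacks.
    """
    n, m = len(small), len(big)
    return any(tuple(big[i:i + n]) == tuple(small) for i in range(m - n + 1))


def superset_of(isoform: Isoform, isoforms: List[Isoform]) -> Isoform:
    """Return the longest isoform that contains `isoform` (itself if none)."""
    containing = [isoform] + [o for o in isoforms if is_contained(isoform, o)]
    return max(containing, key=len)


isoforms = [(2, 3, 4, 5), (3, 4), (3, 4, 5), (2, 3, 5)]
print(superset_of((3, 4), isoforms))     # (2, 3, 4, 5)
print(superset_of((3, 4, 5), isoforms))  # (2, 3, 4, 5)
print(superset_of((2, 3, 5), isoforms))  # (2, 3, 5): its own set
```

Requiring a contiguous match (rather than any ordered subsequence) is what keeps 2-3-5 out of the 2-3-4-5 set in this toy model.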

If that's clear, please close this ticket. If not, leave a comment. Thanks!

mariachiaragrieco commented 1 year ago

Hi @Jeltje, thanks for your kind reply. I now understand what the superset isoform represents. I have two questions about how the filter is carried out. Let's say we have a set of isoforms with the following exons:

A) 2-3-4-5
B) 3-4
C) 3-4-5

A is the superset isoform.

So, I was wondering:

1) Can the superset be composed of more than one isoform?

2) When computing the average (old FLAIR release) or the maximum (new FLAIR release) used for the filtering threshold, are only the isoforms matching structure A considered?
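To see why switching from the average to the maximum could reduce the number of isoforms, here is a toy calculation. It assumes a hypothetical rule (not taken from FLAIR's source) in which a subset isoform is kept only if its read support reaches some fraction of a reference value computed over the isoforms matching structure A. The fraction `FRACTION` and all read counts are made-up illustration values. The general point holds regardless: the maximum of a set of counts is never smaller than its mean, so a max-based cutoff is at least as strict.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical read support of isoforms matching superset structure A (2-3-4-5).
# Toy numbers for illustration only.
superset_support: List[int] = [40, 10, 10]

# Hypothetical read support of the subset isoforms B (3-4) and C (3-4-5).
subset_support: Dict[str, int] = {"B": 12, "C": 25}

FRACTION = 0.5  # hypothetical filtering fraction, not a FLAIR parameter


def kept(reference: float) -> List[str]:
    """Names of subset isoforms whose support reaches FRACTION * reference."""
    threshold = FRACTION * reference
    return sorted(name for name, reads in subset_support.items() if reads >= threshold)


print(kept(mean(superset_support)))  # mean 20 -> cutoff 10 -> ['B', 'C']
print(kept(max(superset_support)))   # max 40 -> cutoff 20 -> ['C']
```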

Thanks in advance for your availability.

Kindly,

Mariachiara