BgeeDB / bgee_apps

Source code of the Java Bgee applications
https://bgee.org/
Creative Commons Zero v1.0 Universal

Idea for a rank score of genes in organs #139

Open fbastian opened 4 years ago

fbastian commented 4 years ago

In GitLab by @fbastian on Jul 6, 2016, 21:54

If we wanted to make an "organ page", the most obvious information we would display is the ranked list of genes expressed in that organ, with the genes with highest expression displayed first.

But I think it wouldn't be the best use of our rank scores! A gene could be lowly expressed, but could be essential to the organ function, and only to that organ function. But the gene would be badly ranked according to its expression level.

What I'd rather like to see, is the list of genes for which that organ is the first on their list. Maybe the rank score of that gene in that organ is bad, but still it might be the best rank for this gene. But then, what threshold should we use? Display the genes for which the organ is on their top 5 organs? 10? And how to rank that?

=> suggestion: to rank genes in an organ, use a relation such as rank_of_the_gene_in_this_organ - lowest_rank_of_the_gene, with some normalization trick to make all genes expressed in the organ comparable.

@marcrr

fbastian commented 4 years ago

In GitLab by @marcrr on Jul 7, 2016, 09:50

I think that it would be good to see both: the highest expressed genes, and the genes for which that organ is first on their list.

I'm not sure that I see the advantage of the relation suggested. In most organs, just taking all the genes for which this organ is first will yield too many genes for a readable display.

Maybe instead prioritize genes by the difference in rank (as calculated for the "jumps") between first and second anatomical structure: for a given anatomical structure, show the genes for which this structure is first, in order of the difference of rank score to their next best structure.
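A minimal Java sketch of this ordering, under stated assumptions: ranks are supplied as a plain map of gene ID to (organ to rank), lower ranks are better, the gap is a simple subtraction rather than the "jumps" computation used on the gene page, and a gene expressed in a single organ gets an infinite gap. The class and method names are hypothetical and not part of bgee_apps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Bgee code base. */
final class FirstOrganGapRanking {

    private FirstOrganGapRanking() {}

    /**
     * Returns the genes for which {@code organ} has the best (lowest) rank,
     * ordered by decreasing gap between that rank and the gene's next best rank.
     *
     * @param organ       the anatomical structure of interest
     * @param ranksByGene gene ID -> (organ -> expression rank, lower is better)
     */
    static List<String> genesOrderedByGap(String organ,
                                          Map<String, Map<String, Double>> ranksByGene) {
        Map<String, Double> gapByGene = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> gene : ranksByGene.entrySet()) {
            Double rankInOrgan = gene.getValue().get(organ);
            if (rankInOrgan == null) {
                continue; // gene has no rank in this organ
            }
            // Best rank of the gene in any other organ.
            // Assumption: stays infinite when the gene has no other organ.
            double nextBest = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, Double> other : gene.getValue().entrySet()) {
                if (!other.getKey().equals(organ)) {
                    nextBest = Math.min(nextBest, other.getValue());
                }
            }
            double gap = nextBest - rankInOrgan;
            if (gap >= 0) { // keep only genes for which this organ comes first
                gapByGene.put(gene.getKey(), gap);
            }
        }
        List<String> genes = new ArrayList<>(gapByGene.keySet());
        genes.sort((a, b) -> {
            int byGap = Double.compare(gapByGene.get(b), gapByGene.get(a)); // larger gap first
            return byGap != 0 ? byGap : a.compareTo(b); // gene ID as a deterministic tie-breaker
        });
        return genes;
    }
}
```

Whether single-organ genes should really sort first, and whether the gap should be normalized, are open design choices that this sketch does not settle.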

fbastian commented 4 years ago

In GitLab by @fbastian on Feb 24, 2017, 12:10

I think I have a nice idea to generate an "organ specificity" rank (remember, for a rank, the lower the better):

organ_specificity_rank = (1 + tissue_specificity)^f * rank
    with: rank = expression rank of the gene in that organ
          tissue_specificity = tissue specificity score of the gene, with 0 = tissue specific and 1 = ubiquitous
          f = arbitrary scaling factor > 1, to make the tissue specificity score impact the rank even more

(A gene would then have a rank of 1 if it has the highest expression level above all other genes in that organ and is expressed only in that organ)
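As a rough illustration, the formula above could be computed as follows in Java. This is only a sketch: the class name, method name, and argument checks are hypothetical, and it assumes expression ranks start at 1.

```java
/** Hypothetical helper for illustration only; not part of the Bgee code base. */
final class OrganSpecificityRank {

    private OrganSpecificityRank() {}

    /**
     * Computes (1 + tissueSpecificity)^f * rank.
     *
     * @param rank              expression rank of the gene in the organ (assumed >= 1, lower is better)
     * @param tissueSpecificity 0 = tissue specific, 1 = ubiquitous
     * @param f                 arbitrary scaling factor, > 1
     */
    static double compute(double rank, double tissueSpecificity, double f) {
        if (rank < 1.0) {
            throw new IllegalArgumentException("rank must be >= 1");
        }
        if (tissueSpecificity < 0.0 || tissueSpecificity > 1.0) {
            throw new IllegalArgumentException("tissueSpecificity must be in [0, 1]");
        }
        if (f <= 1.0) {
            throw new IllegalArgumentException("f must be > 1");
        }
        return Math.pow(1.0 + tissueSpecificity, f) * rank;
    }
}
```

With f = 2, a ubiquitous gene (tissue_specificity = 1) at expression rank 10 gets 2^2 * 10 = 40, while a fully tissue-specific gene (tissue_specificity = 0) at expression rank 30 keeps 30, so the specific gene ends up ranked ahead despite its lower expression.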

And then we could let the user decide to rank genes in an organ based either on expression ranks (which will show a lot of house-keeping genes on top), or on "organ specificity" rank (which should show on top the genes for which this organ is the most important).

For some background about this idea: the direction that @jwollbrett is currently implementing is to display genes based on their expression rank in the organ, but showing at first only the genes for which the organ is in their first expression cluster, and for which their first expression cluster has a number of organs below an arbitrary threshold. It turns out that the latter condition works great, but is only a way to prevent ubiquitous genes from showing up.

A solution under consideration is to generate a tissue specificity score, to hide ubiquitous genes at first on organ pages.

I think the solution proposed here, if it works, would be better because it would not hide any information, and it would provide an easy sorting "by column expression rank or by column organ specificity rank".

(And, yes, we would transform these ranks into a "score" somehow, for better clarity for users, and people in general, who can't understand what a rank is, because... I don't know, really)

@marcrr

fbastian commented 4 years ago

In GitLab by @fbastian on Feb 24, 2017, 12:13

@jwollbrett: I forgot about the idea proposed by @marcrr in his previous comment, which is worth trying as well.

edit: it would require retrieving the no-expression calls as well, to get the rank of the gene in organs where its expression is not above background noise.

edit2: I think the idea of @marcrr could be implemented by using not only the difference between the ranks of the gene in the organ and in the next organ, but also the difference to the gene's best rank over all organs. And, IMO, it should rather use the max rank of the gene among the organs where it is expressed (so, not using no-expression calls).

E.g.: organ_specificity_rank = gene_organ_rank * gene_organ_rank/min_gene_rank * gene_organ_rank/gene_max_rank (something like that)
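Assuming the expression above is meant as a product of its three terms, a small Java sketch could look like this. Here min_gene_rank is taken as the gene's best (lowest) rank and gene_max_rank as its worst (highest) rank, both restricted to the organs where the gene is expressed; restricting min_gene_rank this way, as well as the class and method names, are assumptions made for illustration.

```java
import java.util.Collection;
import java.util.Collections;

/** Hypothetical helper for illustration only; not part of the Bgee code base. */
final class RatioBasedOrganRank {

    private RatioBasedOrganRank() {}

    /**
     * Computes geneOrganRank * (geneOrganRank / minGeneRank) * (geneOrganRank / geneMaxRank).
     *
     * @param geneOrganRank       rank of the gene in the organ of interest (lower is better)
     * @param ranksWhereExpressed ranks of the gene in all organs where it is expressed
     */
    static double compute(double geneOrganRank, Collection<Double> ranksWhereExpressed) {
        if (ranksWhereExpressed.isEmpty()) {
            throw new IllegalArgumentException("the gene must be expressed in at least one organ");
        }
        double minGeneRank = Collections.min(ranksWhereExpressed); // best rank of the gene
        double geneMaxRank = Collections.max(ranksWhereExpressed); // worst rank of the gene
        return geneOrganRank * (geneOrganRank / minGeneRank) * (geneOrganRank / geneMaxRank);
    }
}
```

For a gene with ranks 10, 40 and 100 in three organs, the organ where it ranks 10 gets 10 * (10/10) * (10/100) = 1, whereas the organ where it ranks 100 gets 100 * (100/10) * (100/100) = 1000, so the gene's best organ is favored strongly.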

fbastian commented 4 years ago

In GitLab by @marcrr on Feb 24, 2017, 15:29

Following live discussion: we will first try methods based on differences of ranks as computed for the gene page. We will also try methods using tissue-specificity, as soon as we have full data to calculate it. It is necessary to have results from different methods to clarify what exactly we expect from an "organ page".

fbastian commented 4 years ago

In GitLab by @marcrr on Feb 28, 2017, 17:46

Following discussion with @jwollbrett: we should decide what we want to see for "whole body" or "embryo" type of anatomical terms. As a user, I would expect to see housekeeping genes which are expressed in embryo or adult, but our algorithm may not recover them. To check.

fbastian commented 4 years ago

In GitLab by @marcrr on Feb 28, 2017, 17:48

Also following discussion with @jwollbrett: it seems a good idea to present 2 or 3 rankings based on different criteria, e.g. really specific to this structure (strict) vs. more broadly specific to a subset of structures.