Closed: fbastian closed this issue 4 years ago.
In GitLab by @marcrr on Oct 15, 2015, 17:10
Discussion of 15.10:
Later, we will also do:
Attached: photo of the whiteboard from the discussion.
In GitLab by @fbastian on Oct 16, 2015, 24:34
Need more information, @marcrr @pmoret (I've made some edits to my original comment)
expression table, e.g., affyMeanRank, rnaSeqMeanRank, etc. But keep in mind that the expression table stores information per condition (= organ-stage), not per organ.
expression table is that it won't require new DAOs / new Services on the application side. And the rank for each data type could be stored in CallData objects; we made a great design :p
expression table, and add a column to the tables affymetrixProbeset, rnaSeqResult, etc., to store the rank of each gene in a given chip/library. It would then be really easy to re-compute the "real" mean rank of a gene in a condition. But if we want to display this "real" mean rank, it doesn't make sense performance-wise to always re-compute it. Do we want to display this information of mean rank per data type?
expression table as mentioned in the previous point, where each row is a condition.
expression table). Then having the stages for each organ displayed somehow.
How do we deal with ranks of genes with identical expression values? Same rank? Averaging their rank?
So, for in situ data, we are going to have all genes with a rank of 1, the only difference between conditions arising from the "normalization"?
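With a per-chip/library rank column, the "real" mean rank of a gene in a condition reduces to a simple aggregation over samples. A minimal Python sketch, assuming hypothetical rows of (gene_id, condition_id, sample_id, rank) read from such a column; these field names are illustrative and not the actual Bgee schema.

```python
from collections import defaultdict
from statistics import mean


def mean_rank_per_condition(rows):
    """Return {(gene_id, condition_id): mean rank over the samples of that condition}.

    rows: iterable of hypothetical (gene_id, condition_id, sample_id, rank) tuples,
    e.g. as if read from a rank column added to affymetrixProbeset / rnaSeqResult.
    """
    ranks = defaultdict(list)
    for gene_id, condition_id, _sample_id, rank in rows:
        ranks[(gene_id, condition_id)].append(rank)
    # Average the per-sample ranks of each gene in each condition
    return {key: mean(values) for key, values in ranks.items()}


if __name__ == "__main__":
    rows = [
        ("geneA", "liver-adult", "chip1", 3.0),
        ("geneA", "liver-adult", "chip2", 5.0),
        ("geneA", "brain-embryo", "chip3", 10.0),
    ]
    print(mean_rank_per_condition(rows))
```

This is cheap for a single gene, but it illustrates the performance point above: displaying it on every page view means scanning all samples, which is why storing a precomputed value per condition is attractive.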
In GitLab by @marcrr on Oct 19, 2015, 17:36
Ranks and technical replicates:
In my opinion average all replicates, technical or biological, before computing rank
Shouldn't we also "normalize" ranks, inside a given data type, between conditions/chips/libraries?
I would say between array types certainly yes.
Afterwards it gets fuzzy: different rnaseq coverages?
There is no distinction to be made based on quality here I guess
I agree: no distinction here on quality.
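One possible way to "normalize" ranks between array types is to rescale each type's ranks so that its maximum rank matches the largest maximum rank across types. The thread does not specify the formula; the scheme below (rank * global max / group max) and the function name are assumptions for illustration only.

```python
def normalize_ranks(ranks_by_group):
    """Rescale ranks so that every group shares the same maximum rank.

    ranks_by_group: {group_id: {gene_id: rank}}, e.g. one group per array type.
    Hypothetical scheme: normalized = rank * (global max rank / group max rank).
    Assumes every group contains at least one ranked gene.
    """
    group_max = {g: max(r.values()) for g, r in ranks_by_group.items()}
    global_max = max(group_max.values())
    return {
        g: {gene: rank * global_max / group_max[g] for gene, rank in r.items()}
        for g, r in ranks_by_group.items()
    }
```

For example, an array type with a max rank of 2 has its ranks doubled when another type reaches a max rank of 4, so a gene ranked last on either type ends up at the same normalized rank.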
Do we want to display this information of mean rank per data type?
By "real rank", you mean before normalization? We may want to display this one day, but not yet, in my opinion.
Should we make the ranking per condition (= organ-stage, maybe soon organ-stage-sex) rather than simply per organ?
I agree in principle.
Or should we have more or less a table per broad stage?
Makes sense to me.
Should we always display conditions with over-expression first?
In my opinion, over-/under-expression should not be taken into account here, and we should eventually also visualize the top/all over- and under-expression on the gene page.
What should we do with no-expression calls?
In the download file only.
How do we deal with ranks of genes with identical expression values?
should be taken care of by R function of ranks = average rank
So, for in situ data, we are going to have all genes with a rank of 1
yes
it should be made clear that what we are going to display are the expression calls. The mean rank is simply a way of ordering by what we hope to be "biological significance". It is not the main info to be displayed. Agreed?
Agreed. (Edited on iPad, sorry for the formatting.)
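Since the mean rank only serves to order the expression calls, the display logic can be a pure sort. A minimal Python sketch, assuming conditions are (organ, stage) pairs and that a lower rank means higher expression; organs are ordered by their best stage, and stages within an organ by rank. The function name and input shape are hypothetical.

```python
def order_conditions(mean_ranks):
    """Order a gene's conditions for display.

    mean_ranks: {(organ, stage): mean rank}; a lower rank is assumed to mean
    higher expression. Over-/under-expression is deliberately ignored.
    Returns a list of (organ, [(stage, rank), ...]), organs sorted by their
    best (lowest) stage rank, stages sorted by rank within each organ.
    """
    by_organ = {}
    for (organ, stage), rank in mean_ranks.items():
        by_organ.setdefault(organ, []).append((stage, rank))
    for stages in by_organ.values():
        stages.sort(key=lambda item: item[1])
    # item[1][0][1] is the rank of the best stage of each organ
    return sorted(by_organ.items(), key=lambda item: item[1][0][1])
```

Grouping by organ keeps the per-condition ranking agreed above while still showing the stages of each organ together.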
In GitLab by @fbastian on Oct 21, 2015, 15:14
In my opinion average all replicates, technical or biological, before computing rank
By "replicates", you mean, all samples from a same experiment in a same condition? Or all samples in a given condition? I would agree in the former case: it makes sense and it's easy to do.
I would say between array types certainly yes. Afterwards it gets fuzzy: different rnaseq coverages?
Yes, like RNA-Seq libraries targeting sRNAs. But OK, for now, let's consider that EST and RNA-Seq libraries always have access to the complete genome.
TODO: store in the database the information about libraries targeting special types of RNAs, and "normalize" ranks only for those libraries.
should be taken care of by R function of ranks = average rank
We use Perl :p
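Outside R, the "average rank" tie handling (what R's rank() does with ties.method = "average", its default) is easy to reimplement in any language. A minimal Python sketch, assuming rank 1 goes to the highest expression value, i.e. the equivalent of R's rank(-x); the function name is hypothetical.

```python
def average_ranks(values):
    """Rank values in decreasing order (1 = highest value).

    Tied values all receive the mean of the rank positions they span,
    like R's rank(-x, ties.method = "average").
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # 0-based positions start..end correspond to 1-based ranks start+1..end+1
        avg = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks
```

For instance, values [10.0, 5.0, 5.0, 1.0] give ranks [1.0, 2.5, 2.5, 4.0]: the two tied genes share the mean of positions 2 and 3.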
In GitLab by @marcrr on Oct 26, 2015, 11:17
By "replicates", you mean, all samples from a same experiment in a same condition?
yes.
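With replicates defined as all samples from the same experiment in the same condition, the averaging step before ranking could look like the minimal Python sketch below. The input shape and function name are hypothetical; for simplicity, a gene missing from one replicate is averaged only over the samples where it has a value.

```python
from collections import defaultdict
from statistics import mean


def average_replicates(samples):
    """Average expression values per gene across replicate samples.

    samples: iterable of (experiment_id, condition_id, {gene_id: value}).
    Replicates are all samples sharing the same (experiment_id, condition_id).
    Returns {(experiment_id, condition_id): {gene_id: mean value}}.
    """
    values = defaultdict(lambda: defaultdict(list))
    for experiment_id, condition_id, expression in samples:
        for gene_id, value in expression.items():
            values[(experiment_id, condition_id)][gene_id].append(value)
    return {
        key: {gene_id: mean(vals) for gene_id, vals in genes.items()}
        for key, genes in values.items()
    }
```

Ranks would then be computed once per (experiment, condition) on these averaged values, rather than once per sample.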
In GitLab by @marcrr on Jan 13, 2016, 15:09
Following today's discussion:
In GitLab by @fbastian on Mar 1, 2016, 02:06
I think the computation of globalMeanRank in org.bgee.model.dao.mysql.expressiondata.MySQLExpressionCallDAO#generateSelectClause should be slightly rewritten: the denominator should be based on whether data exists for the gene, rather than on whether a max rank exists for a condition.
E.g., the current denominator is written: .../ (if (expression.estMaxRank is null, 0,expression.estMaxRank )+...
It should be rewritten to something like: .../ (if (expression.estData = 'no data', 0,expression.estMaxRank )+...
The same should be applied to the numerator for consistency, e.g.: ...if (estData = 'no data', 0,expression.estMeanRankNorm * expression.estMaxRank)+
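The SQL fragments above compute a weighted mean of the normalized mean ranks, each data type weighted by its max rank. A minimal Python sketch of the proposed logic, where a data type contributes to both numerator and denominator only if it has data for the gene; the class and field names are hypothetical, loosely mirroring estData, estMeanRankNorm and estMaxRank.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataTypeRank:
    """Per-data-type values for one gene in one condition (hypothetical names)."""
    has_data: bool          # False plays the role of estData = 'no data'
    mean_rank_norm: float   # normalized mean rank of the gene
    max_rank: float         # max rank for this data type in the condition


def global_mean_rank(per_type: Dict[str, DataTypeRank]) -> Optional[float]:
    """Weighted mean of normalized mean ranks, weighted by max rank.

    Only data types with data for the gene are used, in both numerator and
    denominator. Returns None if no data type has data.
    """
    used = [d for d in per_type.values() if d.has_data]
    denominator = sum(d.max_rank for d in used)
    if denominator == 0:
        return None
    numerator = sum(d.mean_rank_norm * d.max_rank for d in used)
    return numerator / denominator
```

With Affymetrix (rank 100, max 20000), RNA-Seq (rank 200, max 30000) and an EST max rank of 5000 but no EST data for the gene, this returns 8,000,000 / 50,000 = 160.0; counting the EST max rank in the denominator anyway, as the current condition on estMaxRank does, would give about 145.5.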
In GitLab by @fbastian on Mar 1, 2016, 02:24
And bonus points if you manage to get the quantile information displayed :p
In GitLab by @marcrr on Oct 15, 2015, 12:07
One page per gene presenting summary information on expression etc., as well as gene-specific download files and external links.