fbastian closed 4 years ago
In GitLab by @fbastian on Jul 14, 2015, 18:01
It would be good to mock the JSON data that should be returned, both to define them clearly and to avoid waiting for the features to be implemented in Bgee.
In GitLab by @fbastian on Jul 14, 2015, 18:44
Actually, I see two or three workflows:
What do you think, do you agree with those definitions? Should we tackle each one of them successively?
In GitLab by @sduvaud on Jul 15, 2015, 13:37
Gene verification/species detection: should be dynamic. When a user adds Ensembl gene identifiers, we should check:
Obviously, we won't query the Bgee web service each time a user loads a list of genes in TopAnat. We have to perform this check on the client side.
My first idea was to use the $http service to fetch and parse a JSON mapping of Ensembl gene identifier prefixes to their corresponding species. The problem is the origin of the JSON data: either we get it from the server on startup, or we add it to the Bgee Angular project. The latter solution is not convenient for Frederic because it would add a step to each Bgee release.
We decided to try the following: Frederic will add a JSON snippet in a script tag during the HTML generation. I should check whether Angular can deal with this solution (and how).
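A minimal sketch of the client-side check discussed above, in TypeScript. It assumes the embedded JSON is a flat object mapping an Ensembl gene ID prefix to a species name; that shape, the function names, and the script tag id mentioned in the comment are hypothetical, since the actual snippet has not been defined yet.

```typescript
// Assumed shape of the embedded JSON: Ensembl gene ID prefix -> species name.
type PrefixMap = { [prefix: string]: string };

interface GeneCheck {
  bySpecies: { [species: string]: string[] };
  unrecognized: string[];
}

// In the browser, jsonText would be the textContent of the script tag,
// e.g. document.getElementById("species-prefixes") (hypothetical id).
function parsePrefixMap(jsonText: string): PrefixMap {
  const parsed: unknown = JSON.parse(jsonText);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Expected a JSON object mapping prefixes to species");
  }
  const source = parsed as { [key: string]: unknown };
  const map: PrefixMap = {};
  for (const prefix of Object.keys(source)) {
    const species = source[prefix];
    if (typeof species === "string") {
      map[prefix] = species;
    }
  }
  return map;
}

// Groups gene IDs by species, using the letters that precede the digits as the prefix.
function checkGeneIds(ids: string[], prefixes: PrefixMap): GeneCheck {
  const result: GeneCheck = { bySpecies: {}, unrecognized: [] };
  for (const raw of ids) {
    const id = raw.trim();
    if (id === "") {
      continue; // skip blank lines in the pasted list
    }
    const match = /^([A-Za-z]+)\d+$/.exec(id);
    const prefix = match ? match[1] : undefined;
    if (prefix === undefined || !Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
      result.unrecognized.push(id);
      continue;
    }
    const species = prefixes[prefix];
    if (!Object.prototype.hasOwnProperty.call(result.bySpecies, species)) {
      result.bySpecies[species] = [];
    }
    result.bySpecies[species].push(id);
  }
  return result;
}
```

With a map such as {"ENSG": "Homo sapiens", "ENSMUSG": "Mus musculus"}, a pasted list that mixes both prefixes would come back with two entries in bySpecies, which the interface could use to warn the user about a multi-species list.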
In GitLab by @sduvaud on Jul 15, 2015, 15:50
The retrieve/render results workflow: storyboard (see attached file).
The parameters are of 3 types:
We will ignore the graph, for now.
The parameters from the "analysis" group are the following:
Although those are very important for the computation of data, they won't appear in the result table itself (but somewhere in a "Parameter summary" panel or similar). Do they need to be part of the returned JSON data? For now, I don't think so.
The parameters from the "data" category are composed of:
Dependencies:
The result table, which will be built from the returned JSON data, should contain:
Possible JSON output: (no query filter)
{
  "EnsemblId": "ENS0000000000",
  "GeneName": "myGene",
  "DevelStageId": "HsapDv:0000092",
  "DevelStageName": "human adult stage (human)",
  "AnatId": "CL:0000655",
  "AnatName": "secondary oocyte",
  "DataType": "RNA-seq",
  "Expression": "Absent",
  "DiffExpression": "over-expressed"
},
The parameters panel will display:
In GitLab by @fbastian on Jul 15, 2015, 17:08
Some comments:
"We will ignore the graph, for now."
Actually, no, we were just not sure whether this parameter should be put directly into the "analysis" parameters. We already use this parameter in our prototype to generate an ugly graph, so it would be used even if Cytoscape is not used to generate a graph for now.
"the analysis type (RNA-seq, Affymetrix, in situ hybridization, EST - query filter)"
This is more what we call "data type".
"anatomical structure (ontology - query filter)."
I don't see such a parameter in the storyboard.
"The result table, which will be built from the returned JSON data, should contain:"
The result table won't contain gene IDs. See the example output files described in issue #12. The results are about retrieving organs (your point 4), not about retrieving genes.
It is true that we will add some "analysis" parameters in the returned results (developmental stage ID and name, expression type), as compared to the example output in issue #12.
About expression type, in the returned results, it won't be 'Expression (y/n)' and 'Differential expression (+/-)', it will be 'expression type (presence/differential expression)' (as the parameters in the form).
Point 5 'analysis type', I think you're speaking about 'data type', this will not be in the returned results, it will be in the "parameters panel".
In the comments of the storyboard (slide 7), the columns in the returned results were defined as:
Uberon ID | Uberon name | Stage ID | Stage name | Expression type | obs. | exp. | enrich. | p-val | FDR
In GitLab by @sduvaud on Jul 15, 2015, 17:40
Take-home message: we work at the level of a gene list!!! What we want to know is whether a list of genes is expressed in specific organs. This was unclear to me.
Current output in the prototype (as described in #12):
OrganId      OrganName         Annotated  Significant  Expected  foldEnrichment  p         fdr
XAO:0000305  cranial placode   5          3            0.01      300.00          1.04e-07  7.63e-07
XAO:0003196  olfactory system  18         5            0.04      125.00          1.10e-05  5.04e-05
As Frederic said, we will add the developmental stage ID and name, together with the expression type (presence/differential expression).
I will start with this format and use a fake JSON as a result output.
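One way to build the fake JSON from the prototype output is to parse the table directly. The sketch below assumes the prototype file is tab-separated with the header row shown above; the interface and function names are hypothetical, and the developmental stage and expression type fields are left out because the prototype output does not contain them yet.

```typescript
// One row of the prototype output, keyed by the column names of its header.
interface EnrichmentRow {
  OrganId: string;
  OrganName: string;
  Annotated: number;
  Significant: number;
  Expected: number;
  foldEnrichment: number;
  p: number;
  fdr: number;
}

// Parses tab-separated prototype output (header row first) into typed records.
function parseResultTsv(tsv: string): EnrichmentRow[] {
  const lines = tsv.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    return [];
  }
  const header = lines[0].split("\t");

  // Looks a cell up by column name, so that column order does not matter.
  const text = (cells: string[], name: string): string => {
    const i = header.indexOf(name);
    if (i < 0 || i >= cells.length) {
      throw new Error("Missing column: " + name);
    }
    return cells[i].trim();
  };

  // Same lookup, converted to a number; rejects empty or non-numeric cells.
  const num = (cells: string[], name: string): number => {
    const raw = text(cells, name);
    const value = Number(raw);
    if (raw === "" || isNaN(value)) {
      throw new Error("Not a number in column " + name + ": " + raw);
    }
    return value;
  };

  return lines.slice(1).map((line) => {
    const cells = line.split("\t");
    return {
      OrganId: text(cells, "OrganId"),
      OrganName: text(cells, "OrganName"),
      Annotated: num(cells, "Annotated"),
      Significant: num(cells, "Significant"),
      Expected: num(cells, "Expected"),
      foldEnrichment: num(cells, "foldEnrichment"),
      p: num(cells, "p"),
      fdr: num(cells, "fdr"),
    };
  });
}
```

Passing the result to JSON.stringify would give an array that can be served as the mocked response until the real web service exists.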
In GitLab by @fbastian on Jul 15, 2015, 17:52
This sounds good!
(point of detail: it's not exactly whether a list of genes is expressed in specific organs, but whether their expression is enriched as compared to the background in specific organs)
In GitLab by @sduvaud on Jul 23, 2015, 14:33
I will add the analysis type to the JSON for the "view by analysis" display (the alternative to the default "grouped result view").
I presume that the "AnalysisType" field contains "RNA-seq", "In situ...", "EST", ... but I am not 100% sure. I will start with that and check with Frederic once we are all back.
JSON used for mocking the application:
{
  "AnalysisType": "",
  "OrganId": "",
  "OrganName": "",
  "Annotated": ,
  "Significant": ,
  "Expected": ,
  "foldEnrichment": ,
  "p": ,
  "fdr": ,
  "DevelopmentStage": "",
  "DevelopmentStageName": "",
  "ExpressionType": ""
},
In GitLab by @sduvaud on Jul 24, 2015, 14:24
The first version of the web interface was pushed to GitLab: https://gitlab.isb-sib.ch/ST/topanat-web
Features:
Missing:
In GitLab by @fbastian on Aug 17, 2015, 09:51
The link to the first version is broken :(
I guess that by AnalysisType, you're referring to the display of results "per analysis". An "analysis" corresponds to an expression type ('presence', 'diff expression') for a specific developmental stage. So I think you don't need a specific column for this.
Also, for consistency with the current Bgee application, you should use the term 'anatEntity' rather than 'organ' (anatEntityId, anatEntityName). Same for 'DevelopmentStage', replace it with 'devStage' (devStageId, devStageName). We'll see later which term we use in column headers.
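To illustrate why no separate AnalysisType column is needed, here is a rough TypeScript sketch that derives the "per analysis" grouping from the expression type and developmental stage already present in each result. The anatEntity and devStage field names follow the renaming suggested above; the expressionType casing, the interface names, and the function name are assumptions.

```typescript
// One result row using the Bgee-style field names (expressionType casing is assumed).
interface TopAnatResult {
  anatEntityId: string;
  anatEntityName: string;
  devStageId: string;
  devStageName: string;
  expressionType: string;
  p: number;
  fdr: number;
}

// One "analysis": an expression type for a specific developmental stage.
interface AnalysisGroup {
  expressionType: string;
  devStageId: string;
  devStageName: string;
  results: TopAnatResult[];
}

// Groups results per analysis, keeping groups in order of first appearance.
function groupByAnalysis(results: TopAnatResult[]): AnalysisGroup[] {
  const index: { [key: string]: AnalysisGroup } = {};
  const groups: AnalysisGroup[] = [];
  for (const result of results) {
    // A JSON-encoded pair is an unambiguous composite key.
    const key = JSON.stringify([result.expressionType, result.devStageId]);
    let group: AnalysisGroup | undefined = index[key];
    if (group === undefined) {
      group = {
        expressionType: result.expressionType,
        devStageId: result.devStageId,
        devStageName: result.devStageName,
        results: [],
      };
      index[key] = group;
      groups.push(group);
    }
    group.results.push(result);
  }
  return groups;
}
```

The default grouped view could then render the flat list, while the "per analysis" view renders one table per returned group.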
In GitLab by @fbastian on Aug 31, 2015, 14:19
I generated an output TSV file with many more terms (2,300), to test the performance of generating/sorting/searching results client-side (e.g., with data-table): http://devbgee.unil.ch/bgee/TopOBOFiles/results/topOBOResult_8bc083c9ad40e2aa2e02e681801870906d1b41ec.tsv
Note that we should never have so many results (we could easily limit to, e.g., 500 terms per analysis, or less).
You can manipulate the results at the URL: http://devbgee.unil.ch/bgee/bgee?page=top_anat&data=9c067119742baa038c83c648b588c19f6450ed1e
(These data will certainly be removed at some point in the future, as they were generated with nonsense parameters to get lots of terms.)
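For reference, the cap of about 500 terms per analysis mentioned above could look like the following TypeScript sketch. Ranking by ascending p-value before truncating is an assumption (the thread does not say which terms to keep), and the function name is hypothetical.

```typescript
// Anything with a p-value can be ranked; other fields are carried along untouched.
interface RankedTerm {
  p: number;
}

// Returns at most maxTerms terms, most significant (lowest p) first.
// The input array is copied, so the caller's data is not reordered.
function limitTerms<T extends RankedTerm>(terms: T[], maxTerms: number = 500): T[] {
  return terms
    .slice()
    .sort((a, b) => a.p - b.p)
    .slice(0, maxTerms);
}
```

Whether the cut happens on the server or in the client is an open choice; applying it server-side would also reduce the size of the JSON sent to the browser.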
In GitLab by @vioannid on Jul 14, 2015, 17:17
Prepare an end-to-end workflow as described below:
Requirements: function call(s) to Bgee server to get the output data