BgeeDB / bgee_apps

Source code of the Java Bgee applications
https://www.bgee.org/
Creative Commons Zero v1.0 Universal

Prepare end-to-end small workflow #13

Closed: fbastian closed this 4 years ago

fbastian commented 4 years ago

In GitLab by @vioannid on Jul 14, 2015, 17:17

Prepare an end-to-end workflow as described below:

Requirements: function call(s) to the Bgee server to get the output data.

fbastian commented 4 years ago

In GitLab by @fbastian on Jul 14, 2015, 18:01

It would be good to mock the JSON data that is supposed to be returned, both to define it clearly and to avoid waiting for the features to be implemented in Bgee.

fbastian commented 4 years ago

In GitLab by @fbastian on Jul 14, 2015, 18:44

Actually, I see two or three workflows:

What do you think? Do you agree with these definitions? Should we tackle them one after the other?

fbastian commented 4 years ago

In GitLab by @sduvaud on Jul 15, 2015, 13:37

Gene verification/species detection should be dynamic. When a user adds Ensembl gene identifiers, we should:

  1. check whether each identifier has the right format
  2. check whether it corresponds to a species in Bgee
  3. check whether the identifiers correspond to one or several species
  4. choose which species should be analysed.
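The four steps above can be sketched as follows. It is written in Java for consistency with the rest of bgee_apps (the TopAnat client itself is Angular, so this only illustrates the logic). The class `GeneListChecker`, the regular expression for Ensembl stable gene IDs, and the rule of auto-selecting a species only when exactly one is detected are assumptions; the prefix-to-species map is passed in by the caller and would in practice come from Bgee. Species whose gene IDs do not follow the `ENS...G` pattern would need extra patterns.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical client-side gene verification / species detection (sketch).
 * ASSUMPTION: the prefix -> NCBI taxon ID map is supplied by the caller.
 */
class GeneListChecker {
    // Ensembl stable gene IDs: "ENS" + optional species code + "G" + 11 digits,
    // optionally followed by a version suffix such as ".17".
    private static final Pattern ENSEMBL_GENE =
            Pattern.compile("^(ENS[A-Z]*G)\\d{11}(\\.\\d+)?$");

    private final Map<String, Integer> prefixToTaxon;

    GeneListChecker(Map<String, Integer> prefixToTaxon) {
        this.prefixToTaxon = prefixToTaxon;
    }

    static final class Result {
        final List<String> badFormat = new ArrayList<>();                // step 1 failed
        final List<String> unknownSpecies = new ArrayList<>();           // step 2 failed
        final Map<Integer, List<String>> genesByTaxon = new TreeMap<>(); // input to step 3
    }

    Result check(Collection<String> geneIds) {
        Result result = new Result();
        for (String raw : geneIds) {
            String id = raw.trim();
            Matcher m = ENSEMBL_GENE.matcher(id);
            if (!m.matches()) {                 // 1. right format?
                result.badFormat.add(id);
                continue;
            }
            Integer taxon = prefixToTaxon.get(m.group(1));
            if (taxon == null) {                // 2. species present in Bgee?
                result.unknownSpecies.add(id);
                continue;
            }
            result.genesByTaxon.computeIfAbsent(taxon, k -> new ArrayList<>()).add(id);
        }
        return result;
    }

    /** Steps 3/4: auto-select only if exactly one species was detected; otherwise the user chooses. */
    static Optional<Integer> autoSelectedSpecies(Result result) {
        if (result.genesByTaxon.size() == 1) {
            return Optional.of(result.genesByTaxon.keySet().iterator().next());
        }
        return Optional.empty();
    }
}
```

Returning all detected species (rather than silently picking the largest group) leaves step 4 to the user whenever the list is mixed.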

Obviously, we won't query the Bgee web service each time a user loads a list of genes in TopAnat; this check has to be done on the client side.

My first idea was to use the $http service to fetch and parse, in JSON format, the list of Ensembl gene identifier prefixes mapped to their corresponding species. The problem is where the JSON data comes from: either we get it from the server on startup, or we add it to the Bgee Angular project. The latter is not convenient for Frederic, because it would add a step to each Bgee release.

We decided to try the following: Frederic will add a JSON snippet in a script tag when generating the HTML page. I should check whether (and how) Angular can handle this.
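A minimal server-side sketch of that idea: the page generator writes the prefix-to-species map as an inline JSON script element. The class name, the element id `bgee-species-prefixes`, and the prefix-to-species-name structure are hypothetical. Escaping `<` inside string values ensures a value can never close the script element early.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper writing the species map into the generated HTML page (sketch). */
class SpeciesSnippetWriter {

    /** Encodes a string as a JSON literal that is also safe inside an HTML script element. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c == '<') {
                sb.append("\\u003c");  // so "</script>" can never close the element early
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c)); // control characters
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the inline script element; the TreeMap gives a stable key order. */
    static String scriptTag(Map<String, String> prefixToSpecies) {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> e : new TreeMap<>(prefixToSpecies).entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append(jsonString(e.getKey())).append(':').append(jsonString(e.getValue()));
        }
        json.append('}');
        return "<script type=\"application/json\" id=\"bgee-species-prefixes\">" + json + "</script>";
    }
}
```

Because the `type` is `application/json`, the browser does not execute the element. The client could read it without any extra request, for example with `JSON.parse(document.getElementById('bgee-species-prefixes').textContent)`.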

fbastian commented 4 years ago

In GitLab by @sduvaud on Jul 15, 2015, 15:50

The retrieve/render results workflow: storyboard (see attached file).

The parameters are of 3 types:

  1. data
  2. analysis
  3. graph

We will ignore the graph, for now.

The parameters from the "analysis" group are the following:

  1. decorrelation type
  2. numerical options
  3. foreground (genome or gene list)

Although these parameters are essential to the computation, they won't appear in the result table itself (only somewhere in a "Parameter summary" panel or similar). Do they need to be part of the returned JSON data? For now, I don't think so.

The parameters in the "data" category are:

  1. the gene list (input data) + species
  2. the expression type (presence/diff - query filter)
  3. the quality level flag (all/high - query filter)
  4. the analysis type (RNA-seq, Affymetrix, in situ hybridization, EST - query filter)
  5. development stage (ontology - query filter)
  6. anatomical structure (ontology - query filter).

Dependencies:

  1. expression types <-> analysis types (beware: incompatibilities!)
  2. development stage <-> input data (more precisely, species detected)
  3. foreground <-> background
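The first dependency could be enforced with a simple compatibility table. This sketch assumes, for illustration, that differential expression is only available from Affymetrix and RNA-seq data, while in situ hybridization and EST data only provide presence/absence calls. The enum names and the `incompatible` helper are hypothetical.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical check of the "expression types <-> data types" dependency (sketch). */
class ParameterDependencies {
    enum ExpressionType { PRESENCE, DIFF_EXPRESSION }
    enum DataType { AFFYMETRIX, RNA_SEQ, IN_SITU, EST }

    // ASSUMPTION (illustrative): differential expression only comes from Affymetrix and
    // RNA-seq; in situ hybridization and EST data only provide presence/absence calls.
    private static final Map<ExpressionType, Set<DataType>> COMPATIBLE =
            new EnumMap<>(ExpressionType.class);
    static {
        COMPATIBLE.put(ExpressionType.PRESENCE, EnumSet.allOf(DataType.class));
        COMPATIBLE.put(ExpressionType.DIFF_EXPRESSION,
                EnumSet.of(DataType.AFFYMETRIX, DataType.RNA_SEQ));
    }

    /** Returns the selected data types that cannot be combined with the chosen expression type. */
    static Set<DataType> incompatible(ExpressionType expressionType, Set<DataType> selected) {
        Set<DataType> result = EnumSet.noneOf(DataType.class);
        for (DataType d : selected) {
            if (!COMPATIBLE.get(expressionType).contains(d)) {
                result.add(d);
            }
        }
        return result;
    }
}
```

The form could then disable, or warn about, the returned data types whenever the user changes the expression type.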

The result table, which will be built from the returned JSON data, should contain:

  1. Gene identifier (linked to Ensembl)
  2. Gene name/family (?)
  3. Development stage (proper name + link to ontology)
  4. Anatomy (proper name + link to ontology)
  5. Analysis type
  6. Expression (y/n)
  7. Differential expression (+/-)

Possible JSON output (no query filter):

{
  "EnsemblId": "ENS0000000000",
  "GeneName": "myGene",
  "DevelStageId": "HsapDv:0000092",
  "DevelStageName": "human adult stage (human)",
  "AnatId": "CL:0000655",
  "AnatName": "secondary oocyte",
  "DataType": "RNA-seq",
  "Expression": "Absent",
  "DiffExpression": "over-expressed"
},

The parameters panel will display:

  1. the species of interest
  2. the foreground
  3. the quality level
  4. the decorrelation type
  5. the numerical options

Attachment: storyboard_Bgee_TopAnat_WebTeam_Jun2015.pptx

fbastian commented 4 years ago

In GitLab by @fbastian on Jul 15, 2015, 17:08

Some comments:

> We will ignore the graph, for now.

Actually, no: we were just not sure whether this parameter should go directly into the "analysis" parameters. We already use it in our prototype to generate an ugly graph, so it would be used even if Cytoscape is not used to generate a graph for now.

> the analysis type (RNA-seq, Affymetrix, in situ hybridization, EST - query filter)

This is rather what we call "data type".

> anatomical structure (ontology - query filter).

I don't see such a parameter in the storyboard.

> The result table, which will be built from the returned JSON data, should contain:

Uberon ID   Uberon name   Stage ID   Stage name   Expression type   obs.   exp.   enrich.   p-val   FDR

fbastian commented 4 years ago

In GitLab by @sduvaud on Jul 15, 2015, 17:40

Take-home message: we work at the level of a gene list!!! What we want to know is whether a list of genes is expressed in specific organs. This was unclear to me.

Current output in the prototype (as described in #12):

OrganId      OrganName         Annotated  Significant  Expected  foldEnrichment  p         fdr
XAO:0000305  cranial placode   5          3            0.01      300.00          1.04e-07  7.63e-07
XAO:0003196  olfactory system  18         5            0.04      125.00          1.10e-05  5.04e-05
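As a sanity check on this format, `foldEnrichment` equals `Significant / Expected` in both rows (3 / 0.01 = 300, 5 / 0.04 = 125). The sketch below parses one row of the prototype output and checks that relation. It assumes the file is tab-separated with the eight columns in the order shown; the class name and the tolerance parameter are illustrative.

```java
/**
 * Hypothetical parser for one row of the prototype TopAnat output (sketch).
 * ASSUMPTION: tab-separated, columns in the order OrganId ... fdr shown above.
 */
class PrototypeRow {
    final String organId;
    final String organName;
    final int annotated;
    final int significant;
    final double expected;
    final double foldEnrichment;
    final double p;
    final double fdr;

    PrototypeRow(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 8) {
            throw new IllegalArgumentException("expected 8 columns, got " + f.length);
        }
        organId = f[0];
        organName = f[1];
        annotated = Integer.parseInt(f[2]);
        significant = Integer.parseInt(f[3]);
        expected = Double.parseDouble(f[4]);
        foldEnrichment = Double.parseDouble(f[5]);
        p = Double.parseDouble(f[6]);
        fdr = Double.parseDouble(f[7]);
    }

    /** True if foldEnrichment matches Significant / Expected within a relative tolerance. */
    boolean foldEnrichmentConsistent(double relativeTolerance) {
        double recomputed = significant / expected;
        return Math.abs(recomputed - foldEnrichment) <= relativeTolerance * foldEnrichment;
    }
}
```

Since `Expected` is printed with only two decimals, a real check would need a looser tolerance for terms with small expected counts.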

As Frederic said, we will add the developmental stage ID and name, together with the expression type (presence/differential expression).

I will start with this format and use fake JSON data as the result output.

fbastian commented 4 years ago

In GitLab by @fbastian on Jul 15, 2015, 17:52

This sounds good!

(A point of detail: the question is not exactly whether a list of genes is expressed in specific organs, but whether their expression in specific organs is enriched compared to the background.)
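To make the distinction concrete: for each anatomical entity, the observed number of foreground genes annotated to it (`Significant`) is compared with the number expected by chance given the background (`Expected`), and fold enrichment is observed / expected. The sketch below uses a one-sided hypergeometric test (Fisher's exact test), which is one standard choice. This is an assumption here: it does not model the decorrelation methods TopAnat offers, and `annotatedInBackground` plays the role of `Annotated` only if that column counts background genes.

```java
/** Hypothetical enrichment computation for one anatomical entity (sketch). */
class EnrichmentSketch {

    /** ln(n!) by summing logarithms; accurate enough for gene-list sizes. */
    static double logFactorial(int n) {
        double s = 0.0;
        for (int i = 2; i <= n; i++) {
            s += Math.log(i);
        }
        return s;
    }

    /** ln of the binomial coefficient C(n, k), for 0 <= k <= n. */
    static double logChoose(int n, int k) {
        return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
    }

    /** Expected number of foreground genes annotated to the entity if there is no enrichment. */
    static double expected(int backgroundSize, int annotatedInBackground, int foregroundSize) {
        return (double) foregroundSize * annotatedInBackground / backgroundSize;
    }

    /**
     * One-sided p-value P(X >= observed), X hypergeometric: foregroundSize genes drawn
     * from backgroundSize genes, annotatedInBackground of which are annotated to the entity.
     */
    static double pValue(int backgroundSize, int annotatedInBackground,
                         int foregroundSize, int observed) {
        int notAnnotated = backgroundSize - annotatedInBackground;
        // X can be neither below foregroundSize - notAnnotated nor above min(K, n).
        int from = Math.max(Math.max(observed, foregroundSize - notAnnotated), 0);
        int to = Math.min(annotatedInBackground, foregroundSize);
        double logTotal = logChoose(backgroundSize, foregroundSize);
        double p = 0.0;
        for (int x = from; x <= to; x++) {
            p += Math.exp(logChoose(annotatedInBackground, x)
                    + logChoose(notAnnotated, foregroundSize - x) - logTotal);
        }
        return Math.min(1.0, p); // guard against rounding slightly above 1
    }
}
```

For example, with a background of 10 genes, 3 of them annotated to the entity, and a foreground of 3 genes, `expected` is 0.9. Observing all 3 foreground genes on that entity gives p = 1/120.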

fbastian commented 4 years ago

In GitLab by @sduvaud on Jul 23, 2015, 14:33

I will add the analysis type to the JSON for the "view by analysis" view (the alternative to the default "grouped result view").

I presume that the "AnalysisType" field contains "RNA-seq", "In situ...", "EST", etc., but I am not 100% sure. I will start with that and check with Frederic once we are all back.

JSON used for mocking the application:

{
  "AnalysisType": "",
  "OrganId": "",
  "OrganName": "",
  "Annotated": null,
  "Significant": null,
  "Expected": null,
  "foldEnrichment": null,
  "p": null,
  "fdr": null,
  "DevelopmentStage": "",
  "DevelopmentStageName": "",
  "ExpressionType": ""
},

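To produce mock objects in this shape from typed values, a minimal sketch could look like the following. The class and method names are hypothetical. Numeric fields are written as JSON numbers (integers for `Annotated` and `Significant`, decimals otherwise, a typing inferred from the prototype output), `null` becomes JSON `null`, and string escaping only handles quotes and backslashes, which is enough for mock data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for mock result objects matching the template above (sketch). */
class MockResultJson {

    /** Minimal JSON string quoting: quotes and backslashes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Serializes a flat map; numbers stay unquoted, null becomes JSON null. */
    static String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            Object v = e.getValue();
            String value = v == null ? "null"
                    : v instanceof Number ? v.toString() // NaN/Infinity would not be valid JSON
                    : quote(v.toString());
            sb.append(quote(e.getKey())).append(": ").append(value);
        }
        return sb.append('}').toString();
    }

    /** One mock row with the template's keys, in the template's order. */
    static Map<String, Object> row(String analysisType, String organId, String organName,
                                   int annotated, int significant, double expected,
                                   double foldEnrichment, double p, double fdr,
                                   String devStage, String devStageName, String expressionType) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("AnalysisType", analysisType);
        m.put("OrganId", organId);
        m.put("OrganName", organName);
        m.put("Annotated", annotated);
        m.put("Significant", significant);
        m.put("Expected", expected);
        m.put("foldEnrichment", foldEnrichment);
        m.put("p", p);
        m.put("fdr", fdr);
        m.put("DevelopmentStage", devStage);
        m.put("DevelopmentStageName", devStageName);
        m.put("ExpressionType", expressionType);
        return m;
    }
}
```

Feeding it the rows of the prototype output (for example `XAO:0000305`, cranial placode) gives mock data with realistic values, leaving the stage and expression-type fields empty until they are available.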
fbastian commented 4 years ago

In GitLab by @sduvaud on Jul 24, 2015, 14:24

The first version of the web interface was pushed to GitLab: https://gitlab.isb-sib.ch/ST/topanat-web

Features:

Missing:

fbastian commented 4 years ago

In GitLab by @fbastian on Aug 17, 2015, 09:51

The link to the first version is broken :(

I guess that by AnalysisType, you're referring to the display of results "per analysis". An "analysis" corresponds to an expression type ('presence', 'diff expression') for a specific developmental stage. So I think you don't need a specific column for this.

Also, for consistency with the current Bgee application, you should use the term 'anatEntity' rather than 'organ' (anatEntityId, anatEntityName). Same for 'DevelopmentStage': replace it with 'devStage' (devStageId, devStageName). We'll decide later which terms to use in column headers.
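The renaming can be applied to the mock rows with a small key map. This is a sketch: the class is hypothetical, `DevelopmentStage` is assumed to hold the stage ID, and `AnalysisType` is dropped since an analysis is defined by expression type plus developmental stage. Keys not listed are kept unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical renaming of mock-JSON keys to the Bgee terminology (sketch). */
class BgeeKeyRenamer {
    private static final Map<String, String> RENAMED = new LinkedHashMap<>();
    static {
        RENAMED.put("OrganId", "anatEntityId");
        RENAMED.put("OrganName", "anatEntityName");
        RENAMED.put("DevelopmentStage", "devStageId");   // ASSUMPTION: holds the stage ID
        RENAMED.put("DevelopmentStageName", "devStageName");
    }

    /** Returns a copy with renamed keys, in the original order, without AnalysisType. */
    static Map<String, Object> rename(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getKey().equals("AnalysisType")) {
                continue; // no dedicated column needed
            }
            out.put(RENAMED.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return out;
    }
}
```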

fbastian commented 4 years ago

In GitLab by @fbastian on Aug 31, 2015, 14:19

I generated an output TSV file with many more terms (2,300) to test the performance of generating/sorting/searching results client-side (e.g., with data-table): http://devbgee.unil.ch/bgee/TopOBOFiles/results/topOBOResult_8bc083c9ad40e2aa2e02e681801870906d1b41ec.tsv

Note that we should never have that many results (we could easily limit them to, e.g., 500 terms per analysis, or fewer).
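A sketch of such a cap follows. It assumes an analysis is identified by expression type and developmental stage (as described earlier in this thread) and that terms are ranked by ascending p-value. The class, the `Row` fields and the limit parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-analysis cap on the number of returned terms (sketch). */
class ResultLimiter {
    static final class Row {
        final String expressionType;
        final String devStageId;
        final String anatEntityId;
        final double pValue;

        Row(String expressionType, String devStageId, String anatEntityId, double pValue) {
            this.expressionType = expressionType;
            this.devStageId = devStageId;
            this.anatEntityId = anatEntityId;
            this.pValue = pValue;
        }
    }

    /** Keeps at most maxPerAnalysis rows per analysis, lowest p-values first. */
    static List<Row> topPerAnalysis(List<Row> rows, int maxPerAnalysis) {
        Map<String, List<Row>> byAnalysis = new LinkedHashMap<>();
        for (Row r : rows) {
            // Tab separator: assumed not to occur inside either identifier.
            String key = r.expressionType + "\t" + r.devStageId;
            byAnalysis.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
        }
        List<Row> kept = new ArrayList<>();
        for (List<Row> group : byAnalysis.values()) {
            group.sort(Comparator.comparingDouble((Row row) -> row.pValue)); // stable for ties
            kept.addAll(group.subList(0, Math.min(maxPerAnalysis, group.size())));
        }
        return kept;
    }
}
```

Applying the cap server-side keeps the payload small, so client-side sorting and searching only ever deal with a bounded number of terms.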

You can manipulate the results at the URL: http://devbgee.unil.ch/bgee/bgee?page=top_anat&data=9c067119742baa038c83c648b588c19f6450ed1e

(These data will certainly be removed at some point in the future, as they were generated with nonsensical parameters to get lots of terms.)