Trait Distribution Plot

laceysanderson commented 5 years ago

We should use true violin plots for quantitative (numeric, continuous) traits (e.g. plant height, days to flowering?).

D3.js implementation: http://bl.ocks.org/asielen/92929960988a8935d907e39e60ea8417

laceysanderson commented 5 years ago

The chart should also give the user the following options:

Group site years by location or by year.
Filter to a specific location or year.

laceysanderson commented 5 years ago

Handle different units properly. Specifically, ensure that units are never combined and that the user can select which one they want to see.
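
A minimal sketch of the "never combine units" rule, assuming measurements arrive as (unit, value) pairs. The function names and the sample records are hypothetical and are not taken from the module.

```python
from collections import defaultdict


def split_by_unit(measurements):
    """Group (unit, value) pairs by unit so different units are never pooled."""
    by_unit = defaultdict(list)
    for unit, value in measurements:
        by_unit[unit].append(value)
    return dict(by_unit)


def values_for_unit(measurements, selected_unit):
    """Return only the values recorded in the unit the user selected."""
    return split_by_unit(measurements).get(selected_unit, [])


# Hypothetical plant height records in two different units.
records = [("cm", 42.0), ("in", 16.5), ("cm", 45.5)]
```

Keeping one list per unit means the chart only ever draws a distribution from values that share a unit, and the unit selector becomes a simple lookup.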

laceysanderson commented 5 years ago

Units & methods are now handled properly

laceysanderson commented 5 years ago

Violin plots are complete! Now to move on to qualitative trait plots.

Qualitative trait plots: The Plan

Phase I

Use a grouped bar chart where the x-axis is the categories, the series are the site years and the y-axis is the number of germplasm showing that phenotype. Since there is already a well-tested d3.js chart for this, it will be fast to implement and I think it will be intuitive to users.
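
The data behind the Phase I chart could be shaped roughly as follows. This is only a sketch: the (site_year, germplasm_id, category) input tuples, the function name and the output layout are assumptions made for illustration, not the module's actual format.

```python
from collections import defaultdict


def grouped_bar_series(observations):
    """Count distinct germplasm per category, with one series per site year."""
    seen = defaultdict(set)  # (site_year, category) -> set of germplasm ids
    for site_year, germplasm_id, category in observations:
        seen[(site_year, category)].add(germplasm_id)

    categories = sorted({category for _, category in seen})
    site_years = sorted({site_year for site_year, _ in seen})
    return {
        "categories": categories,  # x-axis
        "series": {                # one group of bars per site year
            site_year: [len(seen.get((site_year, c), ())) for c in categories]
            for site_year in site_years
        },
    }


# Hypothetical observations; the repeated G1 row stands in for a replicate.
observations = [
    ("Site A 2017", "G1", "Red"),
    ("Site A 2017", "G1", "Red"),
    ("Site A 2017", "G2", "Yellow"),
    ("Site A 2017", "G3", "Red"),
    ("Site B 2017", "G1", "Red"),
]
chart = grouped_bar_series(observations)
```

Using a set per (site year, category) pair means a germplasm measured in several replicates is counted once, which matches a y-axis defined as the number of germplasm.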

Phase II

I would love to make vertical bar charts! These would mimic the layout of the violin plots, with site years on the x-axis, categories on the y-axis and the number of germplasm shown as the length of the bar. This would make the charts less disorienting when switching between traits. It would also be less cluttered and, I feel, make it easier to compare categories. However, I have yet to find such a chart, so it would be a labour of love.

laceysanderson commented 5 years ago

How do we tell which traits are qualitative and which are quantitative?

1. Administrators configure which units are qualitative, with scales automatically being so.
2. Determine by checking the value (if text then qualitative) and the unit (if scale then qualitative).

My concern about option 1 is that it's a lot of configuration for the admin since there will likely be many units. However, option 2 gives them less control.

Plan: option 2
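
A rough sketch of the check described above, assuming a unit counts as a scale when its name contains the word "scale" and that values arrive as strings. The function names are hypothetical, and the case-insensitive match is an assumption of this sketch.

```python
def is_number(value):
    """Return True if the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def is_qualitative(unit_name, values):
    """Qualitative if the unit is a scale or any value is text."""
    if "scale" in unit_name.lower():
        return True
    return any(not is_number(value) for value in values)
```

For example, `is_qualitative("cm", ["12.5", "13"])` is false, while a colour trait with values such as "R" and "Y" is treated as qualitative, as is any numeric trait measured on a scale.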

laceysanderson commented 5 years ago

The current materialized view averages replicates, which throws an error with qualitative data.

Ideal solution: add a qualitative property to the unit.
Current solution: check the unit name for "scale".

SELECT
  o.genus         AS organism_genus,
  trait.cvterm_id AS trait_id,
  trait.name      AS trait_name,
  proj.project_id AS project_id,
  proj.name       AS project_name,
  method.cvterm_id AS method_id,
  method.name      AS method_name,
  unit.cvterm_id   AS unit_id,
  unit.name        AS unit_name,
  loc.value       AS location,
  yr.value        AS year,
  s.stock_id      AS germplasm_id,
  s.name          AS germplasm_name,
  CASE
   WHEN unit.name~'scale' THEN array_to_string(array_agg(DISTINCT p.value),'/')
   ELSE CAST(avg(p.value::float) as text)
   END AS mean
FROM chado.phenotype p
  LEFT JOIN chado.cvterm trait ON trait.cvterm_id=p.attr_id
  LEFT JOIN chado.project proj USING(project_id)
  LEFT JOIN chado.cvterm method ON method.cvterm_id=p.assay_id
  LEFT JOIN chado.cvterm unit ON unit.cvterm_id=p.unit_id
  LEFT JOIN chado.stock s USING(stock_id)
  LEFT JOIN chado.organism o ON o.organism_id=s.organism_id
  LEFT JOIN chado.phenotypeprop loc ON loc.phenotype_id=p.phenotype_id AND loc.type_id = 2940
  LEFT JOIN chado.phenotypeprop yr ON yr.phenotype_id=p.phenotype_id AND yr.type_id = 141
GROUP BY
  trait.cvterm_id,
  trait.name,
  proj.project_id,
  proj.name,
  method.cvterm_id,
  method.name,
  unit.cvterm_id,
  unit.name,
  loc.value,
  yr.value,
  s.stock_id,
  s.name,
  o.genus;

laceysanderson commented 5 years ago

Switched to storing the values as JSONB in the materialized view and then calculating the mean in the JSON callback. This provides much more flexibility in how to calculate the value for qualitative traits.

Current ToDo list:

[x] Alter materialized view query to store values as JSONB
[x] Alter JSON callback to find the "mean"
- [x] average for quantitative data
- [x] find unique values and combine (e.g. R/Y)
- [x] qualitative vs quantitative determined via unit property
[x] violin plots continue to work
[x] Check property to determine chart type to use on plot page
[x] When saving traits determine whether a trait is qualitative/quantitative
[x] Develop the qualitative bar chart.
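
The "mean" step in the list above might look something like this. It is a sketch under assumptions: the function name is hypothetical, the replicate values are taken to be already decoded from the stored JSONB array, and the qualitative flag is taken to come from the unit property.

```python
from statistics import mean


def summarize_replicates(values, qualitative):
    """Collapse replicate values for one germplasm into a single summary."""
    if qualitative:
        # Unique values joined with "/", e.g. R and Y become "R/Y".
        return "/".join(sorted({str(value) for value in values}))
    # Quantitative data: plain average of the replicates.
    return mean(float(value) for value in values)
```

With this shape, `summarize_replicates(["R", "Y", "R"], True)` gives "R/Y" and `summarize_replicates(["10", "12"], False)` gives 11.0. Doing this outside the materialized view is what leaves room to change how qualitative replicates are combined later.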

laceysanderson commented 5 years ago

[x] Scales should be sorted on the bar chart.
[x] Duplicated traits in Describe stage of upload (e.g. Lodging listed 2X but only in file once).
[x] Chart figure legend is shorter than the chart.
[x] Figure legend and module link refer to violin plot.

NOTE: Requires starting fresh with data.

laceysanderson commented 5 years ago

Moved qualitative chart into new issue.

UofS-Pulse-Binfo / analyzedphenotypes