Add ability to classify measurements

GoogleCodeExporter commented 9 years ago

Following up on the intriguing visualization of bio-optical data shown at the 
2014 Ocean Sciences Meeting 
(https://code.google.com/p/stoqs/wiki/Update_27_February_2014), there is a need 
for the STOQS data model to support recording classifications of specific 
measurements. A classifier would, for instance, mine a dataset and identify 
measurements as associated with specific types of plankton. Once measurements 
have been classified, a database could be queried by those classifications. The 
process can be iterative: the classification scheme is improved by comparing its 
results against actual laboratory and genetic analyses of Sample data.

There is currently no place to store classifications of measurements in the 
STOQS data model. Additional tables need to be added to make this possible. 
There are two possible approaches:

1. Follow the pattern of the Sample table where specific columns store the 
classification information
2. Follow the pattern of the Resource table where name, value, and uristring 
columns store the information in a more abstract, but potentially more flexible 
manner
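To make the trade-off concrete, here is a minimal sketch of how one classification might be recorded under each approach. It uses plain Python dataclasses rather than Django models, and the names `MeasurementClassification`, `MeasurementResource`, `classname`, `confidence`, and `annotator` are hypothetical; only the `name`, `value`, and `uristring` columns come from the Resource pattern described above.

```python
from dataclasses import dataclass
from typing import Optional


# Approach 1 (Sample-like): each piece of classification information
# gets its own dedicated, typed column.
@dataclass
class MeasurementClassification:  # hypothetical table
    measurement_id: int
    classname: str  # e.g. 'diatom'
    confidence: Optional[float] = None  # hypothetical column
    annotator: Optional[str] = None  # hypothetical column


# Approach 2 (Resource-like): generic name/value/uristring rows whose
# meaning lives in the data; an association row links them to measurements.
@dataclass
class Resource:
    name: str
    value: str
    uristring: str = ""


@dataclass
class MeasurementResource:  # hypothetical association row
    measurement_id: int
    resource: Resource


# The same fact recorded both ways.
fixed = MeasurementClassification(42, "diatom", confidence=0.8, annotator="classifier-v1")
flexible = [
    MeasurementResource(42, Resource("label", "diatom")),
    MeasurementResource(42, Resource("confidence", "0.8")),
    MeasurementResource(42, Resource("annotator", "classifier-v1")),
]
```

Approach 1 gives typed columns and simple queries but needs a schema change for every new attribute; approach 2 can absorb new kinds of classification without schema changes, at the cost of storing values as strings and needing joins to query them.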

Original issue reported on code.google.com by MBARIm...@gmail.com on 5 May 2014 at 11:33

GoogleCodeExporter commented 9 years ago

Contributed by msdanellecline:

As input to the discussion, at a high level I think the classification data we 
might include in STOQS could comprise:

Class 
- Tags
- Labels (binary, multivalued, or continuous)
- Confidence (% confidence, definitely/maybe/guessing)

Annotator
- Properties (user name, role). Here the annotator could be e.g. a person or an 
automated classification process
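One way to picture these attributes as a single record is sketched below. It is an illustration only, built from the categories listed above; the names `Annotator`, `Classification`, and `QUALITATIVE_LEVELS`, and the example roles, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Qualitative confidence levels from the list above.
QUALITATIVE_LEVELS = ("definitely", "maybe", "guessing")


@dataclass
class Annotator:
    name: str  # a person's user name, or the name of an automated process
    role: str  # e.g. 'scientist' or 'classifier' (hypothetical roles)


@dataclass
class Classification:
    label: Union[bool, str, float]  # binary, multivalued, or continuous
    annotator: Annotator
    confidence: Union[float, str]  # percent (0-100) or a qualitative level
    tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject confidence values that fit neither accepted form.
        if isinstance(self.confidence, str):
            if self.confidence not in QUALITATIVE_LEVELS:
                raise ValueError(f"unknown confidence level: {self.confidence!r}")
        elif not 0.0 <= self.confidence <= 100.0:
            raise ValueError("percent confidence must be between 0 and 100")


person = Annotator("jdoe", "scientist")
machine = Annotator("salinity-heuristic", "classifier")
c1 = Classification("diatom", person, "maybe", tags=["bloom"])
c2 = Classification(0.73, machine, 85.0)
```

Treating a person and an automated process as the same `Annotator` type keeps human and machine labels queryable side by side, which is what the iterative comparison against Sample data needs.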

Original comment by MBARIm...@gmail.com on 13 May 2014 at 4:48

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Intensive updates and access to tagged measurements will be via programmatic 
interfaces, but some access should be supported through the STOQS User 
Interface. A logical place to add a new section to the UI is on the right side 
where items can be selected for addition to the faceted search filter set. 
Currently there are sections for Temporal, Sampled Parameter, Measured Parameter, 
Parameter Values, and Platforms. These sections are direct reflections of 
portions of the STOQS data model. 

What should be the name of the new section from which a user can make 
selections for certain classifications? The name can be general and the section 
organized with tabs or subsections. It should encompass any type of Resource 
from which a user may want to select data (including comments on Sample data, 
for instance).

Original comment by MBARIm...@gmail.com on 13 May 2014 at 8:24

GoogleCodeExporter commented 9 years ago

While considering schema changes, we should also add a "PlatformResource" 
association table. This is a place where the uristring of X3D models of 
platforms could be stored.
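An association table is essentially a pair of foreign keys. The sketch below uses Python's built-in `sqlite3` with simplified, hypothetical table and column names (not the actual STOQS Django schema), a hypothetical `X3D_MODEL` resource name, and a placeholder URL to show how a platform could be linked to an X3D model resource.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE platform (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE resource (
        id INTEGER PRIMARY KEY, name TEXT, value TEXT, uristring TEXT
    );
    -- Association table: one row per (platform, resource) link.
    CREATE TABLE platformresource (
        id INTEGER PRIMARY KEY,
        platform_id INTEGER NOT NULL REFERENCES platform(id),
        resource_id INTEGER NOT NULL REFERENCES resource(id),
        UNIQUE (platform_id, resource_id)
    );
    """
)
conn.execute("INSERT INTO platform (id, name) VALUES (1, 'dorado')")
conn.execute(
    "INSERT INTO resource (id, name, value, uristring) VALUES (?, ?, ?, ?)",
    (1, "X3D_MODEL", "", "http://example.org/models/dorado.x3d"),  # placeholder URL
)
conn.execute("INSERT INTO platformresource (platform_id, resource_id) VALUES (1, 1)")


def x3d_models(conn, platform_name):
    """Return the X3D model URIs linked to the named platform."""
    rows = conn.execute(
        """
        SELECT r.uristring FROM resource r
        JOIN platformresource pr ON pr.resource_id = r.id
        JOIN platform p ON p.id = pr.platform_id
        WHERE p.name = ? AND r.name = 'X3D_MODEL'
        """,
        (platform_name,),
    )
    return [uri for (uri,) in rows]


print(x3d_models(conn, "dorado"))
```

The `UNIQUE (platform_id, resource_id)` constraint keeps a platform from being linked to the same resource twice, while still allowing several resources (for example, more than one model) per platform.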

Original comment by MBARIm...@gmail.com on 30 May 2014 at 6:36

GoogleCodeExporter commented 9 years ago

Created branch schemachangeResources for stoqs/models.py changes that will 
affect production servers. Production servers should stay on the default branch 
until all schema changes have been tested and there is a window of time to 
reload production databases.

Original comment by MBARIm...@gmail.com on 31 May 2014 at 6:29

Changed state: Started

GoogleCodeExporter commented 9 years ago

In preparation for reloading production databases with the new schema, other 
software updates are being made in the schemachangeResources branch, namely:

- Adding Sigma-T and Spiciness for all measurements that have temperature and 
salinity
- Adding altitude (depth_above_sea_floor) to all trajectory measurements

Together with the changes being made to visualize BEDs data, there will be a lot 
to merge from this branch!
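Sigma-T is the density of seawater at atmospheric pressure minus 1000 kg/m³. As a standalone sketch (STOQS may well compute it with a library such as `seawater` or `gsw` instead), the function below evaluates the UNESCO EOS-80 one-atmosphere equation of state. It assumes practical salinity and temperature in °C on the IPTS-68 scale (ITS-90 temperatures are converted with t68 = 1.00024 · t90), and it does not cover spiciness, whose polynomial is considerably longer. The dict-based `add_sigma_t` helper is hypothetical.

```python
def sigma_t(salinity: float, temperature: float) -> float:
    """Sigma-T (kg/m^3) from the UNESCO EOS-80 one-atmosphere equation of state.

    salinity: practical salinity; temperature: deg C (IPTS-68).
    """
    s, t = salinity, temperature
    # Density of pure water (Standard Mean Ocean Water) at one atmosphere.
    rho_w = (
        999.842594
        + 6.793952e-2 * t
        - 9.095290e-3 * t**2
        + 1.001685e-4 * t**3
        - 1.120083e-6 * t**4
        + 6.536332e-9 * t**5
    )
    # Salinity-dependent correction terms.
    a = 8.24493e-1 - 4.0899e-3 * t + 7.6438e-5 * t**2 - 8.2467e-7 * t**3 + 5.3875e-9 * t**4
    b = -5.72466e-3 + 1.0227e-4 * t - 1.6546e-6 * t**2
    c = 4.8314e-4
    rho = rho_w + a * s + b * s**1.5 + c * s**2
    return rho - 1000.0


def add_sigma_t(measurements):
    """Attach sigma_t to every record that has both temperature and salinity."""
    for m in measurements:
        if m.get("temperature") is not None and m.get("salinity") is not None:
            m["sigma_t"] = sigma_t(m["salinity"], m["temperature"])
    return measurements
```

Records missing either input are left untouched, matching the rule that only measurements with both temperature and salinity get the derived values.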

Original comment by MBARIm...@gmail.com on 4 Jun 2014 at 11:52

GoogleCodeExporter commented 9 years ago

Tested loading the new Sigma-T, Spiciness, and Altitude data with 
loaders/MolecularEcology/loadSIMZ_spring2014.py. Now that altitude can be 
computed from a bathymetry file, it would be nice to add a 
SimpleBottomDepthTime table and a bottomdepth field to Measurement. This would 
allow plotting of the bottom depth profile in the Temporal Depth plot.
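Altitude is the local bottom depth minus the vehicle depth. A minimal sketch, assuming the bathymetry is available as a regular latitude/longitude grid of positive-down depths (the grid layout and the function names `bottom_depth` and `altitude` are hypothetical, not the STOQS loader's actual code), uses bilinear interpolation to look up the bottom depth under each measurement:

```python
from bisect import bisect_right
from typing import Sequence


def bottom_depth(lats: Sequence[float], lons: Sequence[float],
                 depths: Sequence[Sequence[float]], lat: float, lon: float) -> float:
    """Bilinearly interpolate a bathymetry grid.

    lats and lons must be strictly increasing; depths[i][j] is the
    positive-down water depth at (lats[i], lons[j]).
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("position is outside the bathymetry grid")
    # Index of the lower-left grid corner, clamped so i + 1 and j + 1 exist.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    fy = (lat - lats[i]) / (lats[i + 1] - lats[i])
    fx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    z00, z01 = depths[i][j], depths[i][j + 1]
    z10, z11 = depths[i + 1][j], depths[i + 1][j + 1]
    return (z00 * (1 - fx) * (1 - fy) + z01 * fx * (1 - fy)
            + z10 * (1 - fx) * fy + z11 * fx * fy)


def altitude(lats, lons, depths, lat, lon, depth):
    """Height of a measurement above the sea floor (depth_above_sea_floor)."""
    return bottom_depth(lats, lons, depths, lat, lon) - depth


lats, lons = [36.0, 37.0], [-122.0, -121.0]
grid = [[100.0, 200.0], [300.0, 400.0]]  # toy depths in meters, not real data
print(altitude(lats, lons, grid, 36.5, -121.5, 50.0))  # 200.0
```

The interpolated `bottom_depth` value is also what a per-measurement bottomdepth field, as suggested above, would hold.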

Original comment by MBARIm...@gmail.com on 6 Jun 2014 at 9:35

GoogleCodeExporter commented 9 years ago

Some ideas of how Classification would be exposed in the User Interface are 
shown in this whiteboard photo.

A new "Attributes" or "Phenomena" section could contain tabs corresponding to 
different Resource association tables. For example, the Measurement (or 
MeasuredParameter) tab could contain selections for labeled Measurements 
(sediment, dinoflagellates, diatoms,...); sections could be delimited by 
ResourceType in the same way that Platforms are grouped by PlatformType. 
Selections would then be added to the filter for what gets displayed in the UI.
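Grouping labels by ResourceType for the tabs, and turning a user's selections into a filter, might look like the sketch below. It works on in-memory tuples; the `Labeled` type and the class names come from this thread, while the record layout (`LabeledMeasuredParameter`), the helper names, and the ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class LabeledMeasuredParameter(NamedTuple):  # hypothetical record layout
    measuredparameter_id: int
    resource_type: str  # e.g. 'Labeled'
    label: str  # e.g. 'diatom'


def attribute_tabs(rows: Iterable[LabeledMeasuredParameter]) -> Dict[str, List[str]]:
    """Group label names under their ResourceType, as Platforms are grouped by PlatformType."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        groups[row.resource_type].add(row.label)
    return {rtype: sorted(labels) for rtype, labels in sorted(groups.items())}


def apply_selection(rows: Iterable[LabeledMeasuredParameter], selected: Set[str]) -> Set[int]:
    """Ids of MeasuredParameters carrying any selected label (OR within the section)."""
    return {row.measuredparameter_id for row in rows if row.label in selected}


rows = [
    LabeledMeasuredParameter(1, "Labeled", "diatom"),
    LabeledMeasuredParameter(2, "Labeled", "dino1"),
    LabeledMeasuredParameter(3, "Labeled", "sediment"),
    LabeledMeasuredParameter(4, "Labeled", "diatom"),
]
print(attribute_tabs(rows))  # {'Labeled': ['diatom', 'dino1', 'sediment']}
print(apply_selection(rows, {"diatom", "sediment"}))
```

Combining selections within a section with OR (as here) rather than AND is a design choice; OR suits selecting several classes to overlay them on one plot.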

Original comment by MBARIm...@gmail.com on 6 Jun 2014 at 9:45

Attachments:

IMG_0416.JPG

GoogleCodeExporter commented 9 years ago

See corresponding Issue 52 
(https://code.google.com/p/stoqs/issues/detail?id=52) for additional 
information on the specific code changes needed to implement the classification 
capability.

Original comment by MBARIm...@gmail.com on 9 Jun 2014 at 11:06

GoogleCodeExporter commented 9 years ago

The deployed code on kraken has been updated with the schemachangeResources 
branch changesets.

So far, the legacy databases appear to be working fine. Please report any 
errors you see at:

http://kraken.shore.mbari.org/canon/

Original comment by MBARIm...@gmail.com on 12 Jun 2014 at 7:40

GoogleCodeExporter commented 9 years ago

Closing Issue 52 and consolidating comments here. Need to verify that the 
schema changes will help us perform the steps needed to classify measurements. 
The example case is the chl-bb-salinity plot from Issue 52 where we can use 
salinity as a heuristic for plankton and sediment classification:

http://stoqs.googlecode.com/hg/doc/Screenshot_2014-05-01_14.49.15.png

A program will be written to create training and test sets for each range of 
salinity values that we will label thusly:

purple (33.33, 33.65): diatom
blue (33.65, 33.70): dino1
green (33.70, 33.75): dino2
red (33.75, 33.93+): sediment
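Assigning a label then reduces to finding which salinity interval a value falls in. This sketch treats each range as half-open (min inclusive, max exclusive) so that shared boundaries such as 33.65 go to exactly one class; that convention, returning `None` outside all ranges, and the names `SALINITY_CLASSES` and `label_for` are assumptions rather than part of the plan.

```python
from typing import List, Optional, Tuple

# (label, min, max) salinity ranges, as passed to classify.py later in this thread.
SALINITY_CLASSES: List[Tuple[str, float, float]] = [
    ("diatom", 33.33, 33.65),
    ("dino1", 33.65, 33.70),
    ("dino2", 33.70, 33.75),
    ("sediment", 33.75, 33.93),
]


def label_for(salinity: float,
              classes: List[Tuple[str, float, float]] = SALINITY_CLASSES) -> Optional[str]:
    """Return the class whose half-open range [min, max) contains salinity."""
    for label, lo, hi in classes:
        if lo <= salinity < hi:
            return label
    return None  # outside every range: leave unlabeled


for s in (33.40, 33.65, 33.72, 33.80, 34.10):
    print(s, label_for(s))
```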

The program will add an entry in the Resource table (one with ResourceType 
'train' and one with 'test') for each of these classes and then associate them 
with the set of chl and bb MeasuredParameters by entering records in the new 
MeasuredParameterResource association table.
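The bookkeeping described here (one Resource per class and type, plus one association row per labeled MeasuredParameter) can be sketched with `sqlite3`. The tables are simplified stand-ins for the Django models in stoqs/models.py; the column names, the Resource name `label`, and the helper functions are assumptions. The final query shows how the database could then be queried by classification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE resourcetype (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE resource (
        id INTEGER PRIMARY KEY, name TEXT, value TEXT,
        resourcetype_id INTEGER REFERENCES resourcetype(id)
    );
    CREATE TABLE measuredparameter (id INTEGER PRIMARY KEY, parameter TEXT);
    CREATE TABLE measuredparameterresource (
        measuredparameter_id INTEGER REFERENCES measuredparameter(id),
        resource_id INTEGER REFERENCES resource(id),
        PRIMARY KEY (measuredparameter_id, resource_id)
    );
    """
)


def get_or_create_resource(conn, rtype, name, value):
    """Return the id of the Resource (rtype, name, value), creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO resourcetype (name) VALUES (?)", (rtype,))
    (rt_id,) = conn.execute("SELECT id FROM resourcetype WHERE name = ?", (rtype,)).fetchone()
    row = conn.execute(
        "SELECT id FROM resource WHERE resourcetype_id = ? AND name = ? AND value = ?",
        (rt_id, name, value),
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO resource (name, value, resourcetype_id) VALUES (?, ?, ?)",
        (name, value, rt_id),
    ).lastrowid


def label_measuredparameters(conn, mp_ids, rtype, label):
    """Associate each MeasuredParameter id with the Resource for this label."""
    res_id = get_or_create_resource(conn, rtype, "label", label)
    conn.executemany(
        "INSERT OR IGNORE INTO measuredparameterresource VALUES (?, ?)",
        [(mp_id, res_id) for mp_id in mp_ids],
    )


def ids_with_label(conn, label):
    """Query MeasuredParameter ids by classification."""
    rows = conn.execute(
        """
        SELECT mpr.measuredparameter_id FROM measuredparameterresource mpr
        JOIN resource r ON r.id = mpr.resource_id
        WHERE r.name = 'label' AND r.value = ?
        ORDER BY mpr.measuredparameter_id
        """,
        (label,),
    )
    return [mp_id for (mp_id,) in rows]


conn.executemany(
    "INSERT INTO measuredparameter (id, parameter) VALUES (?, ?)",
    [(1, "bbp700"), (2, "fl700_uncorr"), (3, "bbp700")],
)
label_measuredparameters(conn, [1, 2], "train", "diatom")
label_measuredparameters(conn, [3], "test", "sediment")
print(ids_with_label(conn, "diatom"))  # [1, 2]
```

Reusing an existing Resource and ignoring duplicate association rows means the labeling step can be rerun without creating duplicates.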

The STOQS UI will be modified according to the plan laid out in comment 8, 
and we can test by selecting one of the classes and confirming that the UI 
updates appropriately.

Original comment by MBARIm...@gmail.com on 17 Jun 2014 at 3:11

GoogleCodeExporter commented 9 years ago

Instead of labeling MeasuredParameters with names of 'train' or 'test' we will 
label them with the name 'Labeled'. This way a classification train and test 
algorithm can separate those data as needed to test various machine learning 
models. A great tutorial for understanding the technique is at 
https://www.youtube.com/watch?v=4ONBVNm3isI.
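Because everything is stored under the single type 'Labeled', the split into training and test sets happens in the analysis code. A minimal sketch using only the standard library (in practice a function such as scikit-learn's `train_test_split` would normally do this) splits each class separately so both sets keep the same class proportions. The function name `stratified_split`, the 25% test fraction, and the seed are arbitrary choices for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices into train/test, preserving per-class proportions."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_class, key=str):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


labels = ["diatom"] * 8 + ["sediment"] * 4
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 9 3
```

Stratifying matters here because the labeled classes are unbalanced (dino1 has roughly three times as many values as sediment in the run below), and a purely random split could under-represent the smaller classes in the test set.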

Original comment by MBARIm...@gmail.com on 30 Jun 2014 at 7:03

GoogleCodeExporter commented 9 years ago

The ability to label measurements with the new schema is demonstrated with this 
example:

1. Execute the new contrib/analysis/classify.py script to add some labeled data 
using the criteria from Comment #11:

(venv-stoqs)[mccann@localhost analysis]$ ./classify.py --doLabel \
    --database stoqs_september2013_t --platform dorado \
    --start 20130916T124035 --end 20130919T233905 \
    --inputs bbp700 fl700_uncorr --discriminator salinity \
    --labels diatom dino1 dino2 sediment \
    --mins 33.33 33.65 33.70 33.75 --maxes 33.65 33.70 33.75 33.93 \
    --clobber -v
Making label 'diatom' with discriminator {'salinity': ('33.33', '33.65')}
  (3824, 3824) MeasuredParameters returned from database stoqs_september2013_t
  Saving 3824 values of 'diatom' with type 'Labeled'
Making label 'dino1' with discriminator {'salinity': ('33.65', '33.70')}
  (9709, 9709) MeasuredParameters returned from database stoqs_september2013_t
  Saving 9709 values of 'dino1' with type 'Labeled'
Making label 'dino2' with discriminator {'salinity': ('33.70', '33.75')}
  (6588, 6588) MeasuredParameters returned from database stoqs_september2013_t
  Saving 6588 values of 'dino2' with type 'Labeled'
Making label 'sediment' with discriminator {'salinity': ('33.75', '33.93')}
  (3380, 3380) MeasuredParameters returned from database stoqs_september2013_t
  Saving 3380 values of 'sediment' with type 'Labeled'

2. View the User Interface and select 'diatom' and 'sediment' in the new 
Attributes section to see those filters applied to the ParameterParameter plot, 
as shown in the attached image.
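The `--labels`, `--mins`, and `--maxes` arguments are parallel lists that pair up into one discriminator per label, which is what the "Making label" log lines print. The sketch below reproduces that pairing with `argparse`; it is a hypothetical reconstruction for illustration, not the actual source of classify.py, and it defines only the options that appear in the example command.

```python
import argparse
from typing import Dict, List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Label MeasuredParameters (sketch)")
    parser.add_argument("--doLabel", action="store_true")
    parser.add_argument("--database")
    parser.add_argument("--platform")
    parser.add_argument("--start")
    parser.add_argument("--end")
    parser.add_argument("--inputs", nargs="+")
    parser.add_argument("--discriminator")
    parser.add_argument("--labels", nargs="+")
    parser.add_argument("--mins", nargs="+")
    parser.add_argument("--maxes", nargs="+")
    parser.add_argument("--clobber", action="store_true")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser.parse_args(argv)


def discriminators(args: argparse.Namespace) -> List[Tuple[str, Dict[str, Tuple[str, str]]]]:
    """Pair each label with its (min, max) range on the discriminator parameter."""
    if not (len(args.labels) == len(args.mins) == len(args.maxes)):
        raise SystemExit("--labels, --mins, and --maxes must have the same length")
    return [
        (label, {args.discriminator: (lo, hi)})
        for label, lo, hi in zip(args.labels, args.mins, args.maxes)
    ]


args = parse_args(
    "--doLabel --database stoqs_september2013_t --platform dorado "
    "--inputs bbp700 fl700_uncorr --discriminator salinity "
    "--labels diatom dino1 dino2 sediment --mins 33.33 33.65 33.70 33.75 "
    "--maxes 33.65 33.70 33.75 33.93 --clobber -v".split()
)
for label, disc in discriminators(args):
    print(f"Making label '{label}' with discriminator {disc}")
```

Keeping the bounds as strings mirrors the quoted values in the log output; they would be converted to numbers when building the database query.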

Original comment by MBARIm...@gmail.com on 30 Jun 2014 at 11:17

Attachments:

[Screenshot 2014-06-30 16.10.17.png](<https://storage.googleapis.com/google-code-attachments/stoqs/issue-49/comment-13/Screenshot 2014-06-30 16.10.17.png>)

GoogleCodeExporter commented 9 years ago

The Mercurial branch 'schemachangeResources' has been merged into the default 
branch and is now inactive. The next step is to reload all of the production 
databases so that we can start using the new features of these code changes.

Original comment by MBARIm...@gmail.com on 30 Jun 2014 at 11:25

GoogleCodeExporter commented 9 years ago

Held a meeting with stakeholders on Tuesday 8 July 2014 to give a live 
demonstration of the new capabilities shown in Comment #13. The production 
databases are being reloaded, and that task should be completed in a week or so. 
Marking this issue as fixed.

Original comment by MBARIm...@gmail.com on 9 Jul 2014 at 4:57

Changed state: Fixed

google-code-export / stoqs

Add ability to classify measurements #49