Closed GoogleCodeExporter closed 9 years ago
Contributed by msdanellecline:
As input to the discussion, on a high level I think classification data that we
might include in STOQS could include:
Class
- Tags
- Labels (binary, multivalued, or continuous)
- Confidence (% confidence, definitely/maybe/guessing)
Annotator
- Properties (user name, role). Here the annotator could be e.g. a person or an
automated classification process
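To make the outline above concrete, here is a minimal sketch of how these pieces might fit together. The class and field names (Annotator, Classification, is_automated, and so on) are hypothetical and are not taken from the STOQS schema; confidence is assumed to be stored as a fraction between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Annotator:
    """Who or what produced a classification (hypothetical structure)."""
    name: str
    role: str                    # e.g. "scientist" or "automated classifier"
    is_automated: bool = False   # True for an automated classification process


@dataclass
class Classification:
    """One classification attached to some data (hypothetical structure)."""
    label: Union[bool, str, float]       # binary, multivalued, or continuous
    annotator: Annotator
    tags: List[str] = field(default_factory=list)
    confidence: Optional[float] = None   # assumed fraction in [0, 1], if known


example = Classification(
    label="diatom",
    annotator=Annotator(name="classifier-v1", role="automated classifier",
                        is_automated=True),
    tags=["plankton"],
    confidence=0.8,
)
print(example.label, example.annotator.role, example.confidence)
```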
Original comment by MBARIm...@gmail.com
on 13 May 2014 at 4:48
Intensive updates and access to tagged measurements will be via programmatic
interfaces, but some access should be supported through the STOQS User
Interface. A logical place to add a new section to the UI is on the right side
where items can be selected for addition to the faceted search filter set.
Currently there is Temporal, Sampled Parameter, Measured Parameter, Parameter
Values, and Platforms. These sections are direct reflections of portions of the
STOQS data model.
What should be the name of the new section from which a user can make
selections for certain classifications? The name can be general and the section
organized with tabs or subsections. It should encompass any type of Resource
from which a user may want to select data (including comments on Sample data,
for instance).
Original comment by MBARIm...@gmail.com
on 13 May 2014 at 8:24
While considering schema changes we would also want to add a "PlatformResource"
association table. This is a place where the uristring of X3D models of
platforms could be stored.
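As a rough illustration of the association-table idea, the sketch below uses an in-memory SQLite database. The table layout, column names other than uristring, and the example URL are assumptions made for illustration only; the real definitions belong in stoqs/models.py.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE platform (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT NOT NULL, uristring TEXT);
-- Association table linking a platform to any number of resources
CREATE TABLE platformresource (
    id INTEGER PRIMARY KEY,
    platform_id INTEGER NOT NULL REFERENCES platform(id),
    resource_id INTEGER NOT NULL REFERENCES resource(id),
    UNIQUE (platform_id, resource_id)
);
""")

conn.execute("INSERT INTO platform (id, name) VALUES (1, 'dorado')")
conn.execute(
    "INSERT INTO resource (id, name, uristring) VALUES (?, ?, ?)",
    (1, "X3D model", "http://example.org/models/dorado.x3d"),  # placeholder URL
)
conn.execute("INSERT INTO platformresource (platform_id, resource_id) VALUES (1, 1)")

# Look up the model URI for each platform through the association table
rows = conn.execute("""
SELECT p.name, r.uristring
FROM platformresource pr
JOIN platform p ON p.id = pr.platform_id
JOIN resource r ON r.id = pr.resource_id
""").fetchall()
print(rows)
```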
Original comment by MBARIm...@gmail.com
on 30 May 2014 at 6:36
Created branch schemachangeResources for stoqs/models.py changes that will
affect production servers. Production servers should stay with default branch
until all schema changes have been tested and there is a window of time to
reload production databases.
Original comment by MBARIm...@gmail.com
on 31 May 2014 at 6:29
In preparation for reloading production databases with the new schema other
software updates are being made in the schemachangeResources branch, namely:
- Adding Sigma-T and Spiciness for all measurements that have temperature and
salinity
- Adding altitude (depth_above_sea_floor) to all trajectory measurements
Together with changes being made to visualize BEDs data there will be a lot to
merge from this branch!
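For reference, altitude (depth_above_sea_floor) is the bottom depth minus the measurement depth. A minimal sketch, assuming both depths are in meters and positive downward; the function name and sample values are made up for illustration and do not come from the loader code.

```python
def altitude(bottom_depth_m, measurement_depth_m):
    """Height above the sea floor; both depths are positive downward, in meters."""
    return bottom_depth_m - measurement_depth_m


# Hypothetical (bottom depth, measurement depth) pairs along a trajectory
samples = [(120.0, 15.0), (95.5, 90.0)]
altitudes = [altitude(bottom, depth) for bottom, depth in samples]
print(altitudes)
```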
Original comment by MBARIm...@gmail.com
on 4 Jun 2014 at 11:52
Tested loading with new Sigma-T, Spiciness, and Altitude data with
loaders/MolecularEcology/loadSIMZ_spring2014.py. Thinking now that a
SimpleBottomDepthTime table and a bottomdepth field added to Measurement would
be nice now that altitude can be added from a bathymetry file. This would
allow plotting of the bottom depth profile in the Temporal Depth plot.
Original comment by MBARIm...@gmail.com
on 6 Jun 2014 at 9:35
Some ideas of how Classification would be exposed in the User Interface are in
this white board photo.
A new "Attributes" or "Phenomena" section could contain tabs corresponding to
different Resource association tables. For example, the Measurement (or
MeasuredParameter) tab could contain selections for labeled Measurements
(sediment, dinoflagellates, diatoms,...); sections could be delimited by
ResourceType in the same way that Platforms are grouped by PlatformType.
Selections would then be added to the filter for what gets displayed in the UI.
Original comment by MBARIm...@gmail.com
on 6 Jun 2014 at 9:45
See corresponding Issue 52
(https://code.google.com/p/stoqs/issues/detail?id=52) for additional
information on the specific code changes needed to implement the classification
capability.
Original comment by MBARIm...@gmail.com
on 9 Jun 2014 at 11:06
The deployed code on kraken has been updated with the schemachangeResources
branch changesets.
So far, the legacy databases appear to be working fine. Please report any
errors you see for anything at:
http://kraken.shore.mbari.org/canon/
Original comment by MBARIm...@gmail.com
on 12 Jun 2014 at 7:40
Closing Issue 52 and consolidating comments here. Need to verify that the
schema changes will help us perform the steps needed to classify measurements.
The example case is the chl-bb-salinity plot from Issue 52 where we can use
salinity as a heuristic for plankton and sediment classification:
http://stoqs.googlecode.com/hg/doc/Screenshot_2014-05-01_14.49.15.png
A program will be written to create training and test sets for each range of
salinity values that we will label thusly:
purple (33.33, 33.65): diatom
blue (33.65, 33.70): dino1
green (33.70, 33.75): dino2
red (33.75, 33.93+): sediment
The program will add an entry in the Resource table (one with ResourceType
'train' and one for 'test') for each of these classes and then associate them
with the set of chl and bb MeasuredParameters by entering records in the new
MeasuredParameterResource association table.
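The labeling step could look roughly like the sketch below. It is not the planned program: the half-open [min, max) rule, a lower bound of 33.33 for the first range, treating "33.93+" as a hard upper bound of 33.93, and the sample measurement ids are all assumptions made for illustration.

```python
# Salinity ranges for each label; bounds are assumptions based on the list above.
LABEL_RANGES = [
    ("diatom", 33.33, 33.65),
    ("dino1", 33.65, 33.70),
    ("dino2", 33.70, 33.75),
    ("sediment", 33.75, 33.93),
]


def label_for_salinity(salinity):
    """Return the label whose [min, max) range contains salinity, else None."""
    for name, low, high in LABEL_RANGES:
        if low <= salinity < high:
            return name
    return None


# Hypothetical (measuredparameter_id, salinity) pairs
measurements = [(1, 33.40), (2, 33.68), (3, 33.72), (4, 33.80), (5, 34.10)]

# Pairs that would become rows in an association table such as
# MeasuredParameterResource; measurements outside every range are skipped.
associations = []
for mp_id, salinity in measurements:
    label = label_for_salinity(salinity)
    if label is not None:
        associations.append((mp_id, label))
print(associations)
```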
The STOQS UI will be modified according to the plan laid out in comment 8
and we can test by selecting one of the classes and confirming that the UI
updates appropriately.
Original comment by MBARIm...@gmail.com
on 17 Jun 2014 at 3:11
Instead of labeling MeasuredParameters with names of 'train' or 'test' we will
label them with the name 'Labeled'. This way a classification train and test
algorithm can separate those data as needed to test various machine learning
models. A great tutorial for understanding the technique is at
https://www.youtube.com/watch?v=4ONBVNm3isI.
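With a single 'Labeled' set, the train/test separation can happen at analysis time. A minimal sketch using only the standard library; the function name, the 25% test fraction, and the fixed seed are illustrative assumptions, not what the classification code does.

```python
import random


def split_labeled(ids, test_fraction=0.25, seed=0):
    """Shuffle labeled ids reproducibly and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical ids of MeasuredParameters carrying the 'Labeled' type
labeled_ids = list(range(100))
train_ids, test_ids = split_labeled(labeled_ids)
print(len(train_ids), len(test_ids))
```

Because the split is not stored in the database, different models can be evaluated with different fractions or seeds against the same labeled data.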
Original comment by MBARIm...@gmail.com
on 30 Jun 2014 at 7:03
The ability to label measurements with the new schema is demonstrated by this
example:
1. Execute the new contrib/analysis/classify.py script to add some labeled data
using the criteria from Comment #11:
(venv-stoqs)[mccann@localhost analysis]$ ./classify.py --doLabel --database
stoqs_september2013_t --platform dorado --start 20130916T124035 --end
20130919T233905 --inputs bbp700 fl700_uncorr --discriminator salinity --labels
diatom dino1 dino2 sediment --mins 33.33 33.65 33.70 33.75 --maxes 33.65 33.70
33.75 33.93 --clobber -v
Making label 'diatom' with discriminator {'salinity': ('33.33', '33.65')}
(3824, 3824) MeasuredParameters returned from database stoqs_september2013_t
Saving 3824 values of 'diatom' with type 'Labeled'
Making label 'dino1' with discriminator {'salinity': ('33.65', '33.70')}
(9709, 9709) MeasuredParameters returned from database stoqs_september2013_t
Saving 9709 values of 'dino1' with type 'Labeled'
Making label 'dino2' with discriminator {'salinity': ('33.70', '33.75')}
(6588, 6588) MeasuredParameters returned from database stoqs_september2013_t
Saving 6588 values of 'dino2' with type 'Labeled'
Making label 'sediment' with discriminator {'salinity': ('33.75', '33.93')}
(3380, 3380) MeasuredParameters returned from database stoqs_september2013_t
Saving 3380 values of 'sediment' with type 'Labeled'
2. View the User Interface and select 'diatom' and 'sediment' in the new
Attributes section to see those filters added to the ParameterParameter plot in
the attached image.
Original comment by MBARIm...@gmail.com
on 30 Jun 2014 at 11:17
The Mercurial branch 'schemachangeResources' has been merged with the default
and is now inactive. The next step is to reload all of the production databases
so that we can start using the new features of these code changes.
Original comment by MBARIm...@gmail.com
on 30 Jun 2014 at 11:25
Held meeting with stakeholders on Tuesday 8 July 2014 to give a live
demonstration of new capabilities shown in Comment #13. The production
databases are being reloaded and that task should be completed in a week or so.
Marking this issue as fixed.
Original comment by MBARIm...@gmail.com
on 9 Jul 2014 at 4:57
Original issue reported on code.google.com by
MBARIm...@gmail.com
on 5 May 2014 at 11:33