google-code-export / stoqs

Automatically exported from code.google.com/p/stoqs
GNU General Public License v3.0

Add ability to load MBARI ROVCTD data #54

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The primary method to load data into a STOQS database is to first convert 
data to CF-NetCDF 1.6 Discrete Sampling Geometry, host on an OPeNDAP server and 
then use those URLs in a load script. 

Many important observational data archives are not in this format. One such 
archive is MBARI's Remotely Operated Vehicle Conductivity Temperature Depth 
(ROVCTD) database. Fortunately, there is an internal HTTP interface to this 
25-year data archive that returns ROVCTD trajectory data as comma-separated 
values. 

The goal of this Issue is to modify the STOQS loader software to support 
loading of MBARI ROVCTD data. The changes should be done in such a way as to 
take advantage of all the well tested software that has been developed for 
loading data from OPeNDAP data sources.
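
As a rough illustration of the idea, the comma separated response could be 
turned into one record per sample, which is the kind of input the existing 
loading code could then consume. This is only a sketch: the column names 
(epochsecs, lat, lon, depth, temperature) and the sample text are assumptions 
made for the example, not the actual output of the internal service.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample of a CSV response; the real column names may differ.
SAMPLE_CSV = """epochsecs,lat,lon,depth,temperature
1414400000,36.70,-122.05,10.5,12.31
1414400015,36.70,-122.05,12.0,12.10
"""


def parse_rovctd_csv(text):
    """Yield one dict per CSV row, with numeric fields converted to float."""
    for row in csv.DictReader(io.StringIO(text)):
        record = {name: float(value) for name, value in row.items()}
        # Replace the assumed epoch-seconds column with a timezone-aware datetime.
        record["time"] = datetime.fromtimestamp(record.pop("epochsecs"), tz=timezone.utc)
        yield record


records = list(parse_rovctd_csv(SAMPLE_CSV))
```

Keeping the parsing in a generator means a long dive can be streamed row by 
row rather than held in memory all at once.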

Original issue reported on code.google.com by MBARIm...@gmail.com on 27 Oct 2014 at 3:36

GoogleCodeExporter commented 9 years ago
Thanks for the explanation, Mike.

Original comment by resherl...@gmail.com on 27 Oct 2014 at 6:11

GoogleCodeExporter commented 9 years ago
Added file loaders/ROVCTDloader.py to work against Rich's rovctddataservlet. It 
looks as though many of the methods in DAPloaders.py can be reused, which is a 
good thing.

Original comment by MBARIm...@gmail.com on 29 Oct 2014 at 3:53

GoogleCodeExporter commented 9 years ago
From Rob:

If you could import the following mix of Ventana and Doc Ricketts’ dives into 
STOQS, that’d be great:
2014:  V3766, V3767, V3774, D646
2013:  D449, D478, V3736
2011: V3607, V3630, V3646
2009: V3334, V3363, V3417
2007: V2983, V3006, V3079
2005: V2636, V2661, V2715
2003: V2329, V2354, V2421
2001: T257, V1964, V2069
1999: V1575, V1610, V1668
1997: V1236, V1247, V1321

If that’s too many, it’d be fine to load 2014, 2013, 2009, 2005, 2001, 1997 
(n=19 dives). If that’s too many, let me know.

To support requests like this we need a load program that can read a list of 
dives from the command line or from a text file containing dive names like 
these.
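
A minimal sketch of how such a list might be read, assuming a dive name is a 
single uppercase ROV letter followed by digits (as in the list above). The 
function names parse_dive_names and read_dive_file are hypothetical and are 
not part of the STOQS code.

```python
import re

# Assumed dive name format: one uppercase ROV letter followed by the dive number.
DIVE_RE = re.compile(r"\b[A-Z]\d+\b")


def parse_dive_names(text):
    """Return dive names found in free-form text, in order of appearance."""
    return DIVE_RE.findall(text)


def read_dive_file(path):
    """Read dive names from a text file in any mix of lines, commas, and spaces."""
    with open(path) as handle:
        return parse_dive_names(handle.read())


request = """2014:  V3766, V3767, V3774, D646
2001: T257, V1964, V2069"""
dives = parse_dive_names(request)
```

Because the pattern requires a leading letter, year labels such as "2014:" 
are skipped, so a request like Rob's can be pasted into a file unchanged.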

Original comment by MBARIm...@gmail.com on 30 Oct 2014 at 9:10

GoogleCodeExporter commented 9 years ago
Successfully tested the ROVCTDloader.py script locally. Here is the usage note:

(venv-stoqs)[mccann@localhost stoqshg]$ loaders/ROVCTDloader.py --help
usage: ROVCTDloader.py [-h] -d DATABASE --dives [DIVES [DIVES ...]]
                       --campaignName CAMPAIGNNAME
                       [--campaignDescription CAMPAIGNDESCRIPTION]
                       [--qcFlag {0,1,2,3}] [--stride STRIDE]

Load ROVCTD data into a STOQS database

optional arguments:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        Database alias
  --dives [DIVES [DIVES ...]]
                        Space separated list of dives in format <ROV_letter><number>
  --campaignName CAMPAIGNNAME
                        Short name describing this collection of dives
  --campaignDescription CAMPAIGNDESCRIPTION
                        Longer name explaining purpose for having these dives assembeled together
  --qcFlag {0,1,2,3}    Load only data that have flags of this value and above. QC flags: 0=bad, 1=suspect, 2=default, 3=human checked 
  --stride STRIDE       Longer name explaining purpose for having these dives together

Examples:

Initial test dives requested by Rob:
loaders/ROVCTDloader.py --database stoqs_rovctd_t --dives \
    V1236 V1247 V1321 V1575 V1610 V1668 T257 V1964 V2069 V2329 V2354 V2421 \
    V2636 V2661 V2715 V2983 V3006 V3079 V3334 V3363 V3417 V3607 V3630 V3646 \
    D449 D478 V3736 V3766 V3767 V3774 D646

Assumes that a STOQS database has already been set up following steps 4-7 from 
the LOADING file.

If running from cde-package replace ".py" with ".py.cde".
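
To illustrate the --qcFlag behavior described in the usage note ("flags of 
this value and above"), here is a small sketch of the filtering step. The 
sample records and the "qc" key are assumptions made for the example; the 
loader's actual data structures may differ.

```python
# QC flag meanings as listed in the --qcFlag help text.
QC_FLAGS = {0: "bad", 1: "suspect", 2: "default", 3: "human checked"}


def filter_by_qc(samples, min_flag):
    """Keep only samples whose QC flag is at or above min_flag."""
    if min_flag not in QC_FLAGS:
        raise ValueError(f"qcFlag must be one of {sorted(QC_FLAGS)}")
    return [sample for sample in samples if sample["qc"] >= min_flag]


# Hypothetical samples; "qc" is an assumed key name.
samples = [
    {"depth": 10.5, "temperature": 12.31, "qc": 2},
    {"depth": 12.0, "temperature": 99.99, "qc": 0},
    {"depth": 13.5, "temperature": 12.05, "qc": 3},
]
kept = filter_by_qc(samples, 2)
```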

Original comment by MBARIm...@gmail.com on 31 Oct 2014 at 6:29

GoogleCodeExporter commented 9 years ago
Added --bbox option to help remove spurious positions and ran load that created 
internal database: 

http://kraken.shore.mbari.org/canon/stoqs_rovctd_mw97/query/
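
The idea behind a bounding box filter can be sketched as follows. The order 
of the box values (min_lon, min_lat, max_lon, max_lat) and the example 
coordinates are assumptions for illustration and may not match how --bbox 
is actually specified.

```python
def in_bbox(lon, lat, bbox):
    """Return True if the position falls inside the bounding box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def drop_spurious(positions, bbox):
    """Keep only the (lon, lat) pairs that fall inside the bounding box."""
    return [(lon, lat) for lon, lat in positions if in_bbox(lon, lat, bbox)]


# Hypothetical box roughly around Monterey Bay, and a track with one bad fix at (0, 0).
bbox = (-122.5, 36.0, -121.5, 37.0)
track = [(-122.05, 36.70), (0.0, 0.0), (-122.00, 36.80)]
clean = drop_spurious(track, bbox)
```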

Original comment by MBARIm...@gmail.com on 1 Nov 2014 at 4:02

GoogleCodeExporter commented 9 years ago
Added --rov, --start, and --end options for loading a range of dives.
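
One way to picture the range options is as an expansion into individual dive 
names. This sketch assumes that --rov supplies the ROV letter and that 
--start and --end are inclusive dive numbers; the actual option semantics 
may differ, and expand_dive_range is a hypothetical name.

```python
def expand_dive_range(rov, start, end):
    """Return dive names for an inclusive range, e.g. V1236 through V1238."""
    if start > end:
        raise ValueError("start must not be greater than end")
    return [f"{rov}{number}" for number in range(start, end + 1)]


dives = expand_dive_range("V", 1236, 1238)
```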

As a test all ROV dives in Monterey Bay were loaded into: 
http://kraken.shore.mbari.org/canon/stoqs_rovctd_mb

Also, all ROV dives from the Gulf of Mexico area were loaded into 
http://kraken.shore.mbari.org/canon/stoqs_rovctd_goc

The former database took a few days to load and ended up holding 32,310,949 
data values from 5353 dives over a 25-year period. Exploring this dataset with 
the STOQS UI revealed some issues:

1. Histograms in the Parameter Values section never appear for the whole 
database. They do appear for a time filter of a few years or less.
2. Data visualization tasks (depending on the amount of data to process) 
sometimes take minutes to complete.

These will be tracked as separate Issues if they are important to users.

This database contains over 5,000 Activities - the most of any STOQS database 
so far. The initial JSON summary data response is about 2.5 MB and the 
Parameter Values Histogram data response is about the same size. These 
data are all plain ASCII text, formatted for reading by the Flot library. One 
way to decrease the size and improve performance would be to use JavaScript 
TypedArrays and XHR2 ArrayBuffers. This would be worth investigating, but as a 
separate Issue.
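
To give a feel for the potential savings, the sketch below compares the size 
of a list of numbers sent as JSON text with the same numbers packed as 32-bit 
floats, the kind of buffer a JavaScript Float32Array could view on a 
little-endian client. The values are made up, and 32-bit precision is an 
assumption that would need checking against what the plots require.

```python
import json
import struct

# Made-up data values standing in for one parameter's measurements.
values = [12.31 + 0.001 * i for i in range(1000)]

# Plain ASCII text, as in the current JSON responses.
ascii_payload = json.dumps(values).encode("ascii")

# '<' = little-endian, 'f' = 32-bit float: 4 bytes per value.
binary_payload = struct.pack(f"<{len(values)}f", *values)

# Round trip to confirm the values survive at 32-bit precision.
unpacked = struct.unpack(f"<{len(values)}f", binary_payload)

print(len(ascii_payload), "bytes as JSON text")
print(len(binary_payload), "bytes as packed float32")
```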

As the loaders/ROVCTDloader.py script works for a few use cases, this Issue is 
marked as Fixed.

Original comment by MBARIm...@gmail.com on 1 Dec 2014 at 9:52