The problem of the 27 MB of JSON data was caused by cron jobs that simply
wanted to append data to existing Activities. Instead of appending, the
DAPloaders.py code was creating a new Activity each time it was run.
Suggest investigating a command-line option for periodically run scripts so
that they append data to existing Activities.
-Mike
Original comment by MBARIm...@gmail.com
on 28 Aug 2014 at 6:27
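A minimal sketch of the suggested command-line option using argparse; the
flag name --append matches what was later implemented, but the surrounding
scaffolding here is an assumption, not the actual STOQS loader code:

```python
import argparse

# Hypothetical sketch of an --append option for a load script.
parser = argparse.ArgumentParser(description='Load data into STOQS Activities')
parser.add_argument('--append', action='store_true',
                    help='Append new data to an existing Activity instead of '
                         'creating a new one')
args = parser.parse_args(['--append'])  # simulate a cron invocation

if args.append:
    mode = 'append'   # look up the existing Activity and add only new records
else:
    mode = 'create'   # create a fresh Activity for this load
print(mode)  # append
```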
Core to this issue is the overloading of the use of startDatetime and
endDatetime in the loaders. These values are used to uniquely identify
Activities AND to restrict the time domain of data to be loaded into the
Activity. This works fine if the load is just a one-time load (as happens
following a campaign).
If the startDatetime and endDatetime values are incremented with the
periodicity of the load execution for realtime data (as was done last Fall)
then a new Activity is created for each execution. In this case we would like
to use a different parameter to specify the start (and maybe the end) time for
the DATA to be loaded. The Base_Loader class
(https://code.google.com/p/stoqs/source/browse/loaders/DAPloaders.py) has a
dataStartDatetime parameter, but it does not appear to be implemented.
I will begin testing an implementation.
Original comment by MBARIm...@gmail.com
on 8 Sep 2014 at 6:43
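The separation described above can be illustrated with a stand-in class
(hypothetical, not the actual Base_Loader): startDatetime/endDatetime identify
the Activity and never change between runs, while dataStartDatetime restricts
which records the current run actually loads.

```python
from datetime import datetime, timedelta

# Hypothetical sketch separating the two roles of the time parameters.
class Loader:
    def __init__(self, startDatetime, endDatetime, dataStartDatetime=None):
        self.startDatetime = startDatetime            # part of Activity identity
        self.endDatetime = endDatetime                # part of Activity identity
        self.dataStartDatetime = dataStartDatetime    # varies with each cron run

    def records_to_load(self, records):
        '''Return only records newer than dataStartDatetime, if it is set.'''
        cutoff = self.dataStartDatetime or self.startDatetime
        return [r for r in records if r['time'] > cutoff]

start = datetime(2014, 7, 16)
loader = Loader(start, start + timedelta(days=90),
                dataStartDatetime=datetime(2014, 9, 8))
records = [{'time': datetime(2014, 9, 7, 12)},   # already loaded last run
           {'time': datetime(2014, 9, 8, 12)}]   # new since last run
selected = loader.records_to_load(records)
print(len(selected))  # 1
```

Because the Activity is identified by the unchanged startDatetime/endDatetime
pair, repeated executions append to the same Activity instead of creating a
new one each time.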
This changeset:
https://code.google.com/p/stoqs/source/detail?r=ce8f00cd56bfe27b9f402bdca619edcb300d82b8
implements a --append option for all loaders that extend LoadScript().
It's being tested with an hourly cron job that loads new data from
http://dods.mbari.org/opendap/data/ssdsdata/deployments/m1/201407/OS_M1_20140716hourly_CMSTV.nc.html.
There is a problem with records that contain only missing data values: the
values themselves are not loaded, but InstantPoints are inserted, making the
loader think those data have already been loaded. I suspect this is caused by
this netCDF file containing multiple grids for the met, ts, and adcp data,
and that this code:
# Deliver the data harmonized as rows as an iterator so that they are fed as needed to the database
for pname in data.keys():
    logger.info('Delivering rows of data for %s', pname)
    l = 0
    for depthArray in data[pname]:
        k = 0
        logger.debug('depthArray = %s', depthArray)
        logger.debug('nomDepths = %s', nomDepths)
        values = {}
        for dv in depthArray:
            values[pname] = float(dv)
            values['time'] = times[pname][l]
            values['depth'] = depths[pname][k]
            values['latitude'] = latitudes[pname]
            values['longitude'] = longitudes[pname]
            values['timeUnits'] = timeUnits[pname]
            try:
                values['nomDepth'] = nomDepths[pname][k]
            except IndexError:
                values['nomDepth'] = nomDepths[pname]
            values['nomLat'] = nomLats[pname]
            values['nomLon'] = nomLons[pname]
            yield values
            k = k + 1
in DAPloaders' _getTimeSeriesGridType() makes assumptions that did not
anticipate different grids in the same file.
Original comment by MBARIm...@gmail.com
on 11 Sep 2014 at 4:47
I examined that code. The logic appears to be correct for dealing with multiple
grids in the same file. Need to do more testing...
Original comment by MBARIm...@gmail.com
on 11 Sep 2014 at 10:00
It appears that the missing value is in the source data, e.g. for the last 2
values of air_temperature now:
http://dods.mbari.org/opendap/hyrax/data/ssdsdata/deployments/m1/201407/OS_M1_20140716hourly_CMSTV.nc.ascii?hr_time_met[1373:1:1374],AIR_TEMPERATURE_HR[1373:1:1374][0:1:0][0:1:0][0:1:0]

Dataset: OS_M1_20140716hourly_CMSTV.nc
hr_time_met, 1410481800, 1410485400
AIR_TEMPERATURE_HR.Longitude, -122.030275
AIR_TEMPERATURE_HR.AIR_TEMPERATURE_HR[AIR_TEMPERATURE_HR.hr_time_met=1410481800][AIR_TEMPERATURE_HR.HR_DEPTH_met=-2.5][AIR_TEMPERATURE_HR.Latitude=36.756775], 15.6817
AIR_TEMPERATURE_HR.AIR_TEMPERATURE_HR[AIR_TEMPERATURE_HR.hr_time_met=1410485400][AIR_TEMPERATURE_HR.HR_DEPTH_met=-2.5][AIR_TEMPERATURE_HR.Latitude=36.756775], -1e+34
When the load script runs hourly it loads just this last value, which is the
missing_value. The InstantPoint gets loaded, preventing the good value from
being loaded the next hour. Hmmmm... what to do about this...
Original comment by MBARIm...@gmail.com
on 12 Sep 2014 at 3:21
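One possible guard (a hypothetical helper, not existing STOQS code) is to skip
fill values before inserting InstantPoints, so a missing_value at the end of
the stream does not block the corrected value on the next hourly run:

```python
# The fill value matches the -1e+34 seen in the OPeNDAP response above.
FILL_VALUE = -1e+34

def valid_rows(times, values, fill=FILL_VALUE):
    '''Yield (time, value) pairs whose value is not the fill value.'''
    for t, v in zip(times, values):
        if v != fill:
            yield t, v

# The two trailing air_temperature records from the response above:
times = [1410481800, 1410485400]
air_temp = [15.6817, -1e+34]
rows = list(valid_rows(times, air_temp))
print(rows)  # [(1410481800, 15.6817)]
```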
There are missing_values (or _FillValues) because a given time cell may hold
ADCP or TS data but not met data, or vice versa. Let's try changing the dataStartDatetime
value to one hour less than the last InstantPoint timevalue in the database.
This should fill in the good data values when they come in. There will be some
database warnings for attempts to load MeasuredParameters that already exist.
Original comment by MBARIm...@gmail.com
on 12 Sep 2014 at 3:27
That was it! After changing the loadM1() method in loaders/CANON/__init__.py
to subtract an hour from the last time in the database, new data are now
being loaded:
if self.args.append:
    # Return datetime of last timevalue - if data are loaded from multiple activities return the earliest last datetime value
    dataStartDatetime = InstantPoint.objects.using(self.dbAlias).filter(activity__name=aName).aggregate(Max('timevalue'))['timevalue__max']
    if dataStartDatetime:
        # Subtract an hour to fill in missing_values at end from previous load
        dataStartDatetime = dataStartDatetime - timedelta(seconds=3600)
You can observe the results (updated hourly) at
http://kraken.shore.mbari.org/canon/stoqs_september2014/ (internal to MBARI).
Now to confirm this works for lrauv data...
Original comment by MBARIm...@gmail.com
on 12 Sep 2014 at 8:47
Still reworking the monitorLrauv.py code to work with current data.
Observed a problem with the mooring data load that has been running on kraken
for about a week. It seems that extra records that aren't needed are being
added to the simpledepthtime table. This is increasing the size of the json in
the query/summary response and giving artifacts in the Temporal/Depth Flot
plot.
Need to investigate updating the last time for each depth in the
simpledepthtime table rather than blindly inserting new records...
Original comment by MBARIm...@gmail.com
on 19 Sep 2014 at 4:25
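The update-instead-of-insert idea can be illustrated with a dict standing in
for the simpledepthtime table (purely illustrative, not the STOQS schema):
extend the last time for each depth rather than inserting a new record every
hour.

```python
# depth -> [first_time, last_time]
simpledepthtime = {}

def record(depth, t):
    '''Insert a row only for a new depth; otherwise update its last time.'''
    if depth in simpledepthtime:
        simpledepthtime[depth][1] = max(simpledepthtime[depth][1], t)
    else:
        simpledepthtime[depth] = [t, t]

for hour in (0, 1, 2):
    record(-2.5, hour)          # three hourly appends at one nominal depth
print(simpledepthtime)  # {-2.5: [0, 2]} -- one row, not three
```

Keeping one row per depth per Activity keeps the query/summary JSON small,
which is what restored the UI response time described below.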
Regarding last comment, changeset
https://code.google.com/p/stoqs/source/detail?r=35b181ca3fa29960c01b60d7e8e9974400e28984
implements a better update of the SimpleDepthTime table for when timeSeries
and timeSeriesProfile data are being appended. The UI response time is much
better now: it went from about 8 seconds to less than 0.5 seconds.
Original comment by MBARIm...@gmail.com
on 23 Sep 2014 at 4:41
Committed changes to
https://code.google.com/p/stoqs/source/browse/loaders/CANON/realtime/monitorLrauv.py
and it appears to be running fine with the dataStartDatetime being set to
the last time for the Activity in the database.
Marking this issue as fixed.
Original comment by MBARIm...@gmail.com
on 24 Sep 2014 at 4:53
Original issue reported on code.google.com by
MBARIm...@gmail.com
on 5 Nov 2013 at 6:32