Open dkirkby opened 8 years ago
Let's assume that the local data store is complete, so we raise a RuntimeError if any expected file cannot be found and never try to download missing files via the network.
The user access pattern that we want to preserve is:
remote_path = finder.get_spec_path(plate=4567, mjd=55589, fiber=88, lite=True)
local_path = mirror.get(remote_path)
The finder is configured by $BOSS_SAS_PATH and $BOSS_REDUX_VERSION. With the defaults:
export BOSS_SAS_PATH=/sas/dr12/boss
export BOSS_REDUX_VERSION=v5_7_0
remote_path is:
/sas/dr12/boss/spectro/redux/v5_7_0/spectra/lite/4567/spec-4567-55589-0088.fits
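As a rough illustration of how the two environment variables combine into the path above, here is a minimal sketch. The function name spec_lite_path is hypothetical (it is not the finder API), the path template is inferred only from the single example shown here, and only the lite=True case is covered.

```python
import os

def spec_lite_path(plate, mjd, fiber, sas_path=None, redux_version=None):
    """Build the remote path of a lite spec file (hypothetical helper).

    The template is inferred from the one example path in this issue,
    so the zero-padding of the fiber number is an assumption.
    """
    # Fall back to the environment, then to the defaults quoted above.
    if sas_path is None:
        sas_path = os.environ.get('BOSS_SAS_PATH', '/sas/dr12/boss')
    if redux_version is None:
        redux_version = os.environ.get('BOSS_REDUX_VERSION', 'v5_7_0')
    return '{0}/spectro/redux/{1}/spectra/lite/{2}/spec-{2}-{3}-{4:04d}.fits'.format(
        sas_path, redux_version, plate, mjd, fiber)

remote_path = spec_lite_path(4567, 55589, 88,
                             sas_path='/sas/dr12/boss', redux_version='v5_7_0')
print(remote_path)
```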
The mirror is configured by $BOSS_DATA_URL and $BOSS_LOCAL_ROOT. It first checks whether $BOSS_LOCAL_ROOT/$BOSS_SAS_PATH/$BOSS_REDUX_VERSION/... exists and, if not, tries to download it from $BOSS_DATA_URL/$BOSS_SAS_PATH/$BOSS_REDUX_VERSION/...
The new mirror logic we want is to:
1. Look for the file under $BOSS_LOCAL_ROOT.
2. Otherwise, look for it under $BOSS_DATA_URL.
3. Raise a RuntimeError if not found in either place.
This should use $BOSS_LOCAL_ROOT to write and cache sqlite3 files without any modifications to the meta module. However, meta currently uses the following pattern to convert local paths returned by the mirror to sqlite3 paths:
db_path = local_path.replace('.fits', '.db')
This should somehow be delegated to the mirror instead, which then translates the read-only path under $BOSS_DATA_URL into a read-write path under $BOSS_LOCAL_ROOT.
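A minimal sketch of the lookup and path translation discussed above, assuming the intended order is to check $BOSS_LOCAL_ROOT first, then a read-only tree at $BOSS_DATA_URL (treated here as a file-system path), and to raise RuntimeError otherwise. The names find_file and writable_path are hypothetical and not part of the existing mirror.

```python
import os

def find_file(remote_path, local_root, data_root):
    """Return the first existing copy of remote_path, never downloading."""
    relative = remote_path.lstrip('/')
    # Prefer the read-write cache, then fall back to the read-only tree.
    for root in (local_root, data_root):
        candidate = os.path.join(root, relative)
        if os.path.isfile(candidate):
            return candidate
    raise RuntimeError('File not found under {0} or {1}: {2}'.format(
        local_root, data_root, remote_path))

def writable_path(path, local_root, data_root):
    """Map a path under the read-only data_root to the same relative
    location under local_root; other paths are returned unchanged."""
    relative = os.path.relpath(path, data_root)
    if relative == os.pardir or relative.startswith(os.pardir + os.sep):
        # Not inside data_root, e.g. already under local_root.
        return path
    return os.path.join(local_root, relative)
```

With this split, meta would only ever write sqlite3 files to the location returned by writable_path, so the read-only tree is never modified.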
The first step is to change the four direct path manipulations in meta, e.g.
db_path = local_path.replace('.fits', '.db')
becomes:
db_path = mirror.local_path_replace(local_path, '.fits', '.db')
This still does not quite work, since meta calls mirror.local_path in several places. Instead, generalize mirror.local_path so that it can optionally replace the suffix of the returned local path:
db_path = mirror.local_path(remote_path, '.fits', '.db')
This cleans up the meta logic a bit and eliminates the static _db_path_helper().
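One possible shape for the generalized call is sketched below as a free function. The signature, the local_root parameter and the ValueError on a suffix mismatch are assumptions made for illustration, not the actual mirror implementation.

```python
import os

def local_path(remote_path, suffix=None, new_suffix=None, local_root='/data/boss'):
    """Return the read-write local path for remote_path (hypothetical sketch).

    If suffix and new_suffix are both given, the trailing suffix of the
    result is swapped, e.g. '.fits' -> '.db'.
    """
    path = os.path.join(local_root, remote_path.lstrip('/'))
    if suffix is not None and new_suffix is not None:
        if not path.endswith(suffix):
            raise ValueError('Path does not end with {0}: {1}'.format(suffix, path))
        # Replace only the trailing suffix, unlike str.replace(), which
        # would also rewrite any earlier occurrence in the path.
        path = path[:-len(suffix)] + new_suffix
    return path
```

Swapping only the trailing suffix is a deliberate difference from the original local_path.replace('.fits', '.db') pattern, which would also alter a directory name that happened to contain '.fits'.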
The current data access model assumes that a user has rw access to $BOSS_LOCAL_ROOT and that files not already cached must be copied to $BOSS_LOCAL_ROOT. This does not work well on sites that already have most of the data directly visible via the file system, where you want to use this data directly without doing any downloads. This issue is to add support for efficiently taking advantage of local data.
The simplest approach would be to set $BOSS_LOCAL_ROOT, $BOSS_SAS_PATH and $BOSS_REDUX_VERSION so that all files appear to already be cached and no downloads are ever attempted. This does not work if any files are missing (perhaps unlikely) or when a metadata file is converted to its equivalent sqlite3 file.