Closed by rsignell-usgs 6 years ago
https://github.com/IntelAI/experimental-kvc
This might be another good option to look at
@zflamig have you experimented with this yet?
Are you getting enhanced performance relative to FUSE?
@rsignell-usgs Not yet... got distracted with other things unfortunately.
http://pangeo.esipfed.org users now have access to any public-read S3 bucket via /s3/<bucket>, following the Met Office approach: I edited jupyter-config.yaml to include the storage section as directed and updated the pangeo helm chart, and the nodes have the AmazonS3ReadOnlyAccess IAM policy attached. So now we can see the National Water Model data at:
ls /s3/noaa-nwm-pds/
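With a bucket exposed this way, ordinary filesystem calls work against it, so the same listing can be done from Python. A minimal sketch, assuming only that the mount root exists; list_mounted_bucket is a hypothetical helper, not part of pangeo or the flex volume:

```python
import os

def list_mounted_bucket(mount_root, bucket):
    """List the top-level entries of an S3 bucket exposed through a
    FUSE mount. mount_root (e.g. '/s3') and bucket are illustrative."""
    path = os.path.join(mount_root, bucket)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"{path} does not look like a mounted bucket")
    return sorted(os.listdir(path))
```

For example, list_mounted_bucket("/s3", "noaa-nwm-pds") should match what the ls above prints.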
It turns out my notebook is seeing /s3, but the dask workers are not.
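A quick way to confirm which pods see the mount is to run a stdlib check everywhere; on a dask cluster the same function can be broadcast to every worker with Client.run. A sketch, where s3_mount_visible is a hypothetical helper:

```python
import os

def s3_mount_visible(mount_root="/s3"):
    """Return True if the FUSE mount point exists as a directory."""
    return os.path.isdir(mount_root)

# In the notebook, with a dask.distributed Client already connected:
#     client.run(s3_mount_visible)   # maps each worker address to True/False
# In this case the notebook pod reported True while the workers did not.
```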
So @minrk helped me fix this problem. Hurrah for the SciPy 2018 code sprint! When he found out the notebook user pod was working, he had me dump that pod's parameters and copy the settings from there into custom-worker-template.yaml, which now looks like this:
metadata:
spec:
  restartPolicy: Never
  volumes:
    - flexVolume:
        driver: informaticslab/pysssix-flex-volume
        options:
          readonly: "true"
      name: s3
  containers:
    - args:
        - dask-worker
        - --nthreads
        - '2'
        - --no-bokeh
        - --memory-limit
        - 6GB
        - --death-timeout
        - '60'
      image: esip/pangeo-notebook:2018-07-04
      name: dask-worker
      securityContext:
        capabilities:
          add: [SYS_ADMIN]
        privileged: true
      volumeMounts:
        - mountPath: /s3
          name: s3
      resources:
        limits:
          cpu: "1.75"
          memory: 6G
        requests:
          cpu: "1.75"
          memory: 6G
We found out what the notebook pod was using by first doing:
kubectl get pods -n esip-dev | grep jupyter
to find my user pod, and then ran this command to dump the info to json:
kubectl get pod -o yaml -n esip-dev jupyter-rsignell-2dusgs > foo.yaml
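If you dump the pod as JSON instead (kubectl get pod -o json ...), the relevant stanzas can be pulled out with the stdlib alone. A sketch, assuming the standard Kubernetes pod-spec field names; extract_s3_settings is a hypothetical helper:

```python
import json

def extract_s3_settings(pod):
    """Given a pod dumped as a dict (json.load of kubectl output), return
    the flexVolume definitions and the volumeMounts that reference them."""
    spec = pod["spec"]
    volumes = [v for v in spec.get("volumes", []) if "flexVolume" in v]
    names = {v["name"] for v in volumes}
    mounts = [
        m
        for c in spec.get("containers", [])
        for m in c.get("volumeMounts", [])
        if m["name"] in names
    ]
    return volumes, mounts
```

The two lists it returns are the pieces to transplant into custom-worker-template.yaml.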
I just heard on a pangeo web meeting that the Met Office developed a FUSE toolbox that you can use to mount all your S3 content as a file system: https://github.com/informatics-lab/s3-fuse-flex-volume/blob/master/README.md
We should enable this so we can compare this baseline to other approaches like zarr and HSDS.