datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1

External storage support for Google Cloud Storage #750

Open stephenholtz opened 4 years ago

stephenholtz commented 4 years ago

Currently the s3 module allows manipulation of files in Amazon S3 stores, and the ExternalTable class has conditionals for local versus AWS locations. Something similar for Google Cloud Storage (GCS) ought to be possible, particularly because Google also offers a reasonable Python 3 API.

I'm not sure if any work for https://github.com/datajoint/datajoint-python/issues/439 will change how this is handled currently.
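To make the local-versus-S3 conditional concrete, here is a minimal sketch of how a third branch for GCS could be dispatched on a store's protocol. It is not DataJoint's actual code: `StorageBackend`, `LocalBackend`, `GCSBackend`, `BACKENDS`, `make_backend`, and the `"gcs"` protocol name are all hypothetical, and the store spec keys are assumptions. The GCS branch assumes the `google-cloud-storage` package and default application credentials; it is imported lazily so the rest of the sketch runs without it.

```python
import os
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical minimal interface; not DataJoint's actual API."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def exists(self, name: str) -> bool: ...


class LocalBackend(StorageBackend):
    """Stores objects as files under a local directory."""

    def __init__(self, location: str):
        self.location = location

    def _path(self, name: str) -> str:
        return os.path.join(self.location, name)

    def put(self, name: str, data: bytes) -> None:
        path = self._path(name)
        # Create intermediate folders so nested object names work.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, name: str) -> bytes:
        with open(self._path(name), "rb") as f:
            return f.read()

    def exists(self, name: str) -> bool:
        return os.path.isfile(self._path(name))


class GCSBackend(StorageBackend):
    """Stores objects as blobs in a GCS bucket (assumes google-cloud-storage)."""

    def __init__(self, bucket: str, location: str = ""):
        # Lazy import: only needed when a GCS store is actually configured.
        from google.cloud import storage

        self.bucket = storage.Client().bucket(bucket)
        self.location = location.strip("/")

    def _blob(self, name: str):
        key = f"{self.location}/{name}" if self.location else name
        return self.bucket.blob(key)

    def put(self, name: str, data: bytes) -> None:
        self._blob(name).upload_from_string(data)

    def get(self, name: str) -> bytes:
        return self._blob(name).download_as_bytes()

    def exists(self, name: str) -> bool:
        return self._blob(name).exists()


# Registry replaces an if/elif chain on the protocol; keys are assumptions.
BACKENDS = {"file": LocalBackend, "gcs": GCSBackend}


def make_backend(spec: dict) -> StorageBackend:
    """Build a backend from a store spec such as {"protocol": "file", "location": "/data"}."""
    spec = dict(spec)  # avoid mutating the caller's dict
    protocol = spec.pop("protocol")
    try:
        backend_cls = BACKENDS[protocol]
    except KeyError:
        raise ValueError(f"Unsupported storage protocol: {protocol!r}") from None
    return backend_cls(**spec)
```

With a registry like this, adding another storage system would mean adding one class and one dictionary entry rather than another conditional branch.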

dimitri-yatsenko commented 4 years ago

Hi @stephenholtz, the ExternalTable class and the storage functionality are designed for extensibility to other storage systems, including GCS. Adding GCS support would require only a small amount of development. #439 is currently addressed with tools specific to Globus and perhaps does not even need explicit DataJoint support.

dimitri-yatsenko commented 4 years ago

This issue will track the implementation of GCS support for external storage.

stephenholtz commented 4 years ago

@dimitri-yatsenko great! If I'm the only one working with Google Cloud buckets then I'll happily work on this -- starting next week I'll have some time to commit, so to speak.

dimitri-yatsenko commented 4 years ago

Hi @stephenholtz, awesome! @chrisroat in Karl Deisseroth's lab was also looking into GCS support. He may have made progress.

chrisroat commented 4 years ago

I've been heavy into the underlying algorithms, as we changed our pipelines around somewhat. I haven't worked on this, sorry.

stephenholtz commented 4 years ago

(sorry for the close/open)

I am currently using gcsfuse and a script to ensure mounting happens properly to get most of the functionality I wanted out of this system, but I remain interested in adding these features.
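For anyone wanting the same workaround, a mount check of this kind might look roughly like the sketch below. It is not my actual script: `ensure_mounted` and its `runner` parameter are hypothetical names, and it assumes `gcsfuse` is on the `PATH`, credentials are already configured, and gcsfuse returns only once the mount is in place (its default background mode).

```python
import os
import subprocess


def ensure_mounted(bucket, mount_point, runner=subprocess.run):
    """Mount `bucket` at `mount_point` with gcsfuse unless it is already mounted.

    Returns False if the mount point was already mounted, True if it was
    mounted by this call. `runner` is injectable so the function can be
    exercised without gcsfuse installed.
    """
    if os.path.ismount(mount_point):
        return False

    os.makedirs(mount_point, exist_ok=True)
    # Basic gcsfuse invocation: gcsfuse <bucket> <mount point>
    runner(["gcsfuse", bucket, mount_point], check=True)

    # Verify the mount really exists before a pipeline starts writing to it;
    # otherwise files would silently land on the local disk instead.
    if not os.path.ismount(mount_point):
        raise RuntimeError(f"{mount_point} is not mounted after running gcsfuse")
    return True
```

Once the bucket is mounted, the mount point can be configured as an ordinary local (`file`) store, which is why this covers most of what I wanted.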

I am not sure how any additions I make would mesh with plans for adding an external storage plugin interface https://github.com/datajoint/datajoint-python/issues/762 and don't want to add work for you all restructuring whatever I come up with. Thoughts?

stephenholtz commented 4 years ago

My current preference is to wait and see the shape of the plugin architecture, but if GCS support before then would be valuable to anyone besides me, please let me know.