kachery-cloud

:warning: This project is in BETA.

IMPORTANT: This package is intended for collaborative sharing of data for scientific research. It should not be used for other purposes.

PLEASE NOTE: At this point, uploaded files are not guaranteed to be available forever.

Kachery-cloud is designed to make it easier for researchers to share data and files between different computers and with web applications. It provides a content-addressable storage network and a uniform way to access data and files across different machines. The goal is to make scientific research more reproducible and collaborative.

Kachery-cloud is a core part of figurl.

Kachery-cloud is a network designed for scientists to share files between lab computers and between workstations and web browsers. Access to the network is granted through registered Python clients and web applications that store and access files and data objects. Kachery URIs are essentially content hashes, and in this way, Kachery forms a content-addressable storage database. While a primary purpose of kachery-cloud is to support figurl in the browser, it can also be incorporated into scientific workflows in other ways to enhance reproducibility and dissemination.
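To illustrate what "content-addressable" means here, the following is a minimal sketch and not kachery-cloud's own implementation. It assumes a SHA-1 digest and a `sha1://` URI scheme purely for illustration; the helper `content_uri` is hypothetical. The point it shows is that the address is derived from the bytes themselves, so identical content always maps to the same URI regardless of which machine computes it.

```python
import hashlib


def content_uri(data: bytes) -> str:
    """Return an illustrative content-addressed URI for the given bytes.

    Assumption: SHA-1 and the "sha1://" prefix are chosen for this sketch only.
    """
    digest = hashlib.sha1(data).hexdigest()
    return f"sha1://{digest}"


uri_a = content_uri(b"example text")
uri_b = content_uri(b"example text")
uri_c = content_uri(b"different text")

# Identical content yields an identical URI, on any machine.
print(uri_a == uri_b)  # True
# Different content yields a different URI.
print(uri_a == uri_c)  # False
```

Because the URI depends only on the content, a collaborator who receives it can verify that the file they downloaded is exactly the file that was shared.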

Installation and setup

Kachery-cloud is often installed as a dependency of other projects, but there are times when you may want to use it stand-alone.

It is best to use a conda environment or a virtual environment.

Requirements

pip install kachery-cloud

# or for the development version, clone this repo and install via "pip install -e ."

To complete the setup, open a terminal and run

# One-time initialization
kachery-cloud-init

# Follow the instructions to associate your computer with your GitHub user on the kachery-cloud network

Clicking the link shown in the instructions will bring you to a page where you associate this client with a GitHub user ID for the purpose of managing projects and tracking usage. This initialization only needs to be performed once on your computer. The client information will be stored in ~/.kachery-cloud.

If you are using a Colab or Jupyter notebook and do not have easy access to a terminal, you can also run this one-time step in the notebook:

# One-time initialization (alternate method)
import kachery_cloud as kcl
kcl.init()

# Follow the instructions to associate the client with your GitHub user on the kachery-cloud network

Basic usage

Environment variables

You can use environment variables to control the storage/configuration directory and the zone used by kachery-cloud.

# Set the storage/configuration directory used by kachery-cloud
# If unset, $HOME/.kachery-cloud will be used
# The client ID will be determined by this directory
# You can share the same kachery-cloud directory between multiple users,
# but you will need to set multi-user mode for the client
export KACHERY_CLOUD_DIR="..."

# Set the KACHERY_ZONE environment variable to control
# which zone files are uploaded to and retrieved from.
# If unset, the default zone is used.
export KACHERY_ZONE="..."

It is recommended that you set these variables in your ~/.bashrc file.
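The fallback behavior described in the comments above can be sketched in Python. This is an illustration of the documented defaults, not the package's internal code: the helper `resolve_kachery_settings` is hypothetical, and returning `None` for an unset zone is an assumption used here to stand for "the default zone".

```python
import os
from pathlib import Path


def resolve_kachery_settings(env=None):
    """Return (cloud_dir, zone) following the defaults described above.

    Hypothetical helper for illustration; not part of kachery-cloud.
    """
    env = os.environ if env is None else env
    # If KACHERY_CLOUD_DIR is unset, fall back to $HOME/.kachery-cloud
    cloud_dir = env.get("KACHERY_CLOUD_DIR") or str(Path.home() / ".kachery-cloud")
    # If KACHERY_ZONE is unset, None stands for the default zone (assumption)
    zone = env.get("KACHERY_ZONE") or None
    return cloud_dir, zone


cloud_dir, zone = resolve_kachery_settings()
print("Storage/configuration directory:", cloud_dir)
print("Zone:", zone if zone is not None else "(default)")
```

Running this in the shell where you plan to use kachery-cloud is a quick way to check which directory and zone your environment currently points to.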

Creating your own Kachery zone

Administering your Kachery zone

Hosting a Kachery resource

Hosting a Kachery gateway

Sharing the kachery cloud directory between multiple users

Frequently asked questions

Authors

Jeremy Magland and Jeff Soules, Center for Computational Mathematics, Flatiron Institute

License

Kachery-cloud is an open-source project released under the Apache 2.0 license.