RhodiumGroup / rhg_compute_tools

Tools for using compute.rhg.com and compute.impactlab.org
MIT License

Add a "make gcs fuse directories in place" function #63

Open delgadom opened 4 years ago

delgadom commented 4 years ago
import posixpath

from tqdm import tqdm


def add_fuse_directory_markers_to_cloud_storage(client, bucket_name, root_path="", pbar=True):
    """
    Create gcsfuse directory markers from a bucket and root path

    Parameters
    ----------
    client : google.cloud.storage.Client
        See the [google.cloud.storage.Client](https://googleapis.dev/python/storage/latest/client.html) docs for help setting this up.
    bucket_name : str
        name of the bucket on gcs
    root_path : str, optional
        prefix of "directories" below which to create the directory markers
    pbar : bool, optional
        display a tqdm progress bar while listing blobs (default True)

    Examples
    --------

    The following will create directory markers for all directories within gs://my-bucket/path/to/root,
    where directories are indicated by the presence of blobs with directory separators (`'/'`) in the
    path. Empty directories will not be created, since these cannot exist on google cloud storage.

    .. code-block:: python

        >>> import google.cloud.storage
        >>> client = google.cloud.storage.Client.from_service_account_json('/path/to/cred.json')
        >>> add_fuse_directory_markers_to_cloud_storage(client, 'my-bucket', 'path/to/root/')

    """
    bucket = client.bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=root_path)

    if pbar:
        # pages are fetched lazily, so grow the bar's total as each page arrives
        progress_bar = tqdm(total=0, unit="blob")
        total_items = 0

    directories = set()

    for page in blobs.pages:
        if pbar:
            total_items += page.num_items
            progress_bar.total = total_items
            progress_bar.refresh()

        for blob in page:
            if pbar:
                progress_bar.update()

            # GCS object names always use "/" as the separator
            dirname = posixpath.dirname(blob.name)

            # walk up through every parent directory at or below root_path, since
            # gcsfuse needs a marker for each level, not only the immediate parent.
            # Blobs at the bucket root have dirname "" and need no marker.
            while dirname:
                marker = dirname + "/"
                if marker in directories or not marker.startswith(root_path):
                    break
                directories.add(marker)

                dir_blob = bucket.blob(marker)
                if not dir_blob.exists():
                    # a zero-byte object named "dir/" is what gcsfuse treats as a directory
                    dir_blob.upload_from_string(b"")

                dirname = posixpath.dirname(dirname)

    if pbar:
        progress_bar.close()
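Before writing anything to a bucket, it can help to preview which markers would be created. The sketch below pulls the directory computation out into a pure function that works on a plain list of blob names. `directory_markers` is a hypothetical helper, not part of rhg_compute_tools. It assumes that gcsfuse (mounted without `--implicit-dirs`) needs an explicit `dir/` object for every directory level, not only each blob's immediate parent, and it keeps only markers at or below `root_path`.

```python
import posixpath
from typing import Iterable, Set


def directory_markers(blob_names: Iterable[str], root_path: str = "") -> Set[str]:
    """Return gcsfuse-style directory marker names implied by blob_names.

    Hypothetical helper for a dry run: "a/b/c.txt" implies "a/" and "a/b/".
    Only markers that start with root_path are kept.
    """
    markers: Set[str] = set()
    for name in blob_names:
        # GCS object names use "/" regardless of the local OS
        dirname = posixpath.dirname(name)
        while dirname:
            marker = dirname + "/"
            # Stop once we leave root_path: every shorter ancestor is outside it too.
            # Stop on a known marker: its in-root ancestors were added along with it.
            if marker in markers or not marker.startswith(root_path):
                break
            markers.add(marker)
            dirname = posixpath.dirname(dirname)
    return markers


if __name__ == "__main__":
    names = [
        "path/to/root/a/b/c.txt",
        "path/to/root/a/d.txt",
        "path/to/root/e.txt",
        "top.txt",
    ]
    # prints path/to/root/, path/to/root/a/ and path/to/root/a/b/, one per line
    for marker in sorted(directory_markers(names, "path/to/root/")):
        print(marker)
```

To preview a real bucket, the names can come from `[blob.name for blob in client.list_blobs('my-bucket', prefix='path/to/root/')]`; nothing is written, so this is safe to run against production data.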
delgadom commented 4 years ago

or use this: https://gist.github.com/brews/0e7c90d57ead9cea608581c89606c2c8