activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Mozilla Public License 2.0
8.08k stars, 616 forks

GCS Token Credentials support #2896

Closed dgaloop closed 2 months ago

dgaloop commented 3 months ago

🚀 🚀 Pull Request

Impact

Description

This PR adapts deeplake to support downscoped GCS credentials, both for federated credentials and for service account JSON (master/permanent) credentials. It also implements generation of presigned URLs for streaming blobs, for the case of linked tensors whose data lives in GCS and is attached via the corresponding ManagedCredentials.

Things to be aware of

  1. The backend will downscope the GCS service account credentials if possible, which leads to use of the gcs_oauth_token key; when it does not downscope, it returns the whole JSON as before. We should therefore still ensure backwards compatibility in subsequent releases. Additionally, GCS credentials registered as LinkedCredentials also skip downscoping, which yields the previous response structure.
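
As a rough illustration of the backwards-compatibility point above, a client could branch on whether the backend response carries the gcs_oauth_token key. This is only a sketch: the key name comes from the description, while the function name `classify_gcs_creds`, the dict-shaped response, and the returned labels are assumptions made for illustration, not deeplake's actual code.

```python
from typing import Any, Dict, Tuple

# Key named in the PR description; the rest of the payload shape is assumed.
OAUTH_TOKEN_KEY = "gcs_oauth_token"


def classify_gcs_creds(response: Dict[str, Any]) -> Tuple[str, Any]:
    """Return ("oauth_token", token) or ("service_account_json", full_dict).

    Hypothetical helper: `response` stands in for whatever dict the backend
    returns for GCS credentials.
    """
    token = response.get(OAUTH_TOKEN_KEY)
    if token:
        # Downscoped case: the backend sent a token under gcs_oauth_token.
        return "oauth_token", token
    # Previous response structure: the whole service account JSON came back.
    return "service_account_json", response
```

Keeping both branches means older backends, and LinkedCredentials that skip downscoping, continue to work without a client-side flag.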

Things to worry about

Additional Context

Linked videos in GCS are the only case where the client side, with whatever credentials it gets from the backend, cannot generate presigned URLs for blobs. Therefore a third use case for GCSStorageProvider emerges: generating a presigned URL for a given blob, given a ManagedCredentials key.
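
For readers unfamiliar with the mechanics, presigning a blob could look roughly like the sketch below. It is not the code from this PR: `split_gcs_path` and `presign_blob` are hypothetical helpers, and `client` is assumed to behave like a `google.cloud.storage.Client` built from credentials that are able to sign. How a ManagedCredentials key is resolved into such a client is not described here and is left out.

```python
from datetime import timedelta
from typing import Any, Tuple


def split_gcs_path(path: str) -> Tuple[str, str]:
    """Split 'gcs://bucket/key' or 'gs://bucket/key' into (bucket, key)."""
    for prefix in ("gcs://", "gs://"):
        if path.startswith(prefix):
            bucket, _, key = path[len(prefix):].partition("/")
            if not bucket or not key:
                raise ValueError(f"expected bucket and key in {path!r}")
            return bucket, key
    raise ValueError(f"not a GCS path: {path!r}")


def presign_blob(client: Any, path: str, expires_s: int = 3600) -> str:
    """Return a time-limited GET URL for the blob at `path`.

    `client` is assumed to expose bucket(name).blob(key), as the
    google-cloud-storage client does.
    """
    bucket, key = split_gcs_path(path)
    blob = client.bucket(bucket).blob(key)
    # generate_signed_url is the google-cloud-storage Blob method; V4 signing
    # with a relative expiration is used here as an illustrative choice.
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(seconds=expires_s),
        method="GET",
    )
```

Passing the client in, rather than constructing it inside the helper, keeps the credential lookup separate from URL signing and makes the function easy to exercise with a stub.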

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 60.81081% with 29 lines in your changes missing coverage. Please review.

| Files | Patch % | Lines |
|---|---|---|
| deeplake/core/storage/gcs.py | 60.71% | 22 Missing :warning: |
| deeplake/client/client.py | 20.00% | 4 Missing :warning: |
| deeplake/core/link_creds.py | 50.00% | 3 Missing :warning: |


sonarcloud[bot] commented 2 months ago

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
60.8% Coverage on New Code
0.0% Duplication on New Code
