activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Mozilla Public License 2.0

[BUG] Datasets not accessible in Google Colab #2818

Closed cjharrington85 closed 3 months ago

cjharrington85 commented 3 months ago

Severity

P0 - Critical breaking issue or missing functionality

Current Behavior

I was able to use the Deeplake Spoken MNIST dataset (https://datasets.activeloop.ai/docs/ml/datasets/free-spoken-digit-dataset-fsdd/) in Colab up until last week. For some reason, the service is now being blocked when I run ds = deeplake.load("hub://activeloop/spoken_mnist"). The same call works fine in a Python environment on my laptop.

Here's a copy of my project: https://colab.research.google.com/drive/1qWGufDVgs9OlkB9rAcUzIlV6MupWPsQA?usp=sharing. Please let me know if you require access to it.

Steps to Reproduce

  1. Import a dataset, such as Spoken MNIST, using deeplake.load("hub://activeloop/spoken_mnist") within a Google Colab notebook.
  2. An error message appears saying the server is not accessible.

Expected/Desired Behavior

The dataset should import with no issue. This is the case when I try to do so locally on my personal machine.

Python Version

3.10.2

OS

Ubuntu 22.04 LTS

IDE

Google Colab

Packages

deeplake==3.8.27

Additional Context

No response

Possible Solution

No response

Are you willing to submit a PR?

davidbuniat commented 3 months ago

@cjharrington85 thanks for raising the issue, and apologies for the inconvenience. Can you give us access to the Colab notebook?

cjharrington85 commented 3 months ago

Hi @davidbuniat, thanks for your quick reply. I just granted access.

istranical commented 3 months ago

Hi @cjharrington85, there are two likely explanations here.

The first is an issue in Colab that is preventing it from connecting to our storage provider. We've submitted a ticket to Colab and are waiting for a resolution. As a workaround, can you please run the code below every time you start the environment:

# Point the Colab runtime's DNS resolver at Google Public DNS (8.8.8.8)
with open('/etc/resolv.conf', 'w') as file:
    file.write("nameserver 8.8.8.8")

The second is a temporary outage in our storage about 4 hrs ago. I think the Colab issue above was the root cause in your case.

cjharrington85 commented 3 months ago

Hi @istranical,

Thanks, looks like this workaround solved the issue for me. I'll continue to use it until there's a permanent fix.
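
For anyone hitting the same problem, the workaround above can be wrapped in a small helper so it is easy to rerun at the start of each Colab session. This is only a sketch of the snippet posted in this thread: the function name set_nameserver and its resolv_path parameter are made up for illustration and are not part of deeplake. It assumes the runtime lets you overwrite /etc/resolv.conf, which Colab normally does because notebooks run as root.

```python
from pathlib import Path


def set_nameserver(nameserver="8.8.8.8", resolv_path="/etc/resolv.conf"):
    """Overwrite the resolver config so DNS lookups use `nameserver`.

    Hypothetical helper: it mirrors the two-line workaround from this
    thread. `resolv_path` exists only so the function can be tried
    against a scratch file instead of the real system config.
    """
    Path(resolv_path).write_text(f"nameserver {nameserver}\n")


# Typical use at the top of a Colab notebook (left commented out here,
# since it rewrites a system file and needs network access):
#
#   set_nameserver()
#   import deeplake
#   ds = deeplake.load("hub://activeloop/spoken_mnist")
```

The change does not persist across runtime restarts, so it has to be run again each time the environment starts, as noted above.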