Closed adswa closed 2 years ago
Hi - thanks for trying out Studio Lab! We'll look into this and get back to you shortly.
Wow, I haven't worked with DataLad before, and it looks really nice. Could you send me a link of the lines you're trying to run start to finish, maybe even in a notebook, so I can replicate the issue and root cause it? Thanks.
Sure, and thanks a lot for looking into this! Here is a short script from a notebook with a Python kernel, including installation:
# installation and set up
%conda install -c conda-forge datalad
!git config --global --add user.name "Adina Wagner"
!git config --global --add user.email "adina.wagner@t-online.de"
# import
import datalad.api as dl
# clone a "superdataset" with many datasets underneath
dl.clone('https://github.com/dandi/dandisets')
ds = dl.Dataset('/home/studio-lab-user/sagemaker-studiolab-notebooks/dandisets')
# install a single dataset from the collection but without getting data, just to browse its files
ds.get(path='000003', get_data=False)
!ls 'dandisets/000003'
# get a directory (3 files)
ds.get('000003/sub-YutaMouse20')
You could do the same thing in a terminal, and get more debug output by using the git-annex calls directly:
(installation would be the same as above if not done already)
datalad clone https://github.com/dandi/dandisets.git
cd dandisets && datalad get -n 000003
ls 000003
cd 000003
git annex get --debug sub-YutaMouse20
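If it is easier to stay in the notebook, the same debug call can be captured from Python. This is only a rough sketch: the helper name `annex_get_debug` is made up for illustration, and it assumes `git` and `git-annex` are on the PATH and that `cwd` points at the installed dataset.

```python
import shutil
import subprocess


def annex_get_debug(path, cwd="."):
    """Run `git annex get --debug <path>` in `cwd`.

    Returns (exit code, combined stdout and stderr). The exit code is None
    when git-annex is not installed. Hypothetical helper, not part of
    DataLad or git-annex.
    """
    if shutil.which("git-annex") is None:
        return None, "git-annex not found on PATH"
    proc = subprocess.run(
        ["git", "annex", "get", "--debug", path],
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    # git-annex writes its debug messages to stderr, so keep both streams
    return proc.returncode, proc.stdout + proc.stderr
```

Called as `code, log = annex_get_debug('sub-YutaMouse20', cwd='dandisets/000003')`, the returned `log` can then be searched for the failing step.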
Sorry, I accidentally just took an example with unnecessarily big data (the three files are each 10GB in size). A smaller example dataset would be https://github.com/datalad-datasets/machinelearning-books.
datalad clone https://github.com/datalad-datasets/machinelearning-books.git
cd machinelearning-books
datalad get A.Shashua-Introduction_to_Machine_Learning.pdf
Or, equivalently, in Python:
import datalad.api as dl
dl.clone('https://github.com/datalad-datasets/machinelearning-books.git')
ds = dl.Dataset('machinelearning-books')
ds.get('A.Shashua-Introduction_to_Machine_Learning.pdf')
Confirmed! I'm getting the same error - please stand by.
Closing this for now - I've created a ticket with the team. Are you still blocked on this?
Sadly, yes, but thanks for creating a ticket.
Apologies in advance if this repository is not the right place for a request like this, and many thanks for SageMaker Studio Lab! I usually retrieve data via git-annex, which is a very convenient way to install datasets and retrieve portions of them on demand. It allows me to install huge datasets, often many TB in size and often directly by cloning a GitHub repository, while retrieving only individual files and dropping data that I have already processed. I use it as part of the datalad package, which allows me to do the data retrieval in a Python session as part of my scripts.
Basic file retrieval with git-annex fails so far:
The cause of the failure lies in
ConnectionFailure Network.BSD.getProtocolByName: does not exist (no such protocol name: tcp)
I believe this is because netbase isn't installed and /etc/protocols thus doesn't exist. Is there a way to have it installed, or a solution I have missed in the documentation so far? Thanks in advance!
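For what it's worth, the diagnosis can be checked from the notebook with the standard library alone. The sketch below is only an approximation: the helper name `check_protocol_db` is made up, and it assumes that Python's `socket.getprotobyname` consults the same system protocols database that git-annex's lookup fails on.

```python
import os
import socket


def check_protocol_db(name="tcp", path="/etc/protocols"):
    """Report whether the protocols file exists and whether `name` resolves.

    Hypothetical diagnostic helper; `number` stays None when the lookup fails.
    """
    result = {"file_exists": os.path.exists(path), "number": None}
    try:
        # Raises OSError when the protocol name cannot be resolved
        result["number"] = socket.getprotobyname(name)
    except OSError:
        pass
    return result


print(check_protocol_db())
```

If `file_exists` is False and `number` is None for `tcp`, that would be consistent with netbase missing from the image.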