calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0
410 stars 126 forks

how to download the cross2020 model and data? #133

Open lancezhangsf opened 2 years ago

lancezhangsf commented 2 years ago

How do I download the cross2020 model and data?

lancezhangsf commented 2 years ago

/basenji/manuscripts/cross2020$ ./get_models.sh
--2022-08-10 16:59:39-- https://storage.googleapis.com/basenji_barnyard/model_human.h5
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.43.16, 172.217.160.80, 142.251.42.240, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.43.16|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2022-08-10 16:59:45 ERROR 400: Bad Request.

--2022-08-10 16:59:45-- https://storage.googleapis.com/basenji_barnyard/model_mouse.h5
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.160.112, 142.251.43.16, 172.217.160.80, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.160.112|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2022-08-10 16:59:51 ERROR 400: Bad Request.

(base) kaldi@kaldi-Super-Server:/data/zsf/WorkSpace/BASENJI/basenji/manuscripts/cross2020$

sheetalgiri commented 2 years ago

Not sure if this is a temporary or permanent change, but the cloud bucket now has 'requester pays' enabled, which means access is billed to your own Google Cloud project: https://cloud.google.com/storage/docs/requester-pays

davek44 commented 2 years ago

We had to switch the training data to requester pays because the cost of offering it was becoming far too large. I'll move the models somewhere free because they're smaller, but I'm having trouble with that right now. Give me a day or two to figure it out.

davek44 commented 2 years ago

OK you can now grab the models and other small files from gs://basenji_barnyard2/

xxjxuejian commented 2 years ago

When I open this link: https://console.cloud.google.com/storage/browser/basenji_barnyard2

I get a warning: Additional permissions required to list objects in this bucket. Ask a bucket owner to grant you 'storage.objects.list' permission.

Is there a problem with how I am accessing it?

davek44 commented 2 years ago

I forgot to make it public. Sorry about that. Try again now
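
For readers following along: objects in a publicly readable Cloud Storage bucket can be fetched over plain HTTPS, with no Google Cloud account, using the same URL pattern that appears in the wget log above. The following is a minimal sketch, not the repository's own script. The object names `model_human.h5` and `model_mouse.h5` are assumed to carry over unchanged from the old bucket, and the helper names `public_url` and `download` are hypothetical.

```python
import urllib.request
from pathlib import Path

# Public HTTPS endpoint for Cloud Storage objects (same host as in the wget log).
GCS_HTTP_ROOT = "https://storage.googleapis.com"


def public_url(bucket: str, object_name: str) -> str:
    """Build the HTTPS URL of an object in a publicly readable bucket."""
    return f"{GCS_HTTP_ROOT}/{bucket}/{object_name.lstrip('/')}"


def download(bucket: str, object_name: str, dest_dir: str = ".") -> Path:
    """Download one public object into dest_dir and return the local path.

    Raises urllib.error.HTTPError if the object is missing or not public.
    """
    dest = Path(dest_dir) / Path(object_name).name
    urllib.request.urlretrieve(public_url(bucket, object_name), dest)
    return dest
```

Calling `download("basenji_barnyard2", "model_human.h5")` would then save the file to the current directory, assuming the object exists under that name.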

xxjxuejian commented 2 years ago

Thank you so much for gs://basenji_barnyard2/. However, I want to use the data to run the model, and gs://basenji_barnyard2/ does not provide a .tfr file. I tried to generate the dataset with basenji_data.py and other scripts, but it failed, and I don't know where the problem is. I just want to run some tests with the dataset. Could you share a single .tfr file? One piece of the training set would be enough. Thank you! @davek44

davek44 commented 2 years ago

OK, I added a single tfrecord file to the public bucket gs://basenji_barnyard2/demo_tfr/train-0-0.tfr. If you need the entire dataset, just set up an account with payment and download from gs://basenji_barnyard/data/
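
For the full dataset, the requester-pays bucket needs a billing project on every request. With the gsutil command-line tool this is the top-level `-u` option. The sketch below only assembles and optionally runs that command. The project ID `my-billing-project` is a placeholder for your own, the helper names are hypothetical, and the `data/human` subpath is taken from the error message later in this thread rather than from a bucket listing.

```python
import shlex
import subprocess
from typing import List


def requester_pays_copy_cmd(project_id: str, src: str, dest: str = ".") -> List[str]:
    """Build a gsutil command that copies src recursively, billing project_id.

    -u names the project charged for the request (required for requester pays);
    -m runs the copy with parallel transfers.
    """
    return ["gsutil", "-u", project_id, "-m", "cp", "-r", src, dest]


def run_copy(project_id: str, src: str, dest: str = ".") -> None:
    """Print and run the copy; needs gsutil installed and authenticated."""
    cmd = requester_pays_copy_cmd(project_id, src, dest)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run_copy("my-billing-project", "gs://basenji_barnyard/data/human")` would copy the human data into the current directory and charge the transfer to that project.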

xxjxuejian commented 2 years ago

Thank you very much!

Dnelnaker commented 9 months ago

Hello, I need help with this issue.

I was trying to train the Enformer model and tried to access the Basenji data at gs://basenji_barnyard/data/. While running this code:

human = get_dataset('human', 'train').batch(1).repeat()
mouse_dataset = get_dataset('mouse', 'train').batch(1).repeat()
human_mouse_dataset = tf.data.Dataset.zip((human_dataset, mouse_dataset)).prefetch(2)

I got this error:

InvalidArgumentError Traceback (most recent call last)
in <cell line: 1>()
----> 1 human = get_dataset('human', 'train').batch(1).repeat()
      2 mouse_dataset = get_dataset('mouse', 'train').batch(1).repeat()
      3 human_mouse_dataset = tf.data.Dataset.zip((human_dataset, mouse_dataset)).prefetch(2)

6 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py in stat_v2(path)
    922 errors.OpError: If the operation fails.
    923 """
--> 924 return _pywrap_file_io.Stat(compat.path_to_str(path))
    925
    926

InvalidArgumentError: Error executing an HTTP request: HTTP response code 400 with body '{
  "error": {
    "code": 400,
    "message": "Bucket is a requester pays bucket but no user project provided.",
    "errors": [
      {
        "message": "Bucket is a requester pays bucket but no user project provided.",
        "domain": "global",
        "reason": "required"
      }
    ]
  }
}
' when reading metadata of gs://basenji_barnyard/data/human/statistics.json

davek44 commented 9 months ago

We had to switch the training data to requester pays because the cost of offering it was becoming far too large. You'll need to set up a payment method for your Google Cloud account. The cost should be very low relative to your other research costs.
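
The "no user project provided" message in the error above means the request did not name a project to bill. One way to supply it from Python is the `user_project` argument of the `google-cloud-storage` client library, which can copy the files locally so the training code reads them from disk. This is a sketch under assumptions, not the Enformer notebook's own method: it requires `pip install google-cloud-storage`, an authenticated account with billing enabled, and your own project ID; the helper names `split_gs_uri` and `download_requester_pays` are hypothetical.

```python
from typing import Tuple


def split_gs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    return bucket, object_name


def download_requester_pays(uri: str, dest: str, billing_project: str) -> None:
    """Download one object from a requester-pays bucket, billing billing_project."""
    # Imported here so the URI helper above works without the library installed.
    from google.cloud import storage

    bucket_name, object_name = split_gs_uri(uri)
    client = storage.Client(project=billing_project)
    # user_project tells Cloud Storage which project pays for the request.
    bucket = client.bucket(bucket_name, user_project=billing_project)
    bucket.blob(object_name).download_to_filename(dest)
```

For example, `download_requester_pays("gs://basenji_barnyard/data/human/statistics.json", "statistics.json", "my-billing-project")` would fetch the file named in the error message, with `my-billing-project` standing in for your own project ID.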