flatironinstitute / neuropixels-data-sep-2020

Example neuropixels datasets for purposes of developing spike sorting algorithms
Apache License 2.0

Can not download the data #10

Closed yger closed 4 years ago

yger commented 4 years ago

The download_all.py script does not work; it fails with the error:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=20431): Max retries exceeded with url: /loadFile (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb93ef22f10>: Failed to establish a new connection: [Errno 111] Connection refused'))

Same for the example script

magland commented 4 years ago

Are you running the kachery-p2p daemon?

Also, I would focus on the demo script first as it allows a much more fine-grained selection of which data to download. And it is loaded in a way that the format is unambiguous, including the electrode layout, etc.
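
A quick way to see whether anything is listening on the daemon's port is to try opening a TCP connection to it. The sketch below assumes the daemon listens on localhost port 20431, the address shown in the error message above; `daemon_is_reachable` is a hypothetical helper name and not part of kachery-p2p.

```python
import socket


def daemon_is_reachable(host: str = "localhost", port: int = 20431, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The default port is copied from the error message in this thread; it is an
    assumption that every installation uses the same port.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if daemon_is_reachable():
        print("Something is listening on localhost:20431")
    else:
        print("Nothing is listening on localhost:20431; is the daemon running?")
```

A successful connection only shows that some process accepts connections on that port; it does not confirm that the process is the kachery-p2p daemon or that it has joined a channel.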

yger commented 4 years ago

OK, I forgot the kachery-p2p daemon. It is now installed and running fine (at least, it seems fine). However, when I try to execute the code snippet from the GitHub page, I still get the following error:

LoadFileError                             Traceback (most recent call last)
in
      7 # Note: if the files are not already on your, then you need
      8 # to run a kachery-p2p daemon on the flatiron1 channel.
----> 9 recording = nd.load_recording(recording_id)
     10
     11 # recording is a SpikeInterface recording extractor

~/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/recordings.py in load_recording(rec_id)
     54 or rec_json == entry['recordingId'] or rec_json == entry['recordingLabel']):
     55 uri = entry['recordingPath']
---> 56 recording = LabboxEphysRecordingExtractor(uri, download=False)
     57 return recording
     58 raise Exception(f"Requested recording with identifier '{rec_id}' is not recognized.")

~/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/extractors/labboxephysrecordingextractor.py in __init__(self, arg, download)
    206 def __init__(self, arg: Union[str, dict], download: bool=False):
    207 super().__init__()
--> 208 obj = _create_object_for_arg(arg)
    209 assert obj is not None
    210 self._object: dict = obj

~/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/extractors/labboxephysrecordingextractor.py in _create_object_for_arg(arg)
    127 if (isinstance(arg, str)) and (arg.endswith('.json')):
    128 path = arg
--> 129 x = kp.load_object(path)
    130 if x is None:
    131 raise Exception(f'Unable to load object: {path}')

~/.local/lib/python3.8/site-packages/kachery_p2p/core.py in load_object(uri, p2p, from_node, from_channel)
    166
    167 def load_object(uri: str, p2p: bool=True, from_node: Union[str, None]=None, from_channel: Union[str, None]=None):
--> 168 local_path = load_file(uri, p2p=p2p, from_node=from_node, from_channel=from_channel)
    169 if local_path is None:
    170 return None

~/.local/lib/python3.8/site-packages/kachery_p2p/core.py in load_file(uri, dest, p2p, from_node, from_channel)
    114 print(f'Loaded {bytes_loaded} of {bytes_total} bytes{nodestr} ({pct:.1f} %): {uri}')
    115 elif type0 == 'error':
--> 116 raise LoadFileError(f'Error loading file: {r["error"]}')
    117 else:
    118 raise Exception(f'Unexpected message from daemon: {r}')

LoadFileError: Error loading file: File found, but no providers passed test load.

yger commented 4 years ago

And the error for the script download_all.py is different

Creating (1 of 7): recordings/cortexlab-single-phase-3.dat
/usr/lib/python3/dist-packages/apport/report.py:13: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import fnmatch, glob, traceback, errno, sys, atexit, locale, imp
Traceback (most recent call last):
  File "scripts/download_all.py", line 22, in <module>
    recording = nd.load_recording(recording_id)
  File "/home/pierre/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/recordings.py", line 56, in load_recording
    recording = LabboxEphysRecordingExtractor(uri, download=False)
  File "/home/pierre/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/extractors/labboxephysrecordingextractor.py", line 208, in __init__
    obj = _create_object_for_arg(arg)
  File "/home/pierre/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/extractors/labboxephysrecordingextractor.py", line 129, in _create_object_for_arg
    x = kp.load_object(path)
  File "/home/pierre/.local/lib/python3.8/site-packages/kachery_p2p/core.py", line 168, in load_object
    local_path = load_file(uri, p2p=p2p, from_node=from_node, from_channel=from_channel)
  File "/home/pierre/.local/lib/python3.8/site-packages/kachery_p2p/core.py", line 116, in load_file
    raise LoadFileError(f'Error loading file: {r["error"]}')
kachery_p2p.exceptions.LoadFileError: Error loading file: File not found.

magland commented 4 years ago

Could you please try the sample script again? I think something timed out.

yger commented 4 years ago

Error is now

Exception: Unable to load object: sha1://4184f3c842745e9e48b4e5aeb0a2019912837e5f/cortexlab-single-phase-3-ch0-7-10sec.json

magland commented 4 years ago

Hmmm. I suspect the files are not being loaded because something keeps timing out. (It is working on my machine, so I can't reproduce). Maybe it relates to geographic distance to the data.

If you want you can keep trying to run the script and I suspect things will load after a couple tries.

But if you wait a few hours we will release a new version that has more careful timeouts set.

Tagging: @jsoules
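
Retrying by hand can also be scripted. The following is a minimal sketch of a retry loop with a fixed pause between attempts; `retry` and its parameters are hypothetical names, and the commented usage line assumes `nd` and `recording_id` are defined as in the repository's example snippet.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T], attempts: int = 5, delay_sec: float = 10.0) -> T:
    """Call func until it succeeds or the attempts are used up.

    The exception from the final attempt is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:
            print(f"Attempt {attempt} of {attempts} failed: {error}")
            if attempt == attempts:
                raise
            # Fixed pause before the next attempt
            time.sleep(delay_sec)


# Hypothetical usage with the example snippet from this repository:
# recording = retry(lambda: nd.load_recording(recording_id))
```

A retry loop only helps when the failure is transient, such as a timeout; it will not help if the requested file does not exist on the network.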

yger commented 4 years ago

Ok, I'll keep retrying this afternoon, and see how it goes. I used kachery-p2p from pip install, and not from master branch. Could it be the reason?

yger commented 4 years ago

And for info, this is the log of kachery

PROTOCOL VERSION: kachery-p2p-0.4.18

CHANNEL: flatiron1 (7 nodes)
self 9748db
Node de05da... li1075-33: 45.33.92.33:45002 bootstrap out udp-in udp-out
Node d94fda... labbox-ephys-deployment-7c989c78c6-m6k9x: ephys1.laboratorybox.org:14107 out
Node da13c4... dubb: : udp-in udp-out
Node 9beb5d... ccmlin008.flatironinstitute.org: :4008 udp-in udp-out
Node f64510... tegula: : udp-in udp-out
Node* abf4d3... DBC-Linux: : udp-in udp-out

OTHER
self 9748db
Node f10a45... li1075-31: 45.33.92.31:45002 bootstrap out udp-in udp-out

magland commented 4 years ago

The pypi version should be good. Thanks for testing!

On Mon, Aug 24, 2020 at 8:25 AM Pierre Yger notifications@github.com wrote:

Ok, I'll keep retrying this afternoon, and see how it goes. I used kachery-p2p from pip install, and not from master branch. Could it be the reason?


yger commented 4 years ago

No problem, looking forward to trying it out. In fact, I also forgot a portion of the error:

Loaded file: sha1://4184f3c842745e9e48b4e5aeb0a2019912837e5f/cortexlab-single-phase-3-ch0-7-10sec.json
/usr/lib/python3/dist-packages/traitlets/config/configurable.py:143: ResourceWarning: unclosed <socket.socket fd=33, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 39868), raddr=('127.0.0.1', 20431)>
  for name, config_value in my_config.items():
ResourceWarning: Enable tracemalloc to get the object allocation traceback

magland commented 4 years ago

Thanks. I've seen that particular warning before (it seems to be benign).

I've increased the default timeouts and pushed another version to PyPI. Please run pip install --upgrade kachery-p2p; you should then have version 0.4.25 (check with pip show kachery-p2p). Then restart the daemon. Don't worry that the console still shows protocol version 0.4.18; that's expected.

Then I'm guessing that the load will work. (Fingers crossed)
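
The installed version can also be read from Python with the standard-library `importlib.metadata` module (Python 3.8 and later). This is a small sketch; `installed_version` is a hypothetical helper name, and "kachery-p2p" is the package name used with pip in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Reports the version visible to this interpreter's environment only
    print(installed_version("kachery-p2p"))
```

This reports what the current Python environment has installed. A daemon that was started before the upgrade keeps running the old code until it is restarted.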

yger commented 4 years ago

Sadly, I just upgraded kachery, and even with version 0.4.25 (and nodejs 12.18.3), the errors are the same :-(

magland commented 4 years ago

Okay, thanks so much for testing. I am going to confer with @jsoules, and we will come up with a more robust solution for this. Please hang tight and expect something by tomorrow.

jsoules commented 4 years ago

Thanks for reporting this issue. To make sure I'm on the same page, would you mind pasting the code/command you were trying to execute?

Also, could I ask you to try again and time how long it takes for you to get the error? Just something simple like:

from time import perf_counter
start = perf_counter()
try:
    pass  # YOUR CODE HERE
except Exception as error:
    print(error)
finally:
    print(f"Elapsed time: {perf_counter() - start}")

for Python or

$ time SCRIPT.sh

for shell.

Thanks!

yger commented 4 years ago

This is my output for the attached script (uploaded as text file) test.txt

Loaded file: sha1://4184f3c842745e9e48b4e5aeb0a2019912837e5f/cortexlab-single-phase-3-ch0-7-10sec.json
Unable to load object: sha1://4184f3c842745e9e48b4e5aeb0a2019912837e5f/cortexlab-single-phase-3-ch0-7-10sec.json
Elapsed time: 0.0064486739993299125
/home/pierre/.local/lib/python3.8/site-packages/matplotlib-3.2.1-py3.8-linux-x86_64.egg/matplotlib/_pylab_helpers.py:76: ResourceWarning: unclosed <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 42288), raddr=('127.0.0.1', 20431)>
  gc.collect(1)
ResourceWarning: Enable tracemalloc to get the object allocation traceback

magland commented 4 years ago

Ah, that's helpful. I think it's not a download problem here, but the file itself doesn't exist. Will remedy and respond soon.

magland commented 4 years ago

@yger, please try again. It will probably work as is (now the file exists). But you should also git pull on this repo because there was an adjustment to one of the other files as well.

In addition we are working on making the download more fail-proof by providing a fall-back static file server in case the p2p network does not come through.

yger commented 4 years ago

I updated the repo and relaunched the same script, but I still have nothing displayed, and the error now reads:

Loaded file: sha1://08295c224a0753457cde13e78a97cdf8465bd214/known_recordings.json
'NoneType' object is not subscriptable
Elapsed time: 0.0138720929990086

magland commented 4 years ago

Could you please put a raise statement after the print(error) so we can see the full error trace?

...
except Exception as error:
    print(error)
    raise error
...
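
An alternative that keeps the timing output intact is to print the full traceback with the standard-library `traceback` module instead of re-raising. This is only a sketch; `run_and_report` is a hypothetical helper name.

```python
import traceback
from time import perf_counter
from typing import Callable


def run_and_report(func: Callable[[], object]) -> str:
    """Run func, print the elapsed time, and return the traceback text ('' on success)."""
    start = perf_counter()
    trace = ""
    try:
        func()
    except Exception:
        # format_exc() returns the traceback text of the exception being handled
        trace = traceback.format_exc()
        print(trace)
    finally:
        print(f"Elapsed time: {perf_counter() - start}")
    return trace
```

Because the exception is caught rather than re-raised, the script continues after the call, which may or may not be what is wanted when debugging.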

yger commented 4 years ago

Sure, sorry for that

Traceback (most recent call last):
  File "test.py", line 40, in <module>
    raise error
  File "test.py", line 13, in <module>
    recording = nd.load_recording(recording_id)
  File "/home/pierre/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/recordings.py", line 30, in load_recording
    valid_recordings: List[Any] = get_valid_recordings()
  File "/home/pierre/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/recordings.py", line 23, in get_valid_recordings
    return x['recordings']
TypeError: 'NoneType' object is not subscriptable

yger commented 4 years ago

Actually, if I delete the kachery folder, to clean everything, the error is still there, but at least we can see that something is loaded

Loaded 137007 of 137007 bytes from 9beb5d (100.0 %): sha1://08295c224a0753457cde13e78a97cdf8465bd214/known_recordings.json
Loaded file: sha1://08295c224a0753457cde13e78a97cdf8465bd214/known_recordings.json
'NoneType' object is not subscriptable
Elapsed time: 12.740095626999391
/usr/lib/python3/dist-packages/apport/report.py:13: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import fnmatch, glob, traceback, errno, sys, atexit, locale, imp
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    raise error
  File "test.py", line 13, in <module>
    recording = nd.load_recording(recording_id)
  File "/home/pierre/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/recordings.py", line 30, in load_recording
    valid_recordings: List[Any] = get_valid_recordings()
  File "/home/pierre/github/neuropixels-data-sep-2020/neuropixels_data_sep_2020/recordings.py", line 23, in get_valid_recordings
    return x['recordings']
TypeError: 'NoneType' object is not subscriptable

magland commented 4 years ago

This is strange because I am not having that problem on my end. Could you please try this from the command-line

kachery-p2p-cat sha1://08295c224a0753457cde13e78a97cdf8465bd214/known_recordings.json

You should see the content of that file. Is it empty?

If not empty, then you can find the location on your system:

kachery-load sha1://08295c224a0753457cde13e78a97cdf8465bd214/known_recordings.json

Then check the sha1sum of that file (to make sure it is not corrupted somehow):

sha1sum ...

Perhaps @jsoules can test as well.
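
The same integrity check can be done from Python. The sketch below assumes that the hexadecimal string right after `sha1://` in the URI is the expected SHA-1 digest of the file's content; that reading of the URI format, and all helper names, are assumptions made for illustration.

```python
import hashlib
from urllib.parse import urlparse


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha1(uri: str) -> str:
    """Extract the hash part of a sha1://<hash>/<name> URI (assumed format)."""
    parsed = urlparse(uri)
    if parsed.scheme != "sha1":
        raise ValueError(f"Not a sha1:// URI: {uri}")
    return parsed.netloc.lower()


def file_matches_uri(path: str, uri: str) -> bool:
    """Return True if the file's SHA-1 digest equals the hash in the URI."""
    return sha1_of_file(path) == expected_sha1(uri)
```

Here `path` would be the local file location printed by the kachery-load command above.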

magland commented 4 years ago

Oh, I think I may know what the problem is. Do you have the KACHERY_STORAGE_DIR environment variable set in both the terminal running the daemon and the terminal running the script? (They need to be set to the same path.)
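
One way to compare the two terminals is to run the same short script in each and compare the printed paths. The sketch below only reads the environment; `describe_storage_dir` is a hypothetical helper name, and what kachery-p2p does when the variable is unset is not covered here.

```python
import os
from typing import Mapping, Optional


def describe_storage_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe the KACHERY_STORAGE_DIR setting found in env (default: os.environ)."""
    env = os.environ if env is None else env
    value = env.get("KACHERY_STORAGE_DIR")
    if not value:
        return "KACHERY_STORAGE_DIR is not set"
    # Resolve symlinks and relative segments so equivalent paths print identically
    return f"KACHERY_STORAGE_DIR = {os.path.realpath(value)}"


if __name__ == "__main__":
    print(describe_storage_dir())
```

If the two terminals print different lines, the daemon and the script are looking at different storage directories.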

yger commented 4 years ago

Yes, sorry, my fault, it was the env variable not correctly set to the same value in the two terminals... Now it seems that my script is loading something!

magland commented 4 years ago

Glad that's working and thanks for your patience. We are adding some checks to make things less prone