flatironinstitute / spikeforest_recordings

Ephys recordings for the SpikeForest project

Unable to download raw files and ground truth data using kachery #5

Closed: mckarthik7 closed this issue 3 years ago

mckarthik7 commented 3 years ago

Hi,

I followed the steps listed in download-spikeforest-data.md. I installed kachery, signed up for an account and added the spikeforest channel. However, I was unable to download any data other than the example in that document, i.e. the PAIRED_KAMPF recording.

Trying to download anything else fails. For example, here is the URI listed in SYNTH_BIONET/synth_bionet/static/static_8x_A_2A ---

{
    "raw": "sha1dir://abc900f5cd62436e7c89d914c9f36dcd7fcca0e7.synth_bionet/bionet_static/static_8x_A_2A/raw.mda",
    "params": {
        "samplerate": 20000,
        "spike_sign": -1,
        "scale_factor": 0.1
    },
...

Running $ kachery-load sha1dir://abc900f5cd62436e7c89d914c9f36dcd7fcca0e7.synth_bionet/bionet_static/static_8x_A_2A/raw.mda yields ---

Unable to load file

The only command that succeeds for this specific directory is $ kachery-ls sha1dir://abc900f5cd62436e7c89d914c9f36dcd7fcca0e7.synth_bionet/ which yields

bionet_drift/
bionet_shuffle/
bionet_static/

I have tried the same using --remote-only. I have also looked into the Python script invoked by kachery-load. It appears that the URL returned for any raw or ground truth file is empty at line 215: url0, algorithm, hash0, size0 = _check_remote_file(path, config=config), which causes the load_file function to return None.

I also tried this with other .mda files in the repo, but none work other than the paired_kampf one. Am I missing an instruction? Do I need to subscribe to more channels?

magland commented 3 years ago

Hi, are you sure you're getting the recordings from the right place as given in the instructions? For example: https://github.com/flatironinstitute/spikeforest_recordings/blob/master/recordings/SYNTH_BIONET/synth_bionet_static/static_8x_A_2A.json

In general, if you see sha1dir:// rather than sha1:// then it is an old URI and will not load successfully.
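As a quick way to apply this rule to a list of URIs, here is a minimal sketch that only inspects the scheme prefix. It does not call kachery at all; the helper names `uri_scheme` and `is_legacy_uri` are made up for illustration, and the two URIs are the ones quoted in this thread.

```python
def uri_scheme(uri: str) -> str:
    """Return the part of the URI before '://', e.g. 'sha1' or 'sha1dir'."""
    scheme, sep, _ = uri.partition("://")
    if not sep:
        raise ValueError(f"not a URI: {uri!r}")
    return scheme


def is_legacy_uri(uri: str) -> bool:
    """True for old-style sha1dir:// URIs, which are not expected to load."""
    return uri_scheme(uri) == "sha1dir"


# URIs copied from this thread (hypothetical variable names).
old = "sha1dir://abc900f5cd62436e7c89d914c9f36dcd7fcca0e7.synth_bionet/bionet_static/static_8x_A_2A/raw.mda"
new = "sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda?manifest=1605320e84d76e728d06a28c9fd7477bd58e4c20"

for uri in (old, new):
    status = "legacy, will not load" if is_legacy_uri(uri) else "current"
    print(f"{status}: {uri}")
```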

mckarthik7 commented 3 years ago

Thank you for the response. I see the issue: the JSON file referred to in prepare_recordings.py hasn't been updated, while the files under recordings/ have been. I now have the updated URIs for all the raw.mda files. However, I am still having trouble downloading these files.

In the file you linked, the SHA-1 URI was "raw": "sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda?manifest=1605320e84d76e728d06a28c9fd7477bd58e4c20". Downloading this gives the following error ---

kachery-load 'sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda?manifest=1605320e84d76e728d06a28c9fd7477bd58e4c20' --dest 'static_8x_A_2A.raw.mda'
Loaded 424 of 2304000020 bytes (0.0 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda?manifest=1605320e84d76e728d06a28c9fd7477bd58e4c20
None

I wasn't sure what was incorrect but I tried again after removing the manifest to see if that makes any difference. Interestingly it tried to download more data this time ---

kachery-load 'sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda' --dest 'static_8x_A_2A.raw.mda'
Loaded 218185728 of 2304000020 bytes (9.5 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 442974208 of 2304000020 bytes (19.2 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 649363456 of 2304000020 bytes (28.2 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 873168896 of 2304000020 bytes (37.9 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 1098087926 of 2304000020 bytes (47.7 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 1306590710 of 2304000020 bytes (56.7 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 1529658870 of 2304000020 bytes (66.4 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 1743454208 of 2304000020 bytes (75.7 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 1961459712 of 2304000020 bytes (85.1 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
Loaded 2185740288 of 2304000020 bytes (94.9 %): sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda
None
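Since the command ends with None before reaching 100 %, one way to tell whether a destination file is complete is to check it locally. The sketch below uses only the Python standard library, not kachery. It assumes that the hex string in a sha1:// URI is the SHA-1 of the full file contents; that is an assumption on my part, and the function names are hypothetical.

```python
import hashlib
import os


def sha1_from_uri(uri: str) -> str:
    """Extract the hex digest from a sha1://<digest>/<name>[?query] URI."""
    scheme, sep, rest = uri.partition("://")
    if scheme != "sha1" or not sep:
        raise ValueError(f"expected a sha1:// URI, got {uri!r}")
    return rest.split("/", 1)[0].split("?", 1)[0]


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-1 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path: str, uri: str, expected_size=None) -> bool:
    """Return True only if the file exists, has the expected size, and matches the digest."""
    if not os.path.isfile(path):
        return False
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False  # cheap check first; skips hashing a truncated file
    return file_sha1(path) == sha1_from_uri(uri)


# File name, URI and size are the ones shown in the log above.
ok = check_download(
    "static_8x_A_2A.raw.mda",
    "sha1://e72dbde9141f7ff543f1f388273f0491b9e98f3e/raw.mda",
    expected_size=2304000020,
)
print("download looks complete:", ok)
```

The size comparison runs before hashing so that an obviously truncated file is rejected without reading about 2.3 GB from disk.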

In addition, the firings_true.json files seem strange: all ground truth data in synth_bionet point to the same file. For example, static_8x_A_2A.firings_true.json and static_8x_A_2B.firings_true.json both have the following SHA-1 URI: "firings": "sha1://761adb31f33d758cef44b846ed25030422e84121/firings_true.mda?manifest=3a9bad993d7769ad8a615e3cf0735bdc65cf8ab0". Is this correct?
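To see how widespread the sharing is, a small script can group the recordings in a study directory by their "firings" URI. This is only a sketch: it assumes a local checkout of the repository, with files named `*.firings_true.json` that each contain a top-level "firings" key, as in the two files mentioned above. The function name is made up.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_recordings_by_firings(study_dir):
    """Map each 'firings' URI to the list of firings_true.json files that use it."""
    groups = defaultdict(list)
    for path in sorted(Path(study_dir).glob("*.firings_true.json")):
        with open(path) as f:
            uri = json.load(f)["firings"]
        groups[uri].append(path.name)
    return dict(groups)


# Path relative to a local clone of spikeforest_recordings (assumed location).
study_dir = Path("recordings/SYNTH_BIONET/synth_bionet_static")
if study_dir.is_dir():
    for uri, names in group_recordings_by_firings(study_dir).items():
        print(f"{len(names)} recording(s) share {uri}")
else:
    print(f"{study_dir} not found; run this from the repository root")
```

A single group covering every file would mean all recordings in that study point to the same ground truth file.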

mckarthik7 commented 3 years ago

Okay, I seem to have isolated the issue to when I use an NTFS drive for the sha1 folder inside ~/.kachery-storage. I had used a soft link to create that folder; maybe kachery-client expects some specific path?
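For anyone checking a similar setup, here is a minimal sketch that reports whether the storage folder is a symbolic link and where it resolves to. It uses only `os.path` and does not touch kachery. It cannot tell which filesystem the target is on, and the actual cause of the failure was not confirmed in this thread. The function name is hypothetical.

```python
import os


def describe_storage_dir(path: str) -> dict:
    """Report whether a path exists, whether it is a symlink, and its resolved target."""
    expanded = os.path.expanduser(path)
    return {
        "path": expanded,
        "exists": os.path.exists(expanded),
        "is_symlink": os.path.islink(expanded),
        "resolved": os.path.realpath(expanded),
    }


# Folder mentioned above; adjust if your storage lives elsewhere.
for key, value in describe_storage_dir("~/.kachery-storage/sha1").items():
    print(f"{key}: {value}")
```

If `is_symlink` is true, the `resolved` path shows which drive the data actually ends up on.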

I am now able to download the raw files and the ground truth files properly. I am still unsure why the ground_truth files for all recordings in SYNTH_BIONET/synth_bionet_static are the same.

magland commented 3 years ago

Glad that's working. I think the ground truth must be the same for all of those simulated recordings: same firing times and unit labels, just different waveforms and background signal.