dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

Metadata / validation not caught before attempting to upload #1270

Open jeromelecoq opened 1 year ago

jeromelecoq commented 1 year ago

I am getting a sequence of errors when some metadata is missing:

(nwb) jerome.lecoq@OSXLTCYGQCV upload % nwbinspector ./to_upload --config dandi


NWBInspector Report Summary

Timestamp: 2023-04-05 13:50:51.651946-07:00
Platform: macOS-12.6.3-arm64-arm-64bit
NWBInspector version: 0.4.26

Found 17 issues over 1 files:
       2 - BEST_PRACTICE_VIOLATION
      15 - BEST_PRACTICE_SUGGESTION


0 BEST_PRACTICE_VIOLATION

0.0 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_index_series_points_to_image - 'IndexSeries' object at location '/stimulus/presentation/natural_movie_three_stimulus' Message: Pointing an IndexSeries to a TimeSeries will be deprecated. Please point to an Images container instead.

0.1 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_index_series_points_to_image - 'IndexSeries' object at location '/stimulus/presentation/natural_movie_one_stimulus' Message: Pointing an IndexSeries to a TimeSeries will be deprecated. Please point to an Images container instead.

1 BEST_PRACTICE_SUGGESTION

1.2 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/eye-tracking camera' Message: Description is missing.

1.3 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/display monitor' Message: Description is missing.

1.4 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/Microscope' Message: Description is missing.

1.5 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/2-photon microscope' Message: Description is missing.

1.6 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Images' object with name 'SegmentationImages' Message: Description ('no description') is a placeholder.

1.7 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'GrayscaleImage' object with name 'mean' Message: Description is missing.

1.8 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'GrayscaleImage' object with name 'correlation' Message: Description is missing.

1.9 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_small_dataset_compression - 'OpticalSeries' object at location '/stimulus/templates/natural_movie_three_image_stack' Message: data is not compressed. Consider enabling compression when writing a dataset.

1.10 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_small_dataset_compression - 'OpticalSeries' object at location '/stimulus/templates/natural_movie_one_image_stack' Message: data is not compressed. Consider enabling compression when writing a dataset.

1.11 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_experimenter_exists - 'NWBFile' object at location '/' Message: Experimenter is missing.

1.12 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_experiment_description - 'NWBFile' object at location '/' Message: Experiment description is missing.

1.13 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_keywords - 'NWBFile' object at location '/' Message: Metadata /general/keywords is missing.

1.14 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'TimeIntervals' object with name 'trials' Message: Column 'blanksweep' uses 'float32' but has binary values [0. 1.]. Consider making it boolean instead and renaming the column to start with 'is'; doing so will save 1.88KB.

1.15 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'PlaneSegmentation' object with name 'PlaneSegmentation' Message: Column 'Accepted' uses 'integers' but has binary values [0 1]. Consider making it boolean instead and renaming the column to start with 'is'; doing so will save 13.02KB.

1.16 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'PlaneSegmentation' object with name 'PlaneSegmentation' Message: Column 'Rejected' uses 'integers' but has binary values [0 1]. Consider making it boolean instead and renaming the column to start with 'is'; doing so will save 13.02KB.

(nwb) jerome.lecoq@OSXLTCYGQCV upload % cd 000459
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi organize ../to_upload
2023-04-05 13:51:11,061 [ WARNING] A newer version (0.52.0) of dandi/dandi-cli is available. You are using 0.51.0
2023-04-05 13:51:11,490 [    INFO] NumExpr defaulting to 8 threads.
2023-04-05 13:51:12,251 [    INFO] Loading metadata from 1 files
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    2.6s
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:    2.6s finished
2023-04-05 13:51:14,851 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05 13:51:14,851 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log
Error: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb

jeromelecoq commented 1 year ago

It looks like nwbinspector is not catching the validation issue?

CodyCBakerPhD commented 1 year ago

Can you try dandi validate --ignore DANDI.NO_DANDISET_FOUND <source_folder> before dandi organize?

They've been adding more content beyond the Inspector lately. Also, could you share the log file /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log? Maybe it has a clue as to what metadata might be missing.

jeromelecoq commented 1 year ago

I am not sure that works as intended:

(nwb) jerome.lecoq@OSXLTCYGQCV upload % cd 000459 
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi validate --ignore DANDI.NO_DANDISET_FOUND ../to_upload 
2023-04-05 15:32:25,088 [    INFO] NumExpr defaulting to 8 threads.
2023-04-05 15:32:28,767 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405223222Z-76404.log
Error: Path '../to_upload' is not inside Dandiset path '/Users/jerome.lecoq/Documents/Work documents/Allen Institute/Projects/DendriticColumns/dandiset/upload/000459'
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi organize ../to_upload                                 
2023-04-05 15:35:12,506 [    INFO] NumExpr defaulting to 8 threads.
2023-04-05 15:35:13,308 [    INFO] Loading metadata from 1 files
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    2.6s
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:    2.6s finished
2023-04-05 15:35:15,898 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05 15:35:15,899 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405223511Z-76664.log
Error: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi validate --ignore DANDI.NO_DANDISET_FOUND ../to_upload
2023-04-05 15:35:22,661 [    INFO] NumExpr defaulting to 8 threads.
2023-04-05 15:35:23,445 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405223521Z-76685.log
Error: Path '../to_upload' is not inside Dandiset path '/Users/jerome.lecoq/Documents/Work documents/Allen Institute/Projects/DendriticColumns/dandiset/upload/000459'
jeromelecoq commented 1 year ago

Here is the content of the log:

2023-04-05T13:51:10-0700 [INFO    ] dandi 75206:4308206976 dandi v0.51.0, hdmf v3.5.2, pynwb v2.3.1, h5py v3.7.0
2023-04-05T13:51:10-0700 [INFO    ] dandi 75206:4308206976 sys.argv = ['/Users/jerome.lecoq/opt/miniconda3/envs/nwb/bin/dandi', 'organize', '../to_upload']
2023-04-05T13:51:10-0700 [INFO    ] dandi 75206:4308206976 os.getcwd() = /Users/jerome.lecoq/Documents/Work documents/Allen Institute/Projects/DendriticColumns/dandiset/upload/000459
2023-04-05T13:51:10-0700 [DEBUG   ] urllib3.connectionpool 75206:4308206976 Starting new HTTPS connection (1): rig.mit.edu:443
2023-04-05T13:51:11-0700 [DEBUG   ] urllib3.connectionpool 75206:4308206976 https://rig.mit.edu:443 "GET /et/projects/dandi/dandi-cli HTTP/1.1" 200 579
2023-04-05T13:51:11-0700 [WARNING ] dandi 75206:4308206976 A newer version (0.52.0) of dandi/dandi-cli is available. You are using 0.51.0
2023-04-05T13:51:11-0700 [DEBUG   ] h5py._conv 75206:4308206976 Creating converter from 7 to 5
2023-04-05T13:51:11-0700 [DEBUG   ] h5py._conv 75206:4308206976 Creating converter from 5 to 7
2023-04-05T13:51:11-0700 [DEBUG   ] h5py._conv 75206:4308206976 Creating converter from 7 to 5
2023-04-05T13:51:11-0700 [DEBUG   ] h5py._conv 75206:4308206976 Creating converter from 5 to 7
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'zlib'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'gzip'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'bz2'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'lzma'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'blosc'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'zstd'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'lz4'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'astype'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'delta'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'quantize'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'fixedscaleoffset'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'packbits'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'categorize'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'pickle'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'base64'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'shuffle'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'bitround'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'msgpack2'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'crc32'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'adler32'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'json2'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'vlen-utf8'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'vlen-bytes'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'vlen-array'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'n5_wrapper'
2023-04-05T13:51:11-0700 [INFO    ] numexpr.utils 75206:4308206976 NumExpr defaulting to 8 threads.
2023-04-05T13:51:12-0700 [INFO    ] dandi 75206:4308206976 Loading metadata from 1 files
2023-04-05T13:51:14-0700 [WARNING ] dandi 75206:4308206976 Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05T13:51:14-0700 [DEBUG   ] dandi 75206:4308206976 Caught exception 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05T13:51:14-0700 [INFO    ] dandi 75206:4308206976 Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log
CodyCBakerPhD commented 1 year ago

Last idea of mine:

2023-04-05T13:51:11-0700 [WARNING ] dandi 75206:4308206976 A newer version (0.52.0) of dandi/dandi-cli is available. You are using 0.51.0

Try upgrading with pip install -U dandi and retrying?

Otherwise I defer to @yarikoptic on what look to be bugs on the DANDI CLI side of things.

jeromelecoq commented 1 year ago

Ah yes, I upgraded after that run; same error.

yarikoptic commented 1 year ago

so the heart of the problem is the message(s) from dandi organize

2023-04-05 13:51:14,851 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05 13:51:14,851 [ INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log
Error: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb

correct? I guess we might improve the message there. What it means is that the file contains no fields of interest for organize, not even the object_id. There was a recent issue filed for that: https://github.com/dandi/dandi-cli/issues/1266 -- maybe the situation described there rings a bell?

You could use dandi ls on those files to see all the metadata we load, and something like this:

find pynwb -iname '*.nwb' | while read p; do echo $p; python -c 'import sys; from dandi.pynwb_utils import get_object_id; print(get_object_id(sys.argv[1]))' $p; done

to go through your .nwb files and print their object_ids (we might want to add printing object_id by dandi ls).

Then you can see which metadata fields are used by organize to construct filenames here: https://github.com/dandi/dandi-cli/blob/HEAD/dandi/consts.py#L177 . The current explanation is that, among those fields, there was no information with which to name that file. What did you expect it to get named?
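
For a programmatic view of the same thing, here is a minimal sketch; it leans on the private helper _get_pynwb_metadata and on metadata_nwb_file_fields (both shown later in this thread), so treat it as illustrative of dandi 0.52.x internals rather than a stable API:

```python
# Sketch: print the NWBFile-level metadata fields dandi loads from a file.
from dandi.metadata import _get_pynwb_metadata  # private dandi helper
from dandi.consts import metadata_nwb_file_fields

path = "Rorb-IRES2-Cre_590168381_590168385.nwb"
meta = _get_pynwb_metadata(path)
for key in metadata_nwb_file_fields:
    print(f"{key}: {meta.get(key)!r}")
```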

jeromelecoq commented 1 year ago

I ran the script you suggested:

(nwb) jerome.lecoq@OSXLTCYGQCV to_upload % find . -iname *.nwb | while read p; do echo $p; python -c 'import sys,yaml; from dandi.pynwb_utils import get_object_id; print(get_object_id(sys.argv[1]));' $p; done
./Rorb-IRES2-Cre_590168381_590168385.nwb
86fb9251-666e-4ac6-b246-ed7e2747c238
jeromelecoq commented 1 year ago

To note, these are NWB files that I created by merging the output of suite2p + NeuroConv with data from Allen Institute Visual Coding NWB 1.0 files. It looks like maybe some metadata needs to move around.

jeromelecoq commented 1 year ago

I am not entirely sure what is missing.

Here is the output of pynwb on this file:

input_nwb
root pynwb.file.NWBFile at 0x5050990496
Fields:
  devices: {
    2-photon microscope <class 'pynwb.device.Device'>,
    Microscope <class 'pynwb.device.Device'>,
    display monitor <class 'pynwb.device.Device'>,
    eye-tracking camera <class 'pynwb.device.Device'>
  }
  file_create_date: [datetime.datetime(2023, 3, 18, 16, 18, 27, 551472, tzinfo=tzutc())]
  identifier: 64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs
  imaging_planes: {
    ImagingPlane <class 'pynwb.ophys.ImagingPlane'>
  }
  institution: Allen Institute for Brain Science
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  processing: {
    behavior <class 'pynwb.base.ProcessingModule'>,
    ophys <class 'pynwb.base.ProcessingModule'>
  }
  session_description: no description
  session_id: 590168385
  session_start_time: 2020-01-01 12:30:00-08:00
  stimulus: {
    natural_movie_one_stimulus <class 'pynwb.image.IndexSeries'>,
    natural_movie_one_stimulus_frame_duration <class 'pynwb.image.ImageSeries'>,
    natural_movie_three_stimulus <class 'pynwb.image.IndexSeries'>,
    natural_movie_three_stimulus_frame_duration <class 'pynwb.image.ImageSeries'>,
    spontaneous_stimulus <class 'pynwb.misc.IntervalSeries'>
  }
  stimulus_template: {
    natural_movie_one_image_stack <class 'pynwb.image.OpticalSeries'>,
    natural_movie_three_image_stack <class 'pynwb.image.OpticalSeries'>
  }
  subject: subject pynwb.file.Subject at 0x5050983488
Fields:
  age: P113D
  age__reference: birth
  description: Mus musculus in vivo
  genotype: Rorb-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt
  sex: M
  species: Mus musculus
  subject_id: 575296278

  timestamps_reference_time: 2020-01-01 12:30:00-08:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>

jeromelecoq commented 1 year ago

We are not sure which metadata is missing. Ahad and I were wondering if something else was crashing organize.

See here for an example of these files : https://www.dropbox.com/s/qwv4i2zh0un4v9d/Rorb-IRES2-Cre_590168381_590168385.nwb?dl=0

CodyCBakerPhD commented 1 year ago

We are not sure which metadata is missing.

From the printout of your NWB file, it looks like you ought to have everything DANDI currently requires (at least to my knowledge). Thanks for including that

Ahad and I were wondering if something else was crashing organize.

That is my best guess now as well

jeromelecoq commented 1 year ago

If that is helpful, I am comparing the content of this NWB file with another file that dandi organize actually likes and that is already on DANDI.

WORKS:

Fields:
  devices: {
    2p_microscope <class 'pynwb.device.Device'>
  }
  file_create_date: [datetime.datetime(2022, 9, 25, 4, 53, 2, 714938, tzinfo=tzoffset(None, -25200))]
  identifier: 758519303
  imaging_planes: {
    ImagingPlane <class 'pynwb.ophys.ImagingPlane'>
  }
  institution: Allen Institute for Brain Science
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  processing: {
    behavior <class 'pynwb.base.ProcessingModule'>,
    ophys <class 'pynwb.base.ProcessingModule'>
  }
  session_description: Allen Institute OpenScope dataset
  session_id: 758519303
  session_start_time: 2018-09-26 17:29:17.502000-07:00
  subject: subject pynwb.file.Subject at 0x5192267424
Fields:
  age: P95D
  genotype: Cux2-CreERT2;Camk2a-tTA;Ai93
  sex: M
  species: Mus musculus
  subject_id: 408021

  timestamps_reference_time: 2018-09-26 17:29:17.502000-07:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>

DOES NOT WORK:

Fields:
  devices: {
    2-photon microscope <class 'pynwb.device.Device'>,
    Microscope <class 'pynwb.device.Device'>,
    display monitor <class 'pynwb.device.Device'>,
    eye-tracking camera <class 'pynwb.device.Device'>
  }
  file_create_date: [datetime.datetime(2023, 3, 18, 16, 18, 27, 551472, tzinfo=tzutc())]
  identifier: 64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs
  imaging_planes: {
    ImagingPlane <class 'pynwb.ophys.ImagingPlane'>
  }
  institution: Allen Institute for Brain Science
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  processing: {
    behavior <class 'pynwb.base.ProcessingModule'>,
    ophys <class 'pynwb.base.ProcessingModule'>
  }
  session_description: no description
  session_id: 590168385
  session_start_time: 2020-01-01 12:30:00-08:00
  stimulus: {
    natural_movie_one_stimulus <class 'pynwb.image.IndexSeries'>,
    natural_movie_one_stimulus_frame_duration <class 'pynwb.image.ImageSeries'>,
    natural_movie_three_stimulus <class 'pynwb.image.IndexSeries'>,
    natural_movie_three_stimulus_frame_duration <class 'pynwb.image.ImageSeries'>,
    spontaneous_stimulus <class 'pynwb.misc.IntervalSeries'>
  }
  stimulus_template: {
    natural_movie_one_image_stack <class 'pynwb.image.OpticalSeries'>,
    natural_movie_three_image_stack <class 'pynwb.image.OpticalSeries'>
  }
  subject: subject pynwb.file.Subject at 0x5194424160
Fields:
  age: P113D
  age__reference: birth
  description: Mus musculus in vivo
  genotype: Rorb-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt
  sex: M
  species: Mus musculus
  subject_id: 575296278

  timestamps_reference_time: 2020-01-01 12:30:00-08:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>
jeromelecoq commented 1 year ago

Could it be fields that should NOT be there?

CodyCBakerPhD commented 1 year ago

Could it be fields that should NOT be there?

Doubtful; when it comes to metadata, the more information that can be included the better, so I don't believe there are any 'forbidden' contents.

Something I did just notice is the underscores in the identifier='64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs'. @yarikoptic, do you think that would cause a problem?

jeromelecoq commented 1 year ago

I just removed most of the identifier:

>>> path = 'Rorb-IRES2-Cre_590168381_590168385.nwb'
>>> import h5py
>>> X = h5py.File(path, 'r+')
>>> X.keys()
<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'intervals', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time']>
>>> X['identifier']
<HDF5 dataset "identifier": shape (), type "|O">
>>> X['identifier'][()]
b'64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs'
>>> local_data = X['identifier'][()]
>>> local_data[0:8]
b'64ae8bcc'
>>> X['identifier'][()] = local_data[0:8]
>>> X['identifier'][()]
b'64ae8bcc'
>>> X.close()
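
The same edit as a small standalone script, for reference (a sketch that rewrites the file in place, so run it on a copy):

```python
# Sketch: truncate the NWB 'identifier' dataset in place with h5py,
# mirroring the REPL session above.
import h5py

path = "Rorb-IRES2-Cre_590168381_590168385.nwb"
with h5py.File(path, "r+") as f:
    ident = f["identifier"][()]      # b'64ae8bcc-..._test_IDs'
    f["identifier"][()] = ident[:8]  # keep only the leading 8 characters
```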

jeromelecoq commented 1 year ago

Same error.

satra commented 1 year ago

@jwodder and @yarikoptic - this section of the reader is resulting in an error - perhaps that results in the issue @jeromelecoq is seeing:

using _get_pynwb_metadata("/Users/satra/Downloads/Rorb-IRES2-Cre_590168381_590168385.nwb") in pynwb_utils

File ~/software/dandi/dandi-cli/dandi/pynwb_utils.py:210, in _get_pynwb_metadata(path)
    206 out = {}
    207 with open_readable(path) as fp, h5py.File(fp) as h5, NWBHDF5IO(
    208     file=h5, load_namespaces=True
    209 ) as io:
--> 210     nwb = io.read()
    211     for key in metadata_nwb_file_fields:
    212         value = getattr(nwb, key)

results in:

ConstructError: (root/stimulus/presentation/natural_movie_one_stimulus GroupBuilder {'attributes': {'comments': 'The data stored here is a precursor for what was displayed. Please see http://help.brain-map.org/download/attachments/10616846/VisualCoding_VisualStimuli.pdf for instructions for how to convert this to actual stimulus data', 'description': 'natural_movie_one_stimulus', 'namespace': 'core', 'neurodata_type': 'IndexSeries', 'object_id': '42360a35-1bd0-4f36-b9cc-0dc461ad4438'}, 'groups': {}, 'datasets': {'data': root/stimulus/presentation/natural_movie_one_stimulus/data DatasetBuilder {'attributes': {'conversion': 1.0, 'offset': 0.0, 'resolution': -1.0, 'unit': 'N/A'}, 'data': <Closed HDF5 dataset>}, 'timestamps': root/stimulus/presentation/natural_movie_one_stimulus/timestamps DatasetBuilder {'attributes': {'interval': 1, 'unit': 'seconds'}, 'data': <Closed HDF5 dataset>}}, 'links': {}}, 'Could not construct IndexSeries object due to: Either indexed_timeseries or indexed_images must be provided when creating an IndexSeries.')

whereas this works just fine:

In [17]: with NWBHDF5IO("path_to_file.nwb", load_namespaces=True) as io:
    ...:     nwb = io.read()
    ...:

jeromelecoq commented 1 year ago

Ah, that seems like it. Yes, I tested io.read() but not the call above. We just need to find the key it crashes on?

yarikoptic commented 1 year ago

Thanks for digging!

Hm, I tried to reproduce this while incrementally building up how I open the file:

$> cat Rorb-IRES2-Cre_590168381_590168385.py
from pynwb import NWBHDF5IO
import h5py
from dandi.pynwb_utils import open_readable

fname = "Rorb-IRES2-Cre_590168381_590168385.nwb"

with NWBHDF5IO(fname, load_namespaces=True) as io:
      nwb = io.read()
print("way 1 worked")

with open(fname, 'rb') as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
      nwb = io.read()
print("way 2 worked")

with open_readable(fname) as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
      nwb = io.read()
print("way 3 worked")

and they all worked out

$> python Rorb-IRES2-Cre_590168381_590168385.py
way 1 worked
way 2 worked
way 3 worked
jeromelecoq commented 1 year ago

So I used a variant of this code https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L138

To port visual stimulus objects from NWB 1.0 files into newly created NWB 2.0 files.

What exactly is the sub-object that crashes?

jeromelecoq commented 1 year ago

I think those indexed_timeseries were provided here: https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L184

yarikoptic commented 1 year ago
2023-04-06 19:18:32,451 [    INFO] Loading metadata from 1 files
2023-04-06 19:18:32,579 [   DEBUG] Failed to get metadata for ../Rorb-IRES2-Cre_590168381_590168385.nwb: NWB files with external links are not supported: /home/yoh/proj/dandi/nwb-files/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-06 19:18:32,580 [ WARNING] Failed to load metadata for 1 out of 1 files

which is due to https://github.com/dandi/dandi-cli/blob/HEAD/dandi/metadata.py#L110 which was added in https://github.com/dandi/dandi-cli/pull/843 to "address" https://github.com/dandi/dandi-cli/issues/840 .

If my analysis is right, the "solution" here might be

jeromelecoq commented 1 year ago

Can you clarify how I can address the error? Should I remove external links?

satra commented 1 year ago

@yarikoptic - perhaps it's a version thing. In a fresh mamba environment on my M1 tin can:

mamba create -n testnwb ipython pip python=3.10
mamba activate testnwb 
pip install dandi

and then

from dandi.metadata import _get_pynwb_metadata
_get_pynwb_metadata("Rorb-IRES2-Cre_590168381_590168385.nwb")

the error (which points to the links as well, I think):

ConstructError: (root/stimulus/presentation/natural_movie_one_stimulus GroupBuilder {'attributes': {'comments': 'The data stored here is a precursor for what was displayed. Please see http://help.brain-map.org/download/attachments/10616846/VisualCoding_VisualStimuli.pdf for instructions for how to convert this to actual stimulus data', 'description': 'natural_movie_one_stimulus', 'namespace': 'core', 'neurodata_type': 'IndexSeries', 'object_id': '42360a35-1bd0-4f36-b9cc-0dc461ad4438'}, 'groups': {}, 'datasets': {'data': root/stimulus/presentation/natural_movie_one_stimulus/data DatasetBuilder {'attributes': {'conversion': 1.0, 'offset': 0.0, 'resolution': -1.0, 'unit': 'N/A'}, 'data': <Closed HDF5 dataset>}, 'timestamps': root/stimulus/presentation/natural_movie_one_stimulus/timestamps DatasetBuilder {'attributes': {'interval': 1, 'unit': 'seconds'}, 'data': <Closed HDF5 dataset>}}, 'links': {}}, 'Could not construct IndexSeries object due to: Either indexed_timeseries or indexed_images must be provided when creating an IndexSeries.')

some relevant bits:

dandi                     0.52.0                   pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
hdmf                      3.5.2                    pypi_0    pypi
pynwb                     2.3.1                    pypi_0    pypi
satra commented 1 year ago

@jeromelecoq - this may help: https://www.dandiarchive.org/2022/03/03/external-links-organize.html (perhaps @CodyCBakerPhD could say if it's still up to date)

jeromelecoq commented 1 year ago

I am not sure why there are external links with the movies. I can access the raw data directly. It looks like the raw movie is in the template.

jeromelecoq commented 1 year ago

I can't seem to replicate it:

>>> from dandi.metadata import _get_pynwb_metadata
>>> path
'Rorb-IRES2-Cre_590168381_590168385.nwb'
>>> _get_pynwb_metadata(path)
{'experiment_description': None, 'experimenter': None, 'identifier': '64ae8bcc', 'institution': 'Allen Institute for Brain Science', 'keywords': None, 'lab': None, 'related_publications': None, 'session_description': 'no description', 'session_id': '590168385', 'session_start_time': datetime.datetime(2020, 1, 1, 12, 30, tzinfo=tzoffset(None, -28800)), 'age': 'P113D', 'date_of_birth': None, 'genotype': 'Rorb-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt', 'sex': 'M', 'species': 'Mus musculus', 'subject_id': '575296278', 'number_of_electrodes': 0, 'number_of_units': 0, 'external_file_objects': []}
jeromelecoq commented 1 year ago

>>> pynwb.__version__
'2.3.1'
>>> dandi.__version__
'0.52.0'
>>> hdmf.__version__
'3.5.2'

jeromelecoq commented 1 year ago

jerome.lecoq@OSXLTCYGQCV to_upload % python
Python 3.10.8

jeromelecoq commented 1 year ago

So I am not sure how this happened, but I have an external link to a dataset in the same file ...

>>> X.get('/stimulus/presentation/natural_movie_one_stimulus/indexed_timeseries', getlink=True)
<ExternalLink to "/stimulus/templates/natural_movie_one_image_stack" in file "Rorb-IRES2-Cre_590168381_590168385.nwb">

>>> X.get('/stimulus/templates/natural_movie_one_image_stack', getlink=True)
<h5py._hl.group.HardLink at 0x114e22350>
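
To enumerate every link of this kind, here is a minimal sketch using plain h5py (the recursion and printing are illustrative; dandi ships a similar check, dandi.pynwb_utils.nwb_has_external_links, which appears later in this thread):

```python
# Sketch: recursively list HDF5 links and flag ExternalLinks.
# Dangling links are not handled; run against a readable copy of the file.
import h5py

def find_external_links(grp, prefix=""):
    for name in grp:
        path = f"{prefix}/{name}"
        # getlink=True returns the link object itself (HardLink, SoftLink,
        # or ExternalLink) instead of dereferencing it.
        link = grp.get(name, getlink=True)
        if isinstance(link, h5py.ExternalLink):
            print(f"{path} -> {link.filename}:{link.path}")
            continue
        obj = grp.get(name)
        if isinstance(obj, h5py.Group):
            find_external_links(obj, path)

with h5py.File("Rorb-IRES2-Cre_590168381_590168385.nwb", "r") as f:
    find_external_links(f)
```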

satra commented 1 year ago

Thanks @jeromelecoq - that suggests something specific to my machine. Still trying to get a clean read.

jeromelecoq commented 1 year ago

It does seem that this is related to ExternalLinks.

This link is between datasets in the same file. The link was created by this line of code: https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L184

To connect a template with a presentation.

Is that the wrong way to do this?

jeromelecoq commented 1 year ago

Is it possible that the error is because the template is stored as an OpticalSeries? https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L163

Ryan discusses it in the comment above: https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L142

@yarikoptic @satra ?

satra commented 1 year ago

@jeromelecoq - when did Ryan make that suggestion? Perhaps the pynwb bug is fixed now and you can move on to addressing the best-practice violations flagged in your original post?

@yarikoptic and @jeromelecoq - I can't reproduce the ConstructError on a separate Linux machine, but I can on my M1 Mac, both natively and using a Docker container. It's interesting that the error points to the same relevant section of code. All coincidence, perhaps.

jeromelecoq commented 1 year ago

@jeromelecoq - when did Ryan make that suggestion? Perhaps the pynwb bug is fixed now and you can move on to addressing the best-practice violations flagged in your original post?

@yarikoptic and @jeromelecoq - I can't reproduce the ConstructError on a separate Linux machine, but I can on my M1 Mac, both natively and using a Docker container. It's interesting that the error points to the same relevant section of code. All coincidence, perhaps.

I completely changed the way the natural_movie template is added and used an Images object, per Satra's suggestion. The same error occurs, so this is ruled out. Here is the newer file: https://www.dropbox.com/s/i708rcvel1r5lwb/Rorb-IRES2-Cre_590168381_590168385-2.nwb?dl=0

Here is a copy of the command:

(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb
2023-04-07 10:26:12,116 [   DEBUG] Starting new HTTPS connection (1): rig.mit.edu:443
2023-04-07 10:26:12,671 [   DEBUG] https://rig.mit.edu:443 "GET /et/projects/dandi/dandi-cli HTTP/1.1" 200 579
2023-04-07 10:26:12,673 [   DEBUG] No newer (than 0.52.0) version of dandi/dandi-cli found available
2023-04-07 10:26:12,940 [   DEBUG] Creating converter from 7 to 5
2023-04-07 10:26:12,940 [   DEBUG] Creating converter from 5 to 7
2023-04-07 10:26:12,940 [   DEBUG] Creating converter from 7 to 5
2023-04-07 10:26:12,940 [   DEBUG] Creating converter from 5 to 7
2023-04-07 10:26:12,986 [   DEBUG] Registering codec 'zlib'
2023-04-07 10:26:12,987 [   DEBUG] Registering codec 'gzip'
2023-04-07 10:26:12,988 [   DEBUG] Registering codec 'bz2'
2023-04-07 10:26:12,988 [   DEBUG] Registering codec 'lzma'
2023-04-07 10:26:12,993 [   DEBUG] Registering codec 'blosc'
2023-04-07 10:26:12,996 [   DEBUG] Registering codec 'zstd'
2023-04-07 10:26:12,997 [   DEBUG] Registering codec 'lz4'
2023-04-07 10:26:12,997 [   DEBUG] Registering codec 'astype'
2023-04-07 10:26:12,998 [   DEBUG] Registering codec 'delta'
2023-04-07 10:26:12,998 [   DEBUG] Registering codec 'quantize'
2023-04-07 10:26:12,998 [   DEBUG] Registering codec 'fixedscaleoffset'
2023-04-07 10:26:12,999 [   DEBUG] Registering codec 'packbits'
2023-04-07 10:26:12,999 [   DEBUG] Registering codec 'categorize'
2023-04-07 10:26:12,999 [   DEBUG] Registering codec 'pickle'
2023-04-07 10:26:13,000 [   DEBUG] Registering codec 'base64'
2023-04-07 10:26:13,001 [   DEBUG] Registering codec 'shuffle'
2023-04-07 10:26:13,001 [   DEBUG] Registering codec 'bitround'
2023-04-07 10:26:13,004 [   DEBUG] Registering codec 'msgpack2'
2023-04-07 10:26:13,004 [   DEBUG] Registering codec 'crc32'
2023-04-07 10:26:13,004 [   DEBUG] Registering codec 'adler32'
2023-04-07 10:26:13,004 [   DEBUG] Registering codec 'json2'
2023-04-07 10:26:13,006 [   DEBUG] Registering codec 'vlen-utf8'
2023-04-07 10:26:13,006 [   DEBUG] Registering codec 'vlen-bytes'
2023-04-07 10:26:13,006 [   DEBUG] Registering codec 'vlen-array'
2023-04-07 10:26:13,030 [   DEBUG] Registering codec 'n5_wrapper'
2023-04-07 10:26:13,115 [    INFO] NumExpr defaulting to 8 threads.
2023-04-07 10:26:13,873 [    INFO] Loading metadata from 1 files
2023-04-07 10:26:14,008 [   DEBUG] Failed to get metadata for ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb: NWB files with external links are not supported: /Users/jerome.lecoq/Documents/Work documents/Allen Institute/Projects/DendriticColumns/dandiset/upload/to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb
2023-04-07 10:26:14,008 [ WARNING] Failed to load metadata for 1 out of 1 files
2023-04-07 10:26:14,008 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb
2023-04-07 10:26:14,008 [   DEBUG] Caught exception 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb
2023-04-07 10:26:14,008 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230407172611Z-96788.log
Traceback (most recent call last):
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/bin/dandi", line 8, in <module>
    sys.exit(main())
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/dandi/cli/base.py", line 102, in wrapper
    return f(*args, **kwargs)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/dandi/cli/cmd_organize.py", line 109, in organize
    organize(
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/dandi/organize.py", line 842, in organize
    raise ValueError(msg)
ValueError: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb

I am very unclear as to what is going on. Should we loop in Ryan here?

Ahad-Allen commented 1 year ago

Hi all, I am having the same type of error as Jerome:

Completely empty record for /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336/sub-621602/sub-621602_ophys.nwb
Traceback (most recent call last):
  File "dandi_uploads.py", line 117, in <module>
    automatic_dandi_upload(nwb_folder_path = r'/allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb', dandiset_id = '000336', session_id=r'1193555033', experiment_id = '1193675753', subject_id='621602')
  File "dandi_uploads.py", line 89, in automatic_dandi_upload
    dandi_organize(paths=str(directory_path), dandiset_path=str(dandi_path_set))
  File "/allen/programs/mindscope/workgroups/openscope/ahad/Conda_env/long_nwb/lib/python3.8/site-packages/dandi/organize.py", line 842, in organize
    raise ValueError(msg)
ValueError: 2 out of 2 files were found not containing all necessary metadata: /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/1193675750raw_data.nwb, /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336/sub-621602/sub-621602_ophys.nwb
(/allen/programs/mindscope/workgroups/openscope/ahad/Conda_env/long_nwb) [ahad.bawany@ibs-ahadb-vm1 scripts]$ python dandi_uploads.py 
PATHS:  /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033 /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033 /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   2 | elapsed:    7.7s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   2 out of   2 | elapsed:    7.7s finished
Completely empty record for /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/1193675750raw_data.nwb
Completely empty record for /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336/sub-621602/sub-621602_ophys.nwb
Traceback (most recent call last):
  File "dandi_uploads.py", line 117, in <module>
    automatic_dandi_upload(nwb_folder_path = r'/allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb', dandiset_id = '000336', session_id=r'1193555033', experiment_id = '1193675753', subject_id='621602')
  File "dandi_uploads.py", line 89, in automatic_dandi_upload
    dandi_organize(paths=str(directory_path), dandiset_path=str(dandi_path_set))
  File "/allen/programs/mindscope/workgroups/openscope/ahad/Conda_env/long_nwb/lib/python3.8/site-packages/dandi/organize.py", line 842, in organize
    raise ValueError(msg)
ValueError: 2 out of 2 files were found not containing all necessary metadata: /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/1193675750raw_data.nwb, /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336/sub-621602/sub-621602_ophys.nwb

with different NWB files. These files are regenerations of files that were already on DANDI and have passed DANDI validation in the past, with the only difference being that the subject_id in the subject field has changed. One important element to note is that I upgraded from dandi version 0.48.1 to the latest version before attempting these uploads. I have attached a copy of the file here: https://drive.google.com/file/d/1WCzmOd-V3KtAiy1uN4LB-_ShT1yeoxcD/view?usp=sharing

yarikoptic commented 1 year ago

@Ahad-Allen following the discussion above -- do you know if the files include external links?

edit: ignore -- as I showed below, it does not

You can possibly get to the original exception and warnings (which might warn about external links) by running it as DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug ....
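
For a quick programmatic check, dandi also exposes the helper used in the script in the next comment; a minimal sketch (the filename is illustrative):

```python
# Sketch: ask dandi whether an NWB file contains external HDF5 links.
from dandi.pynwb_utils import nwb_has_external_links

print(nwb_has_external_links("sub-621602_ophys.nwb"))  # True / False
```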

yarikoptic commented 1 year ago

some relevant bits:

dandi                     0.52.0                   pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
hdmf                      3.5.2                    pypi_0    pypi
pynwb                     2.3.1                    pypi_0    pypi
Using this script -- those module versions seem to be the same:

```python
from pynwb import NWBHDF5IO
from dandi.consts import metadata_nwb_file_fields
from dandi.pynwb_utils import open_readable
from dandi.pynwb_utils import nwb_has_external_links
import sys


def load(io):
    nwb = io.read()
    for key in metadata_nwb_file_fields:
        value = getattr(nwb, key)


import pkg_resources
import dandi, h5py, hdmf, pynwb

for m in dandi, h5py, hdmf, pynwb:
    print(pkg_resources.get_distribution(m.__name__))

for fname in sys.argv[1:]:
    print(f"{fname} has links: {nwb_has_external_links(fname)}")
    with NWBHDF5IO(fname, load_namespaces=True) as io:
        load(io)
    print("way 1 worked")
    with open(fname, 'rb') as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
        load(io)
    print("way 2 worked")
    with open_readable(fname) as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
        load(io)
    print("way 3 worked")
    from dandi.metadata import _get_pynwb_metadata
    print(_get_pynwb_metadata(fname))
```

$> DANDI_CACHE=ignore python test_on_nwb.py Rorb-IRES2-Cre_590168381_590168385.nwb
dandi 0.52.0
h5py 3.8.0
hdmf 3.5.2
pynwb 2.3.1
Rorb-IRES2-Cre_590168381_590168385.nwb has links: True
way 1 worked
way 2 worked
way 3 worked
{'experiment_description': None, 'experimenter': None, 'identifier': '64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs', 'institution': 'Allen Institute for Brain Science', 'keywords': None, 'lab': None, 'related_publications': None, 'session_description': 'no description', 'session_id': '590168385', 'session_start_time': datetime.datetime(2020, 1, 1, 12, 30, tzinfo=tzoffset(None, -28800)), 'age': 'P113D', 'date_of_birth': None, 'genotype': 'Rorb-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt', 'sex': 'M', 'species': 'Mus musculus', 'subject_id': '575296278', 'number_of_electrodes': 0, 'number_of_units': 0, 'external_file_objects': []}

and on the file from @Ahad-Allen:

$> DANDI_CACHE=ignore python test_on_nwb.py 1193675750raw_data.nwb                
dandi 0.52.0
h5py 3.8.0
hdmf 3.5.2
pynwb 2.3.1
1193675750raw_data.nwb has links: False
/home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.9/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.5.0 because version 1.5.1 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.9/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'core' version 2.3.0 because version 2.6.0-alpha is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.9/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-experimental' version 0.1.0 because version 0.2.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
way 1 worked
way 2 worked
way 3 worked
{'experiment_description': 'ophys session', 'experimenter': None, 'identifier': '1193675750', 'institution': 'Allen Institute for Brain Science', 'keywords': ['2-photon', 'calcium imaging', 'visual cortex', 'behavior', 'task'], 'lab': None, 'related_publications': None, 'session_description': 'Ophys Session', 'session_id': None, 'session_start_time': datetime.datetime(2022, 7, 22, 12, 7, 33, 412000, tzinfo=tzutc()), 'age': 'P161.0D', 'date_of_birth': None, 'genotype': 'Rbp4-Cre_KL100/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt', 'sex': 'F', 'species': 'Mus musculus', 'subject_id': '621602', 'number_of_electrodes': 0, 'number_of_units': 0, 'external_file_objects': []}
DANDI_CACHE=ignore python test_on_nwb.py 1193675750raw_data.nwb  37.75s user 0.88s system 103% cpu 37.303 total

so that also works -- I guess the difference is in some other version detail.

edit: on that box I use a simple virtualenv with system-wide Python 3.9

yarikoptic commented 1 year ago
And running organize on the file from @Ahad-Allen worked for me:

```shell
smaug:~/proj/dandi/nwb-files/000027
$> DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug ../1193675750raw_data.nwb
...
2023-04-07 15:55:45,114 [    INFO] Symlink support autodetected; setting files_mode='symlink'
2023-04-07 15:55:45,118 [   DEBUG] Assigned 1 session_id's based on the date
2023-04-07 15:55:45,119 [    INFO] Organized 1 paths. Visit /home/yoh/proj/dandi/nwb-files/000027/
2023-04-07 15:55:45,119 [    INFO] Logs saved in /home/yoh/.cache/dandi-cli/log/20230407195534Z-3648741.log
DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug  11.94s user 0.83s system 108% cpu 11.736 total
(dev3) 1 10233.....................................:Fri 07 Apr 2023 03:55:45 PM EDT:.
smaug:~/proj/dandi/nwb-files/000027
$> ls -l /home/yoh/proj/dandi/nwb-files/000027/sub-621602/sub-621602_ophys.nwb
lrwxrwxrwx 1 yoh yoh 53 Apr 7 15:55 /home/yoh/proj/dandi/nwb-files/000027/sub-621602/sub-621602_ophys.nwb -> /home/yoh/proj/dandi/nwb-files/1193675750raw_data.nwb
```

bendichter commented 1 year ago

It looks like these files were created through non-standard means. Without more detailed reporting from dandi-cli, it's going to be difficult to know how to resolve this.

jeromelecoq commented 1 year ago

Hi @bendichter, well, I am not entirely sure to what extent this is out of a normal workflow.

1/ I used suite2p to segment a movie.
2/ I used NeuroConv to make the first NWB file from the suite2p output.
3/ I loaded NWB 1.0 files from the Allen.
4/ I added objects to the NeuroConv NWB 2.0 output, by copying values from the NWB 1.0 file.

jeromelecoq commented 1 year ago

I was able to nail down the issue further. The problem is the IndexSeries object when it receives an indexed_timeseries parameter to register the associated template. This ends up creating an NWB file with an external file link. Perhaps the problem is in pynwb upon creation. If I convert to a TimeSeries, removing the link to the template, it all works.

jeromelecoq commented 1 year ago

I believe the code here: https://pynwb.readthedocs.io/en/stable/tutorials/domain/brain_observatory.html

would not work as a result. In particular, this part:

for stimulus in stimulus_list:
    visual_stimulus_images = ImageSeries(
        name=stimulus,
        data=dataset.get_stimulus_template(stimulus),
        unit='NA',
        format='raw',
        timestamps=[0.0])
    image_index = IndexSeries(
        name=stimulus,
        data=dataset.get_stimulus_table(stimulus).frame.values,
        unit='NA',
        indexed_timeseries=visual_stimulus_images,
        timestamps=timestamps[dataset.get_stimulus_table(stimulus).start.values])
    nwbfile.add_stimulus_template(visual_stimulus_images)
    nwbfile.add_stimulus(image_index)

The problem is the indexed_timeseries link, which causes dandi to have issues.
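
For reference, a rough sketch of the Images-based pattern that nwbinspector's message recommends instead (variable names are hypothetical, and this assumes a pynwb recent enough that IndexSeries accepts indexed_images and that an Images container can be a stimulus template):

```python
# Sketch: store the movie frames as an Images container and point the
# IndexSeries at it via indexed_images, instead of linking to an
# OpticalSeries template via indexed_timeseries (which writes an HDF5 link).
from pynwb.base import Images
from pynwb.image import GrayscaleImage, IndexSeries

# movie_frames (n_frames, height, width), frame_indices, and
# presentation_times are illustrative placeholders from the stimulus table.
frames = [
    GrayscaleImage(name=f"frame_{i:04d}", data=movie_frames[i])
    for i in range(movie_frames.shape[0])
]
template = Images(name="natural_movie_one_template", images=frames)
nwbfile.add_stimulus_template(template)

index_series = IndexSeries(
    name="natural_movie_one_stimulus",
    data=frame_indices,              # which template frame was shown
    unit="N/A",
    indexed_images=template,
    timestamps=presentation_times,
)
nwbfile.add_stimulus(index_series)
```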

satra commented 1 year ago

I'm going to bring @rly into this conversation. The summary of this issue is that certain operations lead to external links being created which are not really external (as in, we think they don't point to outside files), and that's triggering dandi-cli to complain.

@jeromelecoq - just a random thought: is it possible that some part of the process is still pointing to a data array in the NWB 1.0 file? i.e., it still maintains a reference and is hence treated as an external link?

jeromelecoq commented 1 year ago

Yes, I think a link is created that causes the issue. I played around with the dataset properties, and it seems like the link is to the current file itself, like a self-referencing link. I can explore more tonight.


jeromelecoq commented 1 year ago

Using TimeSeries allowed me to move forward and upload a draft of Visual Coding NWB 2.0 to DANDI, which supports the idea that the issue is related to links. I am still working on little things here and there, but I will go back to this later on. Obviously my files do not have the template images, just the underlying stimulus structure.
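
A rough sketch of that workaround (same illustrative names as above): the frame indices are stored as a plain TimeSeries, so no HDF5 link to the template is written.

```python
# Sketch: replace the IndexSeries (and its indexed_timeseries link) with a
# plain TimeSeries of frame indices; the template itself is simply omitted.
from pynwb import TimeSeries

stim = TimeSeries(
    name="natural_movie_one_stimulus",
    data=frame_indices,          # index of the template frame shown
    unit="N/A",
    timestamps=presentation_times,
)
nwbfile.add_stimulus(stim)
```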

rly commented 1 year ago

@jeromelecoq please try installing this branch of HDMF referenced in https://github.com/hdmf-dev/hdmf/pull/847:

pip uninstall hdmf --yes
pip install git+https://github.com/hdmf-dev/hdmf.git@fix/export_links

And let me know if that resolves the error.