European-XFEL / h5glance

Explore HDF5 files in terminal & HTML views
BSD 3-Clause "New" or "Revised" License

Add -s/--slice option to choose visible slice of dataset #16

Closed takluyver closed 4 years ago

takluyver commented 4 years ago

Currently, h5glance file.h5 path/to/dataset shows a sample of the data - the first 10x10 values, for a dataset with at least 2 dimensions. This allows the user to override that and specify which data to look at:

h5glance xmpl-3.cxi entry_1/instrument_1/detector_1/data -s 0,0,10:12,:5

The syntax is a Python/numpy/h5py slice expression - I don't know any more general way to carve up a multidimensional space. There's no limit on the amount of data it will load, though numpy's default printing options will only show up to 1000 elements before it collapses the array with ... ellipsis.
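As a rough illustration of how such a string could map onto an index, here is a minimal sketch. The helper name `parse_selection` is hypothetical and this is not necessarily how h5glance implements the option; it only shows that `0,0,10:12,:5` corresponds to a tuple of integers and `slice` objects, which numpy arrays accept directly (h5py datasets take the same kind of tuple for simple selections like this one).

```python
import numpy as np


def parse_selection(text):
    """Turn a string such as '0,0,10:12,:5' into a tuple of ints and slices.

    Hypothetical helper for illustration only; not taken from h5glance.
    """
    def parse_part(part):
        part = part.strip()
        if part == "...":
            return Ellipsis
        if ":" in part:
            # Empty fields (as in ':5') become None, matching Python slicing
            fields = [int(f) if f.strip() else None for f in part.split(":")]
            if len(fields) > 3:
                raise ValueError(f"Too many ':' in {part!r}")
            return slice(*fields)
        # A bare integer indexes a single position and drops that dimension
        return int(part)

    return tuple(parse_part(p) for p in text.split(","))


# Stand-in 4-dimensional array; with h5py this would be a dataset object
data = np.arange(2 * 3 * 20 * 8).reshape(2, 3, 20, 8)
sel = parse_selection("0,0,10:12,:5")
print(sel)
print(data[sel])
```

The two integer indices drop the first two dimensions, so the result here is a 2x5 array.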

This is meant for getting a close-up view of a small amount of data, e.g. if extra-data-validate points out a problem in a specific part of the index. It's not a good way to see a lot of data - for that you're better off using other tools that can plot or summarise the data.

philsmt commented 4 years ago

Thanks for this very useful feature. I'm still bummed I only learned of h5glance a month ago, it'd have been soooo useful to me before. I have two small comments:

Apart from that, LGTM.

takluyver commented 4 years ago

Yes, good point. I'd been using 'slice expression' to mean both slicing and indexing, but that's not entirely clear.

Do you think we should just disable numpy's summary behaviour entirely (threshold=np.inf)? The default view of a dataset will only show a maximum of 100 elements anyway, so the only way to trigger it is with this option.
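For reference, a small sketch of the summary behaviour under discussion, using plain numpy rather than h5glance's own code. The function name `render` is made up for this example. It passes `sys.maxsize` as the threshold, which is the value the numpy documentation suggests for always printing the full array; `np.inf` as mentioned above is the same idea.

```python
import sys

import numpy as np


def render(arr, threshold):
    """Return str(arr) using a temporary summarisation threshold.

    Hypothetical helper for illustration; np.printoptions restores the
    previous print settings when the with-block exits.
    """
    with np.printoptions(threshold=threshold):
        return str(arr)


arr = np.arange(1500)

# numpy's default threshold is 1000, so 1500 elements get collapsed
summarised = render(arr, threshold=1000)
# A very large threshold disables the collapsing entirely
full = render(arr, threshold=sys.maxsize)

print(summarised)
print(len(full.split()), "whitespace-separated tokens in the full output")
```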

philsmt commented 4 years ago

Ah, fair enough. In this case, it probably makes sense to disable it. It's definitely a corner case, but I'd be annoyed if I really wanted to see this entire, way too big dataset and some default numpy setting barred me from it.