Add -s/--slice option to choose visible slice of dataset

takluyver commented 4 years ago

Currently, h5glance file.h5 path/to/dataset shows a sample of the data - the first 10x10 values, for a dataset with at least 2 dimensions. This allows the user to override that and specify which data to look at:

h5glance xmpl-3.cxi entry_1/instrument_1/detector_1/data -s 0,0,10:12,:5

The syntax is a Python/numpy/h5py slice expression - I don't know any more general way to carve up a multidimensional space. There's no limit on the amount of data it will load, though numpy's default printing options will only show up to 1000 elements before it summarises the array with an ellipsis (...).
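To illustrate what such an expression selects, here is a minimal sketch using a plain numpy array as a stand-in for the HDF5 dataset. The array shape is made up for the example; it is not the shape of the real data in xmpl-3.cxi.

```python
import numpy as np

# Stand-in for an HDF5 dataset; the 4D shape is an assumption for illustration.
data = np.arange(2 * 3 * 20 * 8).reshape(2, 3, 20, 8)

# Equivalent of `-s 0,0,10:12,:5`: integer indices drop the first two
# dimensions, the slices keep rows 10-11 and the first 5 columns.
selection = data[0, 0, 10:12, :5]

print(selection.shape)  # (2, 5)
```

h5py datasets accept the same indexing syntax for integers and slices, so the same expression should apply when reading from a file.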

This is meant for getting a close-up view of a small amount of data, e.g. if extra-data-validate points out a problem in a specific part of the index. It's not a good way to see a lot of data - for that you're better off using other tools that can plot or summarise the data.

philsmt commented 4 years ago

Thanks for this very useful feature. I'm still bummed I only learned of h5glance a month ago, it'd have been soooo useful to me before. I have two small comments:

Due to your implementation via eval(), index lists work as well as slices, e.g. -s [0,2,4]. You might want to add this to the argparse help and/or the test cases.
This option could also override numpy's default printing threshold via numpy.set_printoptions (https://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html), in case you really want to do just that. The rationale is that if you specify a given slice, you probably want to actually see the whole slice.
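A rough sketch of why index lists come for free with an eval()-based approach. The helper name parse_selection and the use of np.s_ are assumptions made for illustration, not necessarily how the PR implements it.

```python
import numpy as np

def parse_selection(expr):
    """Hypothetical helper: turn '0,0,10:12,:5' or '[0,2,4]' into an index object.

    Evaluates the text inside np.s_[...], so anything valid inside square
    brackets is accepted. eval() is only reasonable here because the input
    comes from the user's own command line.
    """
    return eval('np.s_[{}]'.format(expr), {'np': np, '__builtins__': {}})

data = np.arange(10) * 10

print(parse_selection('0,0,10:12,:5'))   # tuple of ints and slice objects
print(data[parse_selection('2:5')])      # plain slice
print(data[parse_selection('[0,2,4]')])  # index list (fancy indexing)
```

Because the expression is evaluated as ordinary Python indexing, slices, integers and lists all pass through without separate parsing code.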

Apart from that, LGTM.

takluyver commented 4 years ago

Yes, good point. I'd been using 'slice expression' to mean both slicing and indexing, but that's not entirely clear.

Do you think we should just disable numpy's summary behaviour entirely (threshold=np.inf)? The default view of a dataset will only show a maximum of 100 elements anyway, so the only way to trigger it is with this option.
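For reference, a small sketch of what disabling the summary looks like. It uses sys.maxsize as the threshold, which the numpy documentation suggests for untruncated output; the format_full helper is a made-up name for this example.

```python
import sys
import numpy as np

def format_full(arr):
    """Hypothetical helper: format an array without numpy's '...' summary."""
    # The context manager restores the previous print options on exit.
    with np.printoptions(threshold=sys.maxsize):
        return str(arr)

data = np.arange(2000)  # more than the default threshold of 1000 elements

summarised = str(data)     # collapsed with '...'
full = format_full(data)   # every element printed

print('...' in summarised, '...' in full)
```

Using the np.printoptions context manager rather than np.set_printoptions keeps the change local to the one place where the selected data is printed.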

philsmt commented 4 years ago

Ah, fair enough. In this case, it probably makes sense to disable it. It's definitely a corner case, but I'd be annoyed if I really wanted to see this entire, way too big dataset and some default numpy settings barred me from it.

European-XFEL / h5glance

Add -s/--slice option to choose visible slice of dataset #16