Closed takluyver closed 4 years ago
Thanks for this very useful feature. I'm still bummed I only learned of h5glance a month ago, it'd have been soooo useful to me before. I have two small comments:
Apart from that, LGTM.
Yes, good point. I'd been using 'slice expression' to mean both slicing and indexing, but that's not entirely clear.
Do you think we should just disable numpy's summary behaviour entirely (`threshold=np.inf`)? The default view of a dataset will only show a maximum of 100 elements anyway, so the only way to trigger it is with this option.
Ah, fair enough. In this case, it probably makes sense to disable it. It's definitely a corner case, but I'd be annoyed if I really wanted to see an entire, way-too-big dataset and some default numpy setting barred me from it.
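For reference, a minimal sketch of what disabling the summary could look like. It uses `sys.maxsize` as the threshold, which is what the numpy documentation suggests for always printing the full array. The 1200-element array is made up for illustration, and this is not necessarily how h5glance itself sets the option.

```python
import sys
import numpy as np

# Illustrative array, just over numpy's default threshold of 1000 elements
arr = np.arange(1200)

# With default print options the text is summarised with '...'
default_text = np.array2string(arr)

# Temporarily raise the threshold so every element is printed
with np.printoptions(threshold=sys.maxsize):
    full_text = np.array2string(arr)

print('...' in default_text)  # True
print('...' in full_text)     # False
```

Using the `np.printoptions` context manager keeps the change local, so other printing in the same process is unaffected.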
Currently, `h5glance file.h5 path/to/dataset` shows a sample of the data - the first 10x10 values, for a dataset with at least 2 dimensions. This allows the user to override that and specify which data to look at:

The syntax is a Python/numpy/h5py slice expression - I don't know any more general way to carve up a multidimensional space. There's no limit on the amount of data it will load, though numpy's default printing options will only show up to 1000 elements before it collapses the array with `...` ellipses.

This is meant for getting a close-up view of a small amount of data, e.g. if `extra-data-validate` points out a problem in a specific part of the index. It's not a good way to see a lot of data - for that you're better off using other tools that can plot or summarise the data.
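To illustrate what "slice expression" covers here (both slicing and integer indexing), below is a rough sketch of turning such a string into an index tuple. `parse_slice` is a hypothetical helper written for this example, not h5glance's actual code, and the comma-separated string format is an assumption.

```python
import numpy as np

def parse_slice(expr):
    """Turn a string such as '0:5, 3' into a tuple of slices and ints.

    Hypothetical helper for illustration only.
    """
    parts = []
    for part in expr.split(','):
        part = part.strip()
        if part == '...':
            parts.append(Ellipsis)
        elif ':' in part:
            # Slice: empty fields become None, e.g. ':5' -> slice(None, 5)
            fields = [int(f) if f.strip() else None for f in part.split(':')]
            if len(fields) > 3:
                raise ValueError(f"Too many ':' in {part!r}")
            parts.append(slice(*fields))
        else:
            # Plain integer index, which drops that dimension
            parts.append(int(part))
    return tuple(parts)

# Stand-in for a 2D dataset
data = np.arange(100).reshape(10, 10)

# Rows 0 and 1, column 3
print(data[parse_slice('0:2, 3')])
```

The resulting tuple can be used to index a numpy array as above; h5py datasets accept the same kind of tuple of integers, slices and `Ellipsis`.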