European-XFEL / h5glance

Explore HDF5 files in terminal & HTML views
BSD 3-Clause "New" or "Revised" License
65 stars 7 forks source link

H5Glance copylink does not follow external links #23

Closed t20100 closed 3 years ago

t20100 commented 4 years ago

When using h5glance from a notebook and HDF5 files with external links, the path copied for dataset through the copy icon in the html view is the one in the external linked file rather than the one from the viewed file.

To reproduce:

import numpy
import h5py
from h5glance import H5Glance

with h5py.File("test-h5glance-001.h5", mode="a") as h5file:
    h5file.create_group("/1.1")
    h5file["/1.1/data"] = numpy.arange(100)

with h5py.File("test-h5glance.h5", mode="a") as h5file:
    h5file["/scan: 1.1"] = h5py.ExternalLink("test-h5glance-001.h5", "/1.1")

H5Glance("test-h5glance.h5")

When clicking on the /scan: 1.1/data copy icon, the copied path is /1.1/data, i.e., the path in the linked file and not the one from the file currently browsed. This seems misleading since there is no information about which file the data set is really in.

takluyver commented 4 years ago

IIRC, we just use obj.name from h5py.

Would it make sense for the notebook interface to show external links explicitly, like the terminal interface already does, rather than following them and showing the object they point to?

t20100 commented 4 years ago

Yes, that would make more sense since it avoids the difference between the displayed path and the copied one in case of external links.

An alternative is to provide the path taken from the initial file (i.e., the one displayed in the tree) as the copy link so it is consistent with what is displayed and can be used with the file provided to H5Glance.

takluyver commented 4 years ago

24 should fix this by showing the link, like the terminal view does. This is easier to implement than calculating the path from the initial file. :-)

t20100 commented 4 years ago

Arf.. I started doing the other patch that was about to propose ;), it's here: https://github.com/t20100/h5glance/tree/html-logical-path It looks to work, but it still needs a bit of tests. I can turn it in a PR if you want to go this way. But I'd understand you want to keep the behavior consistent between terminal and html.

takluyver commented 4 years ago

Ah, sorry, I should have communicated clearer.

I would like to make the HTML & terminal interfaces more consistent, though I'm open to discussing what they both do. We don't use external links much at EuXFEL, so I don't have a good feeling for what we want. I'd like to resist making it an option unless it's really necessary, because more options means more room for bugs.

Not following external links also makes it easier to avoid infinite recursion where there are links in a cycle. So far the HTML interface cheerfully ignores this issue, but the terminal interface does try to check 'have I seen this object before?'

t20100 commented 4 years ago

At ESRF external links are used quite a lot: For acquisition, there is a HDF5 "master" file per sample with external links to scans stored in multiple hdf5 files (and there is even a "proposal" master file with external links to all scans).

In this case, using the "logical path" is convenient. The down side is that you can end-up with a HDF5 structure too big to handle at once and so stopping at the external links and providing the filename + HDF5 path as #24 mitigates it.

takluyver commented 4 years ago

Our HDF5 structures also get pretty big - h5ls -r can be a few thousand lines. This works fairly nicely in the terminal, although it can take a second or two to start on such files (I'm playing around with one possible improvement in #22). I've thought about adding options to control the maximum depth or the maximum number of entries shown in a single group, but that's moving away from the really simple, convenient interface we have.

t20100 commented 4 years ago

Regarding file structure size, I encountered here a HDF5 file with more than 5000 very small scans, yet the structure is more than 2 million lines with h5ls -r... which is a problem to display. So in this case, having a depth control would help. At the same time, I'm not sure you want to manually browse it!