JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License

Is tab completion supportable? [feature request] #1096

Open alex-s-gardner opened 1 year ago

alex-s-gardner commented 1 year ago

Is there a path to HDF5.jl supporting tab completion? It would be very powerful for navigating opened h5 files, but I'm guessing there is a good reason it doesn't already exist.

mkitti commented 1 year ago

Could you elaborate about the context in which you would like to see tab completion? In the REPL? In VSCode? In general for the language server?

Also, when would you press tab? Does another HDF5 or Julia library implement this?

I think this may be possible as a library that uses HDF5.jl.

alex-s-gardner commented 1 year ago

I guess I'm looking for DataFrame-esque tab completion in the REPL in VS Code.

It would be very helpful if one could open a dataset with `h5open` and then use tab completion to navigate groups:

import Downloads
using HDF5

url = "https://github.com/evetion/SpaceLiDAR-artifacts/releases/download/v0.3.0/ATL08_20201121151145_08920913_006_01.h5";

fn = Downloads.download(url)
h5 = h5open(fn);

# it would be nice to be able to use `dot` tab indexing like a structure:
h5.gt1l.land_segments.latitude 

If you open the example file you'll quickly realize why tab completion would be very helpful. The structure is so complex that `show` doesn't buy you much.

mkitti commented 1 year ago

For the REPL see https://github.com/JuliaLang/julia/issues/44287

mkitti commented 1 year ago

A crude implementation might look like this:

julia> using HDF5

julia> Base.propertynames(f::T) where T <: Union{HDF5.File, HDF5.Group} = (fieldnames(T)..., Symbol.(keys(f))...)

julia> Base.getproperty(f::T, s::Symbol) where T <: Union{HDF5.File, HDF5.Group} = hasfield(T, s) ? getfield(f, s) : getindex(f, String(s))

julia> h5f = h5open("ATL08_20201121151145_08920913_006_01.h5");

julia> h5f.[TAB]
METADATA            ancillary_data      ds_geosegments      ds_metrics          ds_surf_type        filename            gt1l                gt1r                gt2l                gt2r
gt3l                gt3r                id                  orbit_info          quality_assessment
julia> h5f.gt1l.land_segments.latitude
🔢 HDF5.Dataset: /gt1l/land_segments/latitude (file: ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0)
├─ 🏷️ DIMENSION_LIST
├─ 🏷️ contentType
├─ 🏷️ coordinates
├─ 🏷️ description
├─ 🏷️ long_name
├─ 🏷️ source
├─ 🏷️ standard_name
├─ 🏷️ units
├─ 🏷️ valid_max
└─ 🏷️ valid_min

alex-s-gardner commented 1 year ago

@mkitti this is fantastic. I just tested it, and it works swimmingly for the first level of groups; there is more to do to get subgroups. Is there any reason not to include this in base HDF5.jl? If not, I'd be happy to work on a PR.

mkitti commented 1 year ago

I would feel a bit more comfortable about this if we did it in a wrapper type dedicated to this purpose.

One issue here is that the field names can interfere with underlying child names.

It would also be good to address the multilevel completion as well.

mkitti commented 1 year ago

Here is version 2. This time I'm just using NamedTuple as my "wrapper type".

Here is the setup:

using HDF5
h5_to_tuple(h5o::Union{HDF5.File,HDF5.Group}) = 
    NamedTuple(Symbol.(keys(h5o)) .=> h5_to_tuple.(getindex.((h5o,), keys(h5o))))
h5_to_tuple(h5o) = h5o
h5f = h5open("ATL08_20201121151145_08920913_006_01.h5");
nt = h5_to_tuple(h5f);

Here is an interactive session showing multilevel tab completion behavior:

julia> nt.[TAB]

METADATA            ancillary_data      ds_geosegments      ds_metrics          ds_surf_type        gt1l                gt1r
gt2l                gt2r                gt3l                gt3r                orbit_info          quality_assessment
julia> nt.gt2l.[TAB]

land_segments   signal_photons
julia> nt.gt2l.land_segments.[TAB]

asr                atlas_pa           beam_azimuth       beam_coelev        brightness_flag    canopy             cloud_flag_atm
cloud_fold_flag    delta_time         delta_time_beg     delta_time_end     dem_flag           dem_h              dem_removal_flag
h_dif_ref          last_seg_extend    latitude           latitude_20m       layer_flag         longitude          longitude_20m
msw_flag           n_seg_ph           night_flag         ph_ndx_beg         ph_removal_flag    psf_flag           rgt
sat_flag           segment_id_beg     segment_id_end     segment_landcover  segment_snowcover  segment_watermask  sigma_across
sigma_along        sigma_atlas_land   sigma_h            sigma_topo         snr                solar_azimuth      solar_elevation
surf_type          terrain            terrain_flg        urban_flag
julia> nt.gt2l.land_segments.sat_flag.[TAB]

file  id    xfer
julia> nt.gt2l.land_segments.sat_flag
🔢 HDF5.Dataset: /gt2l/land_segments/sat_flag (file: ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0)
├─ 🏷️ DIMENSION_LIST
├─ 🏷️ _FillValue
├─ 🏷️ contentType
├─ 🏷️ coordinates
├─ 🏷️ description
├─ 🏷️ flag_meanings
├─ 🏷️ flag_values
├─ 🏷️ long_name
├─ 🏷️ source
├─ 🏷️ units
├─ 🏷️ valid_max
└─ 🏷️ valid_min

The big difference here is that Julia can infer the types of all "properties", because they are encoded in the NamedTuple's type parameters.

What I think we need is a wrapper that works much like a NamedTuple but is slightly more specialized for this purpose.
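A minimal sketch of what such a wrapper could look like follows. `H5Browser` and `unwrap` are hypothetical names that are not part of HDF5.jl, and this is only one possible design. It advertises child names only, so the wrapper's own field cannot collide with a child name, and it wraps groups lazily instead of converting the whole file up front.

```julia
using HDF5

# Hypothetical wrapper type for illustration; not part of HDF5.jl.
struct H5Browser{T<:Union{HDF5.File,HDF5.Group}}
    obj::T
end

# Reach the wrapped object with getfield, bypassing the getproperty overload below.
unwrap(b::H5Browser) = getfield(b, :obj)

# Advertise only the child names, so the wrapper's field never shadows a child.
Base.propertynames(b::H5Browser) = Tuple(Symbol.(keys(unwrap(b))))

function Base.getproperty(b::H5Browser, s::Symbol)
    child = unwrap(b)[String(s)]
    # Wrap groups so navigation can continue; return datasets unchanged.
    return child isa HDF5.Group ? H5Browser(child) : child
end

# Delegate the plain-text display to the wrapped object.
Base.show(io::IO, m::MIME"text/plain", b::H5Browser) = show(io, m, unwrap(b))
```

With this sketch, `H5Browser(h5f).gt2l.land_segments` returns another `H5Browser`, while a dataset is returned as the usual `HDF5.Dataset`. Unlike the NamedTuple version, the child types are not encoded in the wrapper's type, so whether the REPL completes beyond the first level may depend on the Julia version.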

alex-s-gardner commented 1 year ago

@mkitti What an impressive 2 lines of code!

One improvement would be to keep the same `show` output for subgroups as is currently implemented:

i.e.

 h5["gt2l"]["land_segments"]["terrain"]
📂 HDF5.Group: /gt2l/land_segments/terrain (file: data/ATL08_20201121151145_08920913_006_01.h5)
├─ 🏷️ Description
├─ 🏷️ data_rate
├─ 🔢 h_te_best_fit
│  ├─ 🏷️ DIMENSION_LIST
│  ├─ 🏷️ _FillValue
│  ├─ 🏷️ contentType
│  ├─ 🏷️ coordinates
│  ├─ 🏷️ description
│  ├─ 🏷️ long_name
│  ├─ 🏷️ source
│  └─ 🏷️ units
├─ 🔢 h_te_best_fit_20m
│  ├─ 🏷️ DIMENSION_LIST
...

instead of:
```julia
nt.gt2l.land_segments.terrain
(h_te_best_fit = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_best_fit (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_best_fit_20m = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_best_fit_20m (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_interp = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_interp (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_max = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_max (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_mean = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_mean (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_median = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_median (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_min = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_min (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_mode = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_mode (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_rh25 = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_rh25 (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_skew = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_skew (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_std = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_std (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), h_te_uncertainty = HDF5.Dataset: /gt2l/land_segments/terrain/h_te_uncertainty (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), n_te_photons = HDF5.Dataset: /gt2l/land_segments/terrain/n_te_photons (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), photon_rate_te = HDF5.Dataset: /gt2l/land_segments/terrain/photon_rate_te (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), subset_te_flag = HDF5.Dataset: /gt2l/land_segments/terrain/subset_te_flag (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0), terrain_slope = HDF5.Dataset: /gt2l/land_segments/terrain/terrain_slope (file: data/ATL08_20201121151145_08920913_006_01.h5 xfer_mode: 0))
```

mkitti commented 1 year ago

I worked on a version but was having trouble with it in Julia 1.9. However, Julia 1.10, now in beta, offers improved support.

https://discourse.julialang.org/t/how-does-nested-tab-completion-work-in-the-repl/102953/4?u=mkitti

I am now targeting this feature for Julia 1.10. There, dictionary-key autocompletion also seems more complete.

I think HDF5.jl will likely continue to mainly support a dictionary-like interface. The property-based interface here sounds like a useful accessory package that some may appreciate.
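
For comparison, here is a small sketch of the dictionary-like interface applied to a nested layout. The temporary file and the `gt1l`/`latitude` names are stand-ins made up for this example, not the ATL08 file discussed above.

```julia
using HDF5

# Build a tiny stand-in file; the group and dataset names are made up.
path = tempname() * ".h5"
h5open(path, "w") do f
    g = create_group(f, "gt1l")
    g["latitude"] = [10.0, 20.0, 30.0]
end

h5open(path, "r") do f
    @show keys(f)              # names of the top-level children
    @show haskey(f, "gt1l")    # membership test, as with a Dict
    ds = f["gt1l/latitude"]    # a slash-separated path reaches nested objects
    @show read(ds)             # load the dataset into memory
end
```

String keys can hold any valid HDF5 name, including names that are not valid Julia identifiers, which is one reason a property-based layer fits better as an optional add-on.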

mkitti commented 11 months ago

Related: https://github.com/tshort/Eyeball.jl/issues/25