European-XFEL / EXtra-data

Access saved EuXFEL data
https://extra-data.rtfd.io
BSD 3-Clause "New" or "Revised" License

Add extra_data.KeyData.series() support for multi-dimensional data #457

Open bj-s opened 1 year ago

bj-s commented 1 year ago

extra_data.KeyData.series() only supports one-dimensional data. For multi-dimensional data, it fails with TypeError: pandas Series are only available for 1D data. For example:

import extra_data as xd

run = xd.open_run(700000, 46).select_trains(slice(0, 50))
pd_series = (
    run[("SQS_DIAG1_XGMD/XGM/DOOCS:output", "data.intensitySa3TD")]
    .series()  # raises TypeError: this key holds multi-dimensional data
)

With a MultiIndex, series() could support multi-dimensional data and return a pandas.Series like the one this workaround produces:

import extra_data as xd

run = xd.open_run(700000, 46).select_trains(slice(0, 50))
pd_series = (
    run[("SQS_DIAG1_XGMD/XGM/DOOCS:output", "data.intensitySa3TD")]
    .xarray()
    .to_dataframe()["SQS_DIAG1_XGMD/XGM/DOOCS:output.data.intensitySa3TD"]
)

takluyver commented 1 year ago

Thanks! It looks like you can even do .xarray().to_series() to get this.
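
To illustrate the shape of the result without needing access to run data, here is a minimal sketch using pandas and NumPy only. It assumes a 2D array of shape (trains, pulse slots); the train IDs, the values, and the series name are made up, and the level names trainId and dim_0 are chosen to mirror the discussion rather than taken from extra_data's output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a 2D key: 3 trains x 4 pulse slots.
train_ids = np.array([1000, 1001, 1002])
data = np.arange(12, dtype=np.float32).reshape(3, 4)

# Two-level index: train ID first, position along the second axis second.
index = pd.MultiIndex.from_product(
    [train_ids, np.arange(data.shape[1])], names=["trainId", "dim_0"]
)

# Row-major flattening matches the ordering produced by from_product.
series = pd.Series(data.ravel(), index=index, name="intensity")

# Selecting on the first level returns all slots for one train.
print(series.loc[1001])
```

Each entry is addressed by a (trainId, dim_0) pair, so the second level is only a position within the train, not an identifier.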

I think it might be best to hold off on a shortcut for this until we have a pulse ID that's meaningful across components, though. The second layer of the MultiIndex you're getting with .to_series() or .to_dataframe() is basically meaningless at the moment, but pandas will still happily line up dim_0 from one series with dim_0 from another and give you garbage results. If we one day have common pulse IDs that we can put in the second index layer, it would make a whole lot more sense.
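
The alignment problem can be sketched with pandas alone. This example assumes two invented devices: one records every pulse, the other only every second pulse. The helper to_multiindex_series is hypothetical and not part of extra_data. Each value is set to the pulse number it came from, so mismatched pairings are easy to spot.

```python
import numpy as np
import pandas as pd

def to_multiindex_series(train_ids, data, name):
    """Hypothetical helper: flatten a (trains, slots) array into a Series."""
    index = pd.MultiIndex.from_product(
        [train_ids, np.arange(data.shape[1])], names=["trainId", "dim_0"]
    )
    return pd.Series(data.ravel(), index=index, name=name)

train_ids = [1000, 1001]

# Device A (invented): records pulses 0, 1, 2, 3, so slot i holds pulse i.
a = to_multiindex_series(train_ids, np.array([[0.0, 1.0, 2.0, 3.0]] * 2), "a")

# Device B (invented): records only pulses 0 and 2, so slot 1 holds pulse 2.
b = to_multiindex_series(train_ids, np.array([[0.0, 2.0]] * 2), "b")

# pandas aligns on (trainId, dim_0) without knowing what dim_0 means.
combined = pd.concat([a, b], axis=1)
print(combined)
```

In the row (1000, 1), column a holds a value from pulse 1 while column b holds a value from pulse 2. pandas raises no error, because the index labels match even though the pulses do not.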