European-XFEL / EXtra-data

Access saved EuXFEL data
https://extra-data.rtfd.io
BSD 3-Clause "New" or "Revised" License
7 stars 13 forks source link

Return train timestamps in local (i.e., EuXFEL) time not just UTC #538

Closed fadybishara closed 1 month ago

fadybishara commented 3 months ago

Added a non-positional boolean argument local_time=False/True that returns the time in the Europe/Berlin timezone.

takluyver commented 3 months ago

Thanks!

I'm not super keen on using pandas to do a timezone conversion when we're returning a numpy type. The point of deferring the import is that you only need to pay that cost if you want to work with pandas objects. Hardcoding German time as 'local time' could also be confusing if someone copies EuXFEL data to another location (not unheard of) and wants to use EXtra-data on it.

Instead, I think that when we're returning a pandas series (labelled=True), we should return it with timezone-aware datetimes in UTC, and then the caller can choose to convert them by doing:

run.train_timestamps(labelled=True).dt.tz_convert('Europe/Berlin')

That way, we keep the simple rule that the method always gives UTC timestamps, but make it explicit where possible so it's harder to make mistakes with them. I'd do the same with the numpy arrays, but numpy's datetime support doesn't know about timezones.

It might also be worth offering another option to return a list of Python datetime objects, again with timezone info attached. But we can always do that separately.

philsmt commented 3 months ago

The idea of a boolean argument returning "EuXFEL-time" is on me - the idea was that since all data is still taken at European XFEL (even if copied otherwise), I would expect essentially everyone to either care for UTC or for local time when data was acquired. It seems far-fetched to consider any other time would be relevant, i.e. someone creating EXDF data independently of data acquired at EuXFEL.

fadybishara commented 3 months ago

The "use-case" for me arose when I was trying to query the DOOCS DAQ via pydaq -- and I needed to do this in specific time windows corresponding to certain runs. So for me, it was also the case that I needed EuXFEL local time.

I agree with @takluyver that naming the option local_time could be confusing -- I can name it, e.g., exfel_local_time or exfel_time

Also, regarding Thomas' comment:

I'm not super keen on using pandas to do a timezone conversion

I didn't find a simpler way to do the conversion -- this was by far the simplest thanks to the tz_convert and tz_localize functions. The latter was very useful because I wanted to go back to the datetime64 type; but okay, I can also return a datetime object (which is what pandas returns, by the way)

takluyver commented 1 month ago

Done in #550, thanks!