adamreeve / npTDMS

NumPy based Python module for reading TDMS files produced by LabView
http://nptdms.readthedocs.io
GNU Lesser General Public License v3.0

datetime64 shift by 2h from original time #328

Closed eryks1994 closed 6 months ago

eryks1994 commented 7 months ago

Hello, I noticed a discrepancy when reading a channel with the Time data type using this script:

input_file = TdmsFile.open(input_file_path)
channel_value_stamp = input_file[parameter_name][VALUE_STAMP]

The first value as read by npTDMS is 2023-07-05T04:20:36.810135. Diadem displays the same value as 07/05/2023 06:20:36.8101.

Do you have any idea what could cause this 2-hour shift?

The TDMS file I am reading was created in the LabVIEW environment, in case that is a useful hint.

Thanks for your answer, Best regards, Eryk Stebelski

adamreeve commented 7 months ago

Hi Eryk. npTDMS reads timestamps in UTC timezone, which is how they are stored internally in the TDMS format. I guess your local timezone must be 2 hours ahead of UTC, and Diadem is displaying them in local time.
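
As a quick check of this explanation, adding two hours to the UTC value from the report gives the time shown by Diadem. This is only a sketch: the fixed two-hour offset is an assumption taken from the numbers in the report, and a real conversion should use a named time zone so that daylight saving changes are handled.

```python
import numpy as np

# First timestamp as read by npTDMS (UTC), copied from the report
utc_value = np.datetime64("2023-07-05T04:20:36.810135")

# Assumed fixed offset of the local time zone (UTC+2), for illustration only
local_value = utc_value + np.timedelta64(2, "h")

print(local_value)
```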

eryks1994 commented 7 months ago

Ok, Thanks.

Is there a way to read the time zone offset from the file to account for this? I expect this to become an issue when data is generated in different time zones.

Best regards, Eryk Stebelski

adamreeve commented 7 months ago

The time zone offset isn't written in the file; local time depends on the time zone set on the machine running the code. Using UTC times should actually be more consistent if you're dealing with data generated in different time zones, as the same instant will have the same UTC timestamp regardless of the local time zone, so I'm not sure why you think this could cause issues?

There's an example in the docs of converting a UTC time to local time. It's a bit old so possibly numpy has made this easier since it was written. See the bottom of the timestamps section here: https://nptdms.readthedocs.io/en/latest/reading.html#timestamps
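
A minimal sketch of such a conversion using pandas, which may differ from the approach shown in the docs. The zone name "Europe/Warsaw" is an assumption chosen for illustration; substitute the zone of the machine that recorded the data.

```python
import numpy as np
import pandas as pd

# Timezone-naive UTC timestamps, in the form npTDMS returns them
utc_values = np.array(["2023-07-05T04:20:36.810135"], dtype="datetime64[ns]")

# Mark the values as UTC, then convert to a named local zone
# ("Europe/Warsaw" is an assumed example zone).
utc_index = pd.DatetimeIndex(utc_values).tz_localize("UTC")
local_index = utc_index.tz_convert("Europe/Warsaw")

print(local_index[0])
```

Because a named zone is used instead of a fixed offset, timestamps recorded in winter would be shifted by one hour rather than two.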

eryks1994 commented 6 months ago

The issue arises when I have to compare the data with other sources. The data saved in TDMS is one of three data sources. The others are CSV files that store the date and time adjusted for the local time zone. They can be generated on different devices or computers. After a test, this data needs to be placed on one chart for comparison and cross-checking.
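
One way to line up such sources is to convert the CSV's local timestamps to UTC before comparing them with the npTDMS values. The sketch below assumes the recording zone of the CSV is known; the column names, sample values and the "Europe/Warsaw" zone are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV source that logs local wall-clock time
csv_text = "time,value\n2023-07-05 06:20:36.810135,1.5\n"
csv_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# Interpret the CSV times in the zone they were recorded in (assumed here),
# then convert to UTC so they are comparable with npTDMS timestamps.
csv_df["time_utc"] = (
    csv_df["time"].dt.tz_localize("Europe/Warsaw").dt.tz_convert("UTC")
)

# Stand-in for a channel read with npTDMS: naive values that are already UTC
tdms_time = pd.Series(
    pd.to_datetime(["2023-07-05T04:20:36.810135"])
).dt.tz_localize("UTC")

print(csv_df["time_utc"].iloc[0] == tdms_time.iloc[0])
```

Note that tz_localize raises by default for local times that are ambiguous or nonexistent around daylight saving transitions, so recordings spanning such a transition need extra handling.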

Your date-time example uses the "wf_start_time" channel property, which unfortunately does not exist in my file. There is probably no consistent standard for this property in LabVIEW.

To work around this on my side, I will push to add a function to the logging tool that logs the time zone as a property in the file at the start of recording.

adamreeve commented 6 months ago

> To work around this on my side, I will push to add a function to the logging tool that logs the time zone as a property in the file at the start of recording.

That seems like the best approach if you need to know the local time that the data was generated in, as this isn't stored in the TDMS file itself. I don't think there's anything further to be done on this issue so I'm going to close it as answered.
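
If the logging side were written in Python, capturing the offset could look roughly like the sketch below. The property name utc_offset_minutes and the dictionary used to hold it are hypothetical; how the value is actually stored depends on the logging tool.

```python
from datetime import datetime, timedelta, timezone

# Capture the machine's current UTC offset at the start of recording
start_local = datetime.now().astimezone()
utc_offset_minutes = int(start_local.utcoffset().total_seconds() // 60)

# Hypothetical metadata to store alongside the recording
recording_properties = {"utc_offset_minutes": utc_offset_minutes}

# Later, a reader can rebuild local time from a UTC timestamp and the stored offset
recorded_zone = timezone(
    timedelta(minutes=recording_properties["utc_offset_minutes"])
)
utc_time = datetime(2023, 7, 5, 4, 20, 36, 810135, tzinfo=timezone.utc)
local_time = utc_time.astimezone(recorded_zone)

print(local_time.isoformat())
```

A single stored offset does not capture a daylight saving change that happens during a long recording, so storing the zone name as well may be worth considering.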

matiasandina commented 3 weeks ago

Not sure if this is the best place to come back to this conversation but....

>>> tdms_file.as_dataframe()
               /'System'/'Time'  ...                /'Event'/'Message'
0    2024-09-05 22:09:14.097116  ...  Normal Start for Data Collection
1    2024-09-05 22:09:43.295263  ...   Normal Stop for Data Collection

These times are of type datetime64[ns] as documented.

>>> tdms_file.as_dataframe()["/'System'/'Time'"].values
array(['2024-09-05T22:09:14.097116000', '2024-09-05T22:09:43.295263000',
       '2024-09-05T22:10:13.295053000', ...,
       '2024-09-06T10:45:13.598565000', '2024-09-06T10:45:43.608856000',
       '2024-09-06T10:46:13.615856000'], dtype='datetime64[ns]')

Right now, the time zone information is not there, so the values are not zone-aware. I believe it would be useful for them to be aware of being UTC. As a user, I could then query tzinfo and would not have to go back and forth with my notes about when something happened, or why a plot looks wrong (that is, shifted). The default gives a type that can't even be made aware.

>>> tdms_file.as_dataframe(arrow_dtypes=False)["/'System'/'Time'"].values[0].tzinfo is None
AttributeError: 'numpy.datetime64' object has no attribute 'tzinfo'
>>> tdms_file.as_dataframe(arrow_dtypes=True)["/'System'/'Time'"].values[0].tzinfo is None
True

The arrow version is still not aware. My question is whether it makes sense to explicitly coerce these values to UTC in the dataframe, or to add a utc_dt column. For example, I think even a simple solution like coercing the column within pandas can go a long way to avoid confusion.

>>> df = tdms_file.as_dataframe()["/'System'/'Time'"]
>>> pd.to_datetime(df).dt.tz_localize('UTC')
0      2024-09-05 22:09:14.097116+00:00
1      2024-09-05 22:09:43.295263+00:00
2      2024-09-05 22:10:13.295053+00:00
3      2024-09-05 22:10:43.294784+00:00
4      2024-09-05 22:11:13.294940+00:00
                     ...               
1510   2024-09-06 10:44:13.574567+00:00
1511   2024-09-06 10:44:43.587234+00:00
1512   2024-09-06 10:45:13.598565+00:00
1513   2024-09-06 10:45:43.608856+00:00
1514   2024-09-06 10:46:13.615856+00:00
Name: /'System'/'Time', Length: 1515, dtype: datetime64[ns, UTC]

Could this be considered as an addition?

Just to illustrate, this is the behavior I would like to make somewhat explicit to the user

from nptdms import TdmsFile
import pandas as pd

def read_and_convert_tdms(filepath):
    # Read the TDMS file into a DataFrame
    tdms_file = TdmsFile.read(filepath)
    df = tdms_file.as_dataframe()
    # Convert the /'System'/'Time' column to a timezone-aware datetime in UTC.
    # Adjust the column name to match your TDMS file structure.
    time_column = "/'System'/'Time'"
    df[time_column] = pd.to_datetime(df[time_column]).dt.tz_localize('UTC')
    return df

Is it possible that data stored as UTC are actually read / displayed as local time when calling as_dataframe()?