INTERMAGNET / wg-www-gins-data-formats

Repository to track working group discussions for WWW/Gins/Data Formats

CDF leap second correction #5

Open CharlesBlais opened 4 years ago

CharlesBlais commented 4 years ago

Stephan Bracke identified a problem with leap seconds in the format. After discussion, it was determined that the problem lies with the leap second table text file, and that fixing it requires updating the NASA CDF library by recompiling it. Could everyone contribute more details on the problem and on how to resolve it?

stephanbracke commented 4 years ago

Introduction

On the CDF page you find the basic description of what CDF is able to do. One of its advantages is that it can handle leap seconds. For this it has introduced a datetime type called CDF_TIME_TT2000. Internally, timestamps are stored as nanoseconds since 2000 January 1, 12h Terrestrial Time (TT). TT can be converted to UTC by

UTC = TT - deltaAT - 32.184 s

deltaAT is the sum of the leap seconds since 1960. To calculate this timestamp, the software in use needs to be aware of all the leap seconds up to the timestamp you want to create. For example, to create the date 2019/01/01 the software needs to be aware of all leap seconds before 2019/01/01. To check how the software does that, I looked more closely at two APIs:

Both work with a table that you can find here: https://cdf.gsfc.nasa.gov/html/CDFLeapSeconds.txt. It is basically a text file containing all leap seconds up to now. Naturally this file changes every time a new leap second is introduced, and that has to be handled with care, because creating files with an out-of-date CDFLeapSeconds.txt causes timestamps to be shifted by one or more seconds.
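To make the effect of an out-of-date table concrete, here is a minimal sketch in plain Python. It is not the CDF library's implementation: it assumes the relation TT = UTC + deltaAT + 32.184 s, uses a hand-copied excerpt of the leap second table, and ignores timestamps that fall inside a leap second itself. The names `LEAP_TABLE`, `delta_at`, `utc_to_tt2000` and `tt2000_to_utc` are hypothetical.

```python
from datetime import datetime, timedelta

# Excerpt of deltaAT (TAI - UTC, in seconds), keyed by the UTC date from which
# each value applies. Hand-copied for illustration only; real software should
# read the full CDFLeapSeconds.txt.
LEAP_TABLE = [
    (datetime(2012, 7, 1), 35),
    (datetime(2015, 7, 1), 36),
    (datetime(2017, 1, 1), 37),
]

NS_PER_S = 1_000_000_000
TT_MINUS_TAI_NS = 32_184_000_000        # TT - TAI = 32.184 s
J2000 = datetime(2000, 1, 1, 12, 0, 0)  # epoch of CDF_TIME_TT2000


def delta_at(utc, table):
    """Return deltaAT in seconds valid at `utc` according to `table`."""
    value = None
    for start, seconds in table:
        if utc >= start:
            value = seconds
    if value is None:
        raise ValueError("timestamp predates the table excerpt")
    return value


def utc_to_tt2000(utc, table):
    """Encode a UTC datetime as nanoseconds since J2000 TT."""
    elapsed = utc - J2000
    elapsed_ns = ((elapsed.days * 86_400 + elapsed.seconds) * NS_PER_S
                  + elapsed.microseconds * 1000)
    return elapsed_ns + delta_at(utc, table) * NS_PER_S + TT_MINUS_TAI_NS


def tt2000_to_utc(tt2000, table):
    """Decode nanoseconds since J2000 TT back to a UTC datetime."""
    for start, seconds in reversed(table):
        ns = tt2000 - TT_MINUS_TAI_NS - seconds * NS_PER_S
        candidate = J2000 + timedelta(microseconds=ns // 1000)
        if candidate >= start:
            return candidate
    raise ValueError("timestamp predates the table excerpt")


stale_table = LEAP_TABLE[:2]   # table last updated 2015-07-01
current_table = LEAP_TABLE     # table last updated 2017-01-01

true_utc = datetime(2018, 6, 23, 23, 59, 59)
written = utc_to_tt2000(true_utc, stale_table)     # writer misses one leap second
read_back = tt2000_to_utc(written, current_table)  # reader uses the current table
print(read_back)  # 2018-06-23 23:59:58
```

The writer adds 36 s where 37 s would be correct, so a reader with the current table recovers every timestamp one second too early, without any way to notice from the timestamps alone.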

APIs and their behaviour

So far I have tested the C code, versions V3.6.1 and V3.7.1. I also tested the Python library CDFLib.

C Code

In the CDF package the CDFLeapSeconds.txt is hardcoded

When you want to externalise it (the preferred way to keep control of the file) you can however

After changing everything to version 3.7.1 (currently the most recent one), the error message disappeared and you can read the file, but wrongly, because all timestamps are shifted by one second:

Record # 86398: 2018-06-23T23:59:56.000000000
Record # 86399: 2018-06-23T23:59:57.000000000
Record # 86400: 2018-06-23T23:59:58.000000000

instead of the correct reading:

Record # 86398: 2018-06-23T23:59:57.000000000
Record # 86399: 2018-06-23T23:59:58.000000000
Record # 86400: 2018-06-23T23:59:59.000000000

The behaviour you see with 3.7.1 is the same if you use the lightweight Python library CDFLib 0.3.15 (which uses an externalised CDFLeapSeconds.txt): no warning, but wrong readings of what is essentially a file that was wrongly created.

So from these tests we can conclude that:

Action points (to be verified)

Future action points (to be verified)

CDFLib lightweight python library.

The lightweight library has a method that can be used to find out which version of CDFLeapSeconds.txt was used when the file was created: cdf_info() returns a dictionary with the basic CDF information. This information includes the LeapSecondUpdated entry, which the following example reads:

import cdflib

file_name = "her_20190415_000000_pt1s_1.cdf"
cdf_file = cdflib.CDF('./files/' + file_name)
# first method: version of the leap second table in use when the file was created
print("LeapSecondsTable on file creation : " + str(cdf_file.cdf_info()['LeapSecondUpdated']))
# second method: prints the version of the leap second table that the API uses to interpret the file
print(cdflib.cdfepoch.getLeapSecondLastUpdated())
....
LeapSecondsTable on file creation : 20150701
Leap second last updated: 2017-1-1

These two methods can be used to issue correct warnings and to detect errors with leap seconds.
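As a rough sketch of such a check, the table version stored in the file can be compared with the date of the data: the file is suspect only if a leap second took effect after its table version and on or before the data date. The helper names below are hypothetical, the leap second dates are a hand-copied excerpt, and the code assumes the table version is an integer of the form YYYYMMDD, as in the output shown above.

```python
from datetime import date

# Dates (UTC) on which recent leap seconds took effect. Excerpt for
# illustration only; real software should read CDFLeapSeconds.txt.
LEAP_SECOND_DATES = [date(2012, 7, 1), date(2015, 7, 1), date(2017, 1, 1)]


def parse_table_version(leap_second_updated):
    """Turn an integer such as 20150701 into a date (assumed YYYYMMDD)."""
    text = str(leap_second_updated)
    return date(int(text[:4]), int(text[4:6]), int(text[6:8]))


def missed_leap_seconds(leap_second_updated, data_date):
    """Leap seconds unknown to the writer's table but relevant to the data."""
    table_date = parse_table_version(leap_second_updated)
    return [d for d in LEAP_SECOND_DATES if table_date < d <= data_date]


# Values taken from the example above: table version 20150701, data of 2019-04-15.
missed = missed_leap_seconds(20150701, date(2019, 4, 15))
if missed:
    print(f"warning: timestamps may be shifted by {len(missed)} s")
```

A file written with the 20150701 table but holding data from before 2017-01-01 would pass this check, since no leap second was missed for that data.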

CharlesBlais commented 3 years ago

Hi @stephanbracke and @SimonFlower, is the correction of CDF leap second still a matter that needs attention?

SimonFlower commented 3 years ago

I haven't looked at this for a while, but I think it's down to me to update the CDF code on our GIN to access the correct leap second table. Once I've done this, we'd need to update all the CDF files in the archive at NRCan, so I'm thinking it may make sense not to do this work until the archive has moved to BGS. That would mean considerably less work (not having to re-transfer the CDF data).

leonro commented 3 years ago

In software we could build in checks to verify if we have the latest one.

Since version 0.9.3 (the Python 3 version, which makes use of CDFlib), MagPy includes a check on whether the leap second table is up to date, following Stephan's suggestions.

CharlesBlais commented 3 years ago

Would everyone say this issue is resolved?

stephanbracke commented 3 years ago

In MagPy it is integrated and solved, but for the moment I think the files are still wrongly created on the ftp site (I checked for realtime data). As Simon already stated in a previous post, he will fix this when everything has moved to BGS.

SimonFlower commented 3 years ago

I have not fixed this issue in the Edinburgh GIN software (my apologies), which is what creates a large amount of data on the NRCan website. So can we keep the issue open, please?

SimonFlower commented 5 months ago

I have updated the Edinburgh GIN with a new CDF library that I think fixes the problem. Since all CDF files distributed to users through the web site are generated "on the fly" in response to user requests and are not stored or cached, I think this resolves the problem for the Intermagnet web site.

The Edinburgh GIN's ftp server also generates CDF files "on-the-fly". I've checked the software that creates these CDF files (a different piece of software to the web site) and again the problem is fixed. There is a problem with the ftp server downloading large files, which is a separate issue.

So I think this issue could be closed now.