MAVENSDC / cdflib

A python module for reading NASA's Common Data Format (cdf) files
MIT License

cdfepoch.to_datetime incorrect when multiple NaT values in TT2000 array #233

Open rjolitz opened 11 months ago

rjolitz commented 11 months ago

I was attempting to read MAVEN SEP level 2 datasets, which can have multiple fill values (NaT) in the time axis. After successfully pulling the TT2000 and Unix time fields from the CDF, I tried to convert the TT2000 values to datetime via cdfepoch.to_datetime. However, when the array contains multiple NaT values (which appear as -9223372036854775807), the conversion appears to floor the rest of the converted time array to the hour. When I filter out the NaT values before converting via cdfepoch.to_datetime, the function works properly. Here is a figure I made of the Unix time versus the cdfepoch.to_datetime result, for the raw TT2000 and the filtered TT2000:

[Figure: CDF_epoch_nan_issues]

And the code:

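import os

import cdflib
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

raw_filename = "/Users/rjolitz/Downloads/test_maven_data/mvn_sep_l2_s1-raw-svy-full_20220801_v04_r03.cdf"
data_cdf = cdflib.CDF(raw_filename)
data_i = data_cdf.varget("epoch")  # TT2000 epochs
time_unx = data_cdf.varget("time_unix")

# Convert the raw array, NaT fill values included.
unfiltered = cdflib.cdfepoch.to_datetime(data_i)
# Convert again after dropping the NaT fill values (large negative integers).
valid = data_i > 0
filtered_time_unx = time_unx[valid]
filtered = cdflib.cdfepoch.to_datetime(data_i[valid])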

plt.figure()
plt.title(os.path.split(raw_filename)[-1])
plt.plot(time_unx, unfiltered, label='CDF_EPOCH = cdf.varget("epoch")')
plt.plot(filtered_time_unx, filtered, label='CDF_EPOCH = cdf.varget("epoch")[epoch > 0]')
plt.xlabel("Unix time (elapsed s since 1970-01-01/00:00 w/o leap seconds)")
plt.ylabel("cdflib.cdfepoch.to_datetime(CDF_EPOCH[TT2000])")
plt.legend()
plt.gca().yaxis.set_major_formatter(mdates.DateFormatter('%b-%d %H:%M'))
plt.show()

It's really strange that the presence of multiple fill values in the TT2000 array can throw off the conversion of the entire array. The file I'm reading is mvn_sep_l2_s1-cal-svy-full_20220801_v04_r03.cdf, which is accessible here.
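The filtering workaround described above drops the fill entries, so the converted array no longer lines up index-for-index with the original one. A minimal sketch of a variant that keeps the original shape is shown here. The helper name convert_skipping_fill and the TT2000_SENTINELS tuple are hypothetical and not part of cdflib. The first sentinel is the value reported in this issue; the second (the int64 minimum) is included as an assumption.

```python
import numpy as np

# First value: the sentinel reported in this issue.
# Second value: the int64 minimum, included here as an assumption.
TT2000_SENTINELS = (-9223372036854775807, -9223372036854775808)


def convert_skipping_fill(tt2000, convert):
    """Apply `convert` only to non-sentinel entries; sentinels become NaT.

    `convert` is any callable that maps an int64 array to datetime-like
    values (hypothetical parameter, supplied by the caller).
    """
    tt2000 = np.asarray(tt2000, dtype=np.int64)
    valid = ~np.isin(tt2000, TT2000_SENTINELS)

    # Start with NaT everywhere, then fill in the converted valid entries.
    out = np.full(tt2000.shape, np.datetime64("NaT"), dtype="datetime64[ns]")
    if valid.any():
        converted = np.asarray(convert(tt2000[valid]))
        out[valid] = converted.astype("datetime64[ns]")
    return out
```

With the variables from the snippet above, this would be called as convert_skipping_fill(data_i, cdflib.cdfepoch.to_datetime), assuming that function returns values NumPy can cast to datetime64[ns]. The result has the same length as time_unx, so the two can be plotted against each other without a separate filtered copy.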