Write header - Githubissues

BaptisteVandecrux commented 3 years ago

Simple header writer.

Test:

import nead

fname = "sample.csv"

# Read the sample NEAD file and convert it to a DataFrame
ds = nead.read(fname, index_col=0, MKS=True)
df = ds.to_dataframe()

# Rename the station and add a derived column
ds.attrs['station_id'] = 'test_station_processed'
df['TA_cor'] = df.TA + 10
df = df.reset_index()

# Draft the header from the metadata and the DataFrame columns,
# then write the data file using that header
nead.write_header('processed_header.ini', df,
                  metadata=ds.attrs,
                  fields=df.columns,
                  units=['time', 'K', 'perc', 'ms-1', 'Wm-2', 'K'])
nead.write(df, nead_header='processed_header.ini', output_path='sample_processed.csv')

# Read the processed file back in
ds2 = nead.read('sample_processed.csv', index_col=0, MKS=True)

should produce 'processed_header.ini':

[METADATA]
station_id = test_station_processed
latitude = 46.5
longitude = 9.8
altitude = 1500
nodata = -999
timezone = 1
field_delimiter = ,
[FIELDS]
fields = timestamp,TA,RH,VW,ISWR,TA_cor
add_value = 0,0,0,0,0,0
scale_factor = 1,1,1,1,1,1
units = time,K,perc,ms-1,Wm-2,K
display_description = timestamp,TA,RH,VW,ISWR,TA_cor
database_fields = timestamp,TA,RH,VW,ISWR,TA_cor
database_fields_data_types = timestamp,float64,float64,float64,float64,float64
[DATA]

and sample_processed.csv:

# NEAD 1.0 UTF-8
# [METADATA]
# station_id = test_station_processed
# latitude = 46.5
# longitude = 9.8
# altitude = 1500
# nodata = -999
# timezone = 1
# field_delimiter = ,
# 
# [FIELDS]
# fields = timestamp,TA,RH,VW,ISWR,TA_cor
# add_value = 0,0,0,0,0,0
# scale_factor = 1,1,1,1,1,1
# units = time,K,perc,ms-1,Wm-2,K
# display_description = timestamp,TA,RH,VW,ISWR,TA_cor
# database_fields = timestamp,TA,RH,VW,ISWR,TA_cor
# database_fields_data_types = timestamp,float64,float64,float64,float64,float64
# 
# [DATA]
# 
# 
2010-06-22 12:00:00,275.15,0.52,1.2,320.0,285.15
2010-06-22 13:00:00,276.15,0.6,2.4,340.0,286.15
2010-06-22 14:00:00,275.95,0.56,2.0,330.0,285.95

mankoff commented 3 years ago

At first glance: why is the header written to disk, when the NEAD standard is that the header lives in the data file itself? If you're creating a function to generate a header from existing data in memory, I think the header should remain in memory (as a string or some easy-to-generate data structure like a dictionary). The writer should then take that variable rather than a file on disk.
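
A rough sketch of what an in-memory header could look like, assuming a plain dictionary that is rendered to the commented header block only at write time. The helpers build_header and render_header are hypothetical names for illustration, not part of pyNEAD, and the field list is shortened:

```python
def build_header(metadata, fields, units, dtypes):
    """Return a NEAD-style header as a plain dict (hypothetical helper, not pyNEAD API)."""
    n = len(fields)
    if len(units) != n or len(dtypes) != n:
        raise ValueError("units and dtypes must have one entry per field")
    return {
        "METADATA": dict(metadata),
        "FIELDS": {
            "fields": list(fields),
            "add_value": [0] * n,      # default: no offset
            "scale_factor": [1] * n,   # default: no scaling
            "units": list(units),
            "database_fields_data_types": list(dtypes),
        },
    }


def render_header(header):
    """Render the dict as the commented block that precedes the data rows."""
    lines = ["# NEAD 1.0 UTF-8", "# [METADATA]"]
    lines += [f"# {key} = {value}" for key, value in header["METADATA"].items()]
    lines.append("# [FIELDS]")
    for key, values in header["FIELDS"].items():
        lines.append(f"# {key} = " + ",".join(str(v) for v in values))
    lines.append("# [DATA]")
    return "\n".join(lines) + "\n"


header = build_header(
    metadata={"station_id": "test_station_processed", "nodata": -999, "field_delimiter": ","},
    fields=["timestamp", "TA", "TA_cor"],
    units=["time", "K", "K"],
    dtypes=["timestamp", "float64", "float64"],
)
text = render_header(header)
```

With this shape, a writer could accept the dict (or the rendered string) directly, and nothing has to touch the disk until the final data file is written.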

If you have to use a file on disk for some reason, it should be cleaned up afterward. It doesn't look like the INI file is removed.
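
If the file-based route is kept, one possible pattern is to write the header to a temporary file and remove it in a finally block. This is only a sketch using the standard library; the header text is a shortened placeholder, and the point where a file-based writer would be called is marked with a comment rather than a real call:

```python
import os
import tempfile

# Shortened placeholder header, for illustration only
header_text = (
    "[METADATA]\n"
    "station_id = test_station_processed\n"
    "[FIELDS]\n"
    "fields = timestamp,TA\n"
)

# Create a temporary INI file, use it, and always delete it afterward
fd, header_path = tempfile.mkstemp(suffix=".ini")
try:
    with os.fdopen(fd, "w") as handle:
        handle.write(header_text)
    # A file-based writer would be called here with header_path as its header argument.
    with open(header_path) as handle:
        round_trip = handle.read()
finally:
    os.remove(header_path)
```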

BaptisteVandecrux commented 3 years ago

I did that because when working on the nead.write function we agreed that:

"Overall I think this is the right way to handle it - make the user define a header and then apply it. It makes the code much simpler."

But if I am simply adding a few columns (as in the example), it is tedious to prepare the ini file manually and make sure it has the right number of fields and field attributes.

The ini file is saved to disk so that the user can make a first draft with nead.write_header and, if needed, add information manually before writing files with nead.write. When producing a new NEAD file, offering the option to write the header from the xarray metadata or, alternatively, to use an existing ini file ensures that every case is covered. Maybe it should be integrated into nead.write or renamed nead.header_drafter.
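
Since a hand-edited draft can easily end up with a mismatched number of entries, a small consistency check might help before the file is used. The sketch below relies only on the standard-library configparser; check_header is a hypothetical helper, not an existing pyNEAD function, and the draft deliberately has one scale_factor entry too few:

```python
import configparser


def check_header(ini_text):
    """Return the [FIELDS] keys whose number of entries differs from `fields`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # Fall back to a comma if the draft does not declare a delimiter
    delimiter = parser["METADATA"].get("field_delimiter", ",")
    fields = parser["FIELDS"]
    n_fields = len(fields["fields"].split(delimiter))
    return [key for key, value in fields.items()
            if len(value.split(delimiter)) != n_fields]


draft = """[METADATA]
station_id = test_station_processed
field_delimiter = ,
[FIELDS]
fields = timestamp,TA,RH,VW,ISWR,TA_cor
add_value = 0,0,0,0,0,0
scale_factor = 1,1,1,1,1
units = time,K,perc,ms-1,Wm-2,K
[DATA]
"""

problems = check_header(draft)
```

Here problems lists scale_factor, the only entry whose length does not match the six declared fields.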

GEUS-Glaciology-and-Climate / pyNEAD

Write header #7