barronh / pseudonetcdf

PseudoNetCDF like NetCDF except for many scientific format backends
GNU Lesser General Public License v3.0

BPCH to NetCDF Inquiry #59

Closed yankodavila closed 5 years ago

yankodavila commented 6 years ago

Hi Prof Henderson

Can PseudoNetCDF be used to generate a BPCH file from a NetCDF one? I’ve always done the opposite, so I'm not quite sure if it will be easy or even possible to do. Thank You.

Yanko.

barronh commented 6 years ago

Yanko (and others, but especially Yanko) - it's just Barron.

There are two options. First, a simple demo. Then, I will outline how to (1) make a bpch that is structured like an existing file, and (2) make a bpch with a new structure. Finally, some caveats.

Simple Demonstration

This demonstration is instructive because you can look at the output NetCDF file as a template for what kinds of metadata are required.

step 1 : convert bpch to netcdf

import PseudoNetCDF as pnc
ncpath = 'test.nc'
bpchpath = 'test.bpch'
infile = pnc.pncopen(bpchpath, nogroup=True).copy()
infile.modelres = infile.modelres.astype('f') # glitchy if you don't do this
outfile = infile.save(ncpath, format='NETCDF3_CLASSIC')

step 2 : convert netcdf to bpch

import PseudoNetCDF as pnc
ncpath = 'test.nc'
bpchpath = 'out.bpch'
infile = pnc.pncopen(ncpath)
outfile = infile.save(bpchpath, format='bpch')
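The roundtripped NetCDF from step 1 shows which global attributes the bpch backend needs. As a quick reference, here is that metadata set as a plain dictionary; the values are illustrative (taken from the examples later in this thread), not from any particular file:

```python
import numpy as np

# Illustrative bpch global metadata; the fixed-width strings are padded
# with ljust to the widths the bpch headers use.
bpch_meta = {
    'ftype': 'CTM bin 4D'.ljust(40),           # file type, 40 characters
    'toptitle': 'Example'.ljust(80),           # title, 80 characters
    'modelname': 'GEOS5_47'.ljust(20),         # model name, 20 characters
    'modelres': np.array([5., 4.], dtype='f'), # grid resolution, float32
    'halfpolar': 1.,
    'center180': 1.,
}

print(len(bpch_meta['ftype']),
      len(bpch_meta['toptitle']),
      len(bpch_meta['modelname']),
      bpch_meta['modelres'].dtype)
```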

Option 1 : Structured like an existing BPCH

The easiest approach to converting a NetCDF file to bpch is to use a template bpch file. You will basically overwrite the data in the template and then save out. This allows you to borrow metadata so that you don't have to create it manually. It does, however, require you to get the NetCDF data into the same form as the existing BPCH file.

I'm writing these without testing them, so you'll have to check for bugs.

import PseudoNetCDF as pnc
ncpath = '/path/to/file.nc'
tmplpath = '/path/to/dummy.bpch'
outpath = 'out.bpch'
rawfile = pnc.pncopen(ncpath)
outfile = pnc.pncopen(tmplpath, format='bpch', mode='r+', noscale=True).copy()

varkeys = [k for k in outfile.variables if k.startswith('IJ-AVG-$')]

# Here you may want to interpolate or subset...
# subset - for example invert time and use just one day
# infile = rawfile.sliceDimensions(time=slice(0, 24), lay=slice(48, 0, -1))
# interpolate - assuming a layer weights array of shape (bpchlay, outlay)
# infile = rawfile.applyAlongDimensions(lay=lambda x: (x[:, None] * wgts).sum(0))
# or do nothing
infile = rawfile
infile.modelres = infile.modelres.astype('f') # glitchy if you don't do this

for vark in varkeys:
    invark = vark[9:]  # strip the 'IJ-AVG-$_' prefix
    invar = infile.variables[invark]
    outvar = outfile.variables[vark]
    outvar[:] = invar[:] / 1e9 # convert from ppb to v/v

outfile.save(outpath, format='bpch')
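If you do need the layer interpolation hinted at in the comments above, the weighted-sum pattern is plain numpy. A sketch, assuming a hypothetical weights array `wgts` of shape `(inlay, outlay)` whose columns pick out the input layers contributing to each output layer:

```python
import numpy as np

inlay, outlay = 4, 2
# Hypothetical weights: each output layer averages two adjacent input layers
wgts = np.zeros((inlay, outlay), dtype='f')
wgts[0:2, 0] = 0.5
wgts[2:4, 1] = 0.5

x = np.array([1., 2., 3., 4.], dtype='f')  # one column of input-layer values
# (inlay, 1) * (inlay, outlay) broadcasts, then we sum over the input axis
y = (x[:, None] * wgts).sum(0)
print(y)  # two output-layer values, each the mean of two input layers
```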

Option 2 : New Structure and/or Variables

This approach gives you control over the meta data, but with that control comes some effort. You need to explicitly declare some of the metadata and that may depend on the content of your netcdf file.

Again, this may require some trial and error.

import PseudoNetCDF as pnc
from datetime import datetime, timezone
import numpy as np

ncpath = '/path/to/file.nc'
outpath = '/path/for/output.bpch'

infile = pnc.pncopen(ncpath).copy()
# add necessary meta properties and variables
infile.ftype = 'CTM bin 4D'.ljust(40)
infile.toptitle = 'Myfile'.ljust(80)
infile.modelname = 'GEOS5_47'.ljust(20)
infile.modelres = np.array([5.,4.], dtype='f')
infile.halfpolar = 1.
infile.center180 = 1.
times = infile.getTimes()
rdate = datetime(1985, 1, 1, tzinfo=timezone.utc)
tau0 = infile.createVariable('tau0', 'f', ('time',))
tau0.units = 'hours since 1985-01-01 00:00:00+0000'
tau0[:] = [(t - rdate).total_seconds() / 3600. for t in tau0]
tau1 = infile.createVariable('tau1', 'f', ('time',))
tau1.units = 'hours since 1985-01-01 00:00:00+0000'
tau1[:-1] = tau0[1:]
tau1[-1] = tau0[-1] + 3600.
for vi, vk in enumerate(varkeys):
    var = infile.variables[vk]
    var.category = 'IJ-AVG-$'.ljust(40)
    var.tracerid = vi + 1
    var.base_units = var.units

infile.save(outpath, format='bpch')

Caveats

Just a word of caution.

You should check the outputs using the reader that will ingest them. When you make a file, it may be readable with a custom tracerinfo.dat and diaginfo.dat, but that may not be how it is read later. The scale factors and units in tracerinfo.dat are particularly important. A typical GEOS-Chem bpch file holds concentration state variables in v/v, and yet they are scaled to ppbv when read because of tracerinfo.dat. If you write out a file in ppbv, it may get scaled again...
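To make that concrete, the scaling is just a factor of 1e9 applied (or not) at read time. A toy numpy check with made-up values showing what a double-scaled file looks like:

```python
import numpy as np

vv = np.array([4.0e-8, 8.0e-8])  # ozone-like mixing ratios stored in v/v
ppbv = vv * 1e9                  # what a reader applying the tracerinfo.dat scale shows
print(ppbv)                      # roughly 40 and 80 ppbv

# If the file already held ppbv, the same read-time scaling would give
# values nine orders of magnitude too large:
double_scaled = ppbv * 1e9
print(double_scaled)
```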

yankodavila commented 6 years ago

Hi Barron

I'm sorry I haven't been able to get back to this for a while. My Python skills are pretty bad. I was able to tweak your code a little and got this:

from datetime import datetime, timezone
import PseudoNetCDF as pnc
import numpy as np
import time

ncpath = 'snox_2x25.nc'
outpath = 'nox.bpch'

infile = pnc.pncopen(ncpath).copy()
# add necessary meta properties and variables
infile.ftype = 'CTM bin 4D'.ljust(40)
infile.toptitle = 'NOx'.ljust(80)
infile.modelname = 'GEOS5_47'.ljust(20)
infile.modelres = np.array([2.5,2.], dtype='f')
infile.halfpolar = 1.
infile.center180 = 1.

times = infile.getTimes()
rdate = datetime(1985, 1, 1, tzinfo=timezone.utc)

tau0 = infile.createVariable('tau0', 'f', ('time',))
tau0.units = 'hours since 1985-01-01 00:00:00+0000'
#tau0[:] = [( t - rdate ).total_seconds() / 3600. for t in tau0]
tau0[:] = [( t - time.mktime(rdate.timetuple()) ) / 3600. for t in tau0]

tau1 = infile.createVariable('tau1', 'f', ('time',))
tau1.units = 'hours since 1985-01-01 00:00:00+0000'
tau1[:-1] = tau0[1:]
tau1[-1] = tau0[-1] + 3600.

for vi, vk in enumerate(varkeys):
    var = infile.variables[vk]
    var.category = 'IJ-AVG-$'.ljust(40)
    var.tracerid = vi + 1
    var.base_units = var.units

infile.save(outpath, format='bpch')

Now I get this error

Traceback (most recent call last):
  File "make_bpch.py", line 32, in <module>
    for vi, vk in enumerate(varkeys):
NameError: name 'varkeys' is not defined

I've attached a copy of the file. Thank You for your help.

Yanko.

snox_2x25.nc.zip

barronh commented 6 years ago

In the case of your input file, you have emissions of a single species with an unknown unit without a layer dimension. My (bad) assumption was that you were working with concentration files with many species and the full tkji dimension set. Replace the last 7 lines with this:

infile.createDimension('layer1', 1)
infile.noscale = False
varkeys = ['snox']
for vi, vk in enumerate(varkeys):
    var = infile.variables[vk]
    var = var[:, None, :, :]
    var.dimensions = ('time', 'layer1', 'latitude', 'longitude')
    var.category = 'ANTHSRCE'.ljust(40)
    var.tracerid = vi + 1
    var.base_units = 'unknown'
    var.scale = 1
    infile.variables[vk] = var

infile.save(outpath, format='bpch')
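As an aside, the `var[:, None, :, :]` indexing above is just numpy's way of inserting a length-1 layer axis so a 2-D surface field fits bpch's (time, layer, latitude, longitude) layout. A minimal illustration with made-up shapes:

```python
import numpy as np

# Hypothetical surface emissions with dimensions (time, latitude, longitude)
surf = np.zeros((12, 91, 144), dtype='f')

# None (np.newaxis) inserts a singleton layer axis after time,
# giving (time, 1, latitude, longitude)
full = surf[:, None, :, :]
print(surf.shape, '->', full.shape)
```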

What this won't do is write the tracerinfo.dat or diaginfo.dat files for you. The tracerid should align with the appropriate tracer and the category needs to align with the right category. Here I am assuming snox is NO and ANTHSRCE. I don't know what the base unit is, so I've set it to unknown. Obviously, you should make it correct. With a NOx species, I recommend being ultra-specific: kgNO2/s, kgNO/s, or kgN/s. I often have a lot of trouble tracking down exactly what the unit really is.

It runs and passes some basic checks.

yankodavila commented 6 years ago

Hi Barron

The file is generated correctly but I can't open it with IDL. Here is the screenshot of the error as well as the IDL console stop point. Thank You.

Yanko.

% Compiled module: CREATE3DHSTRU.
Error occured while reading header information ...
^C% Interrupted at: CTM_READ3DB_HEADER  400
/home/yanko/gamap2/internals/ctm_read3db_header.pro


barronh commented 6 years ago

Mea culpa. In my original, I wrote:

tau0[:] = [( t - rdate ).total_seconds() / 3600. for t in tau0]

You adjusted it to:

tau0[:] = [( t - time.mktime(rdate.timetuple()) ) / 3600. for t in tau0]

That creates hugely negative times because tau is in hours since 1985-01-01 00Z while time.mktime returns seconds since the Epoch.

What I should have written was:

tau0[:] = [( t - rdate ).total_seconds() / 3600. for t in times]

Does that fix it?
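For reference, tau values in bpch files are hours since 1985-01-01 00:00 UTC, so a correct value is easy to sanity-check with plain datetime arithmetic. For example, 2015-01-01 00Z:

```python
from datetime import datetime, timezone

rdate = datetime(1985, 1, 1, tzinfo=timezone.utc)
t = datetime(2015, 1, 1, tzinfo=timezone.utc)

# 30 years including 7 leap days = 10957 days = 262968 hours
tau = (t - rdate).total_seconds() / 3600.
print(tau)  # 262968.0

# time.mktime, by contrast, returns seconds since the 1970 Epoch, so
# subtracting it from an hours-based quantity produces the large
# negative taus that broke the header read above.
```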

barronh commented 5 years ago

@yankodavila

Can you remind me how we resolved this? I think it was a top title issue.

yankodavila commented 5 years ago

Hi Barron

We changed the file type from

infile.ftype = 'CTM bin 4D'.ljust(40)

to

infile.ftype = 'CTM bin 02'.ljust(40)

Because I had a 2D (surface layer only) file. Sorry for keeping this open for so long. I can provide the final script as well as the files required to generate the bpch from netcdf if you’d like. Thanks for your help.

Yanko.


barronh commented 5 years ago

Thanks @yankodavila