Belgingur / WrfUtils

Utilities for working with wrfout files.

Elevator.py: Strange values of PB noticed in the lowest few hundred meters #25

Closed: olafurrognvaldsson closed this issue 4 years ago

olafurrognvaldsson commented 4 years ago

When Elevator.py reads a "new" WRF file from the RAV2 dataset (i.e. only the latest year, 2018-19), it is unable to handle the highest values (close to 1000 hPa) of the PB (base state pressure) variable. An example input file is at sleggjan:/home/or/data/wrfout_d02_2018-08-28_00_test.nc4 and an example output file is at sleggjan:/home/or/data/wrfout_d02_2018-08-28_00_test_40-200magl.nc4. The difference can also be seen in the screenshot below: left is the input file, right is the output file.

Screen Shot 2020-01-23 at 23 41 54

logi commented 4 years ago

@olafurrognvaldsson can you try this again with the latest code and if it still fails (which is likely), run with --verbose and attach the console output please.

olafurrognvaldsson commented 4 years ago

@logi yep, it is still strange. Pressure level values are negative over land but positive (yet far too low) over the ocean. Temperature values (which rely on pressure data) are also strange (not shown). Screen Shot 2020-01-24 at 11 23 53 Also, please find the log file attached: elevator_2018-19.log.txt

logi commented 4 years ago

From the discussion at https://github.com/Belgingur/WrfUtils/issues/24#issuecomment-578128494, the values are correct except that they are shifted by 2^17 = 2 * 65536 = scale_factor * add_offset.

logi commented 4 years ago

The title of this issue is "strange values of PB in the lowest few hundred meters". Is that still a problem? I don't see it. I see a different problem which is an incorrect pressure variable at all levels.

olafurrognvaldsson commented 4 years ago

Which is a direct consequence of incorrect values of PB (pressure = P + PB)
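
For reference, total pressure in WRF output is just the sum of those two fields, so a quick check against the test file mentioned above could look like this (a minimal sketch; the slicing is only illustrative):

import netCDF4 as nc

# Total pressure = perturbation pressure (P) + base state pressure (PB), in Pa
ds = nc.Dataset('wrfout_d02_2018-08-28_00_test.nc4')
pressure = ds.variables['P'][0, 0] + ds.variables['PB'][0, 0]   # t=0, lowest sigma level
print(f'{pressure.min():.0f} .. {pressure.max():.0f} Pa')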

logi commented 4 years ago

Do you see incorrect values for PB in ncview?

olafurrognvaldsson commented 4 years ago

And yes, we still have badly wrong values of PB in the latest batch of RÁV2 data (i.e. the 2018-19 dataset). Screen Shot 2020-01-24 at 13 55 59

olafurrognvaldsson commented 4 years ago

Try running Elevator.py on this test file: sleggjan:/home/or/data/wrfout_d02_2018-08-28_00_test.nc4

logi commented 4 years ago

(image)

olafurrognvaldsson commented 4 years ago

Wrong file; this is from the earlier RÁV2 dataset.

logi commented 4 years ago

OK, downloading. It's helpful to press the M X1 button so it becomes M X2 and the file name fits in the window title.

olafurrognvaldsson commented 4 years ago

2018-08-21 (the last file from the 2017-18 simulation), not 2018-08-28 (which is the first file from the 2018-19 simulation).

In retrospect I should not have chosen test files that are so close in time (and hence in name).

logi commented 4 years ago

And does adding 2**17 to PB make it look about right, same as with the pressure variable?

olafurrognvaldsson commented 4 years ago

No, this looks different. There is a very sharp discontinuity in the values. Over the ocean we have values close to 34,000 Pa, but go one point inland and you see very large negative values (around -37,000 Pa).

logi commented 4 years ago

Both these issues have to be related to the file encoding: if pressure doesn't have the discontinuities in the output that PB has, and we can apparently use PB + P to interpolate other variables correctly, then the values must have been read correctly and must exist correctly in memory in Elevator. They are then written in a way that can't be read back correctly.

olafurrognvaldsson commented 4 years ago

You see this discontinuity in pressure as well for the 2018-19 dataset. In this dataset the variables PB and P are defined as:

short PB(Time, bottom_top, south_north, west_east) ;
        PB:description = "BASE STATE PRESSURE" ;
        PB:least_significant_digit = 0 ;
        PB:scale_factor = 2 ;
        PB:add_offset = 65536 ;

and

short P(Time, bottom_top, south_north, west_east) ;
        P:description = "perturbation pressure" ;
        P:least_significant_digit = 1 ;
        P:scale_factor = 0.2 ;

(no offset defined for P). But for files from the older part of the dataset we have

uint PB(Time, bottom_top, south_north, west_east) ;
        string PB:description = "BASE STATE PRESSURE" ;
        PB:least_significant_digit = 2LL ;
        PB:scale_factor = 0.01 ;

and

int P(Time, bottom_top, south_north, west_east) ;
        string P:description = "perturbation pressure" ;
        P:least_significant_digit = 2LL ;
        P:scale_factor = 0.01 ;

i.e. no offset is defined for either P or PB (not sure if that matters ...)
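
For context, the usual netCDF packing convention is true_value = stored_value * scale_factor + add_offset. A quick sanity check of the two encodings above, using an illustrative base pressure of 100000 Pa:

scale_new, offset_new = 2, 65536      # 2018-19 files: short (int16) PB
scale_old, offset_old = 0.01, 0.0     # older files: uint PB, no offset

true = 100000.0                       # Pa, illustrative near-surface base pressure
packed_new = round((true - offset_new) / scale_new)   # 17232, fits easily in int16
packed_old = round((true - offset_old) / scale_old)   # 10000000, needs an unsigned 32-bit int

print(packed_new * scale_new + offset_new)   # 100000
print(packed_old * scale_old + offset_old)   # 100000.0

Both encodings round-trip fine in float arithmetic; the difference is that the new files carry integer scale_factor and add_offset attributes, which turns out to matter further down this thread.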

logi commented 4 years ago

OK, at least part of the error was extra stupid. I wrote the adjusted add_offset attribute to the file as offset instead of add_offset, so it was completely ignored. Check again?

olafurrognvaldsson commented 4 years ago

Science project goes boink, as Calvin might have said.

elevator_2018-19_b.log.txt

logi commented 4 years ago

OK, another stupid error. For now, run it without --verbose

olafurrognvaldsson commented 4 years ago

Shall I do git pull first?

logi commented 4 years ago

If you do a git pull now then you can use --verbose.

olafurrognvaldsson commented 4 years ago

I ran without the verbose flag (did not do git pull ...). We still have a discontinuity in PB, but we no longer have negative values over land as we did before.

olafurrognvaldsson commented 4 years ago

Looking strange: Screen Shot 2020-01-24 at 14 50 49

logi commented 4 years ago

The PB variable is read incorrectly from the input file. The log shows the range of values for the lowest sigma level of the first time step of each variable as it is read from the input file:

2020-01-24 16:38:32 |    INFO | Processing Variable
2020-01-24 16:38:33 |    INFO |     temperature
2020-01-24 16:38:42 |    INFO |         T[0,0]: -22.3 .. 10.25
2020-01-24 16:38:46 |    INFO |         P[0,0]: -1061.4 .. 609.6
2020-01-24 16:38:49 |    INFO |         PB[0,0]: 32768 .. 98302
2020-01-24 16:38:50 |    INFO |     wind_speed
2020-01-24 16:38:53 |    INFO |         U[0,0]: -27.39 .. 10.7
2020-01-24 16:38:57 |    INFO |         V[0,0]: -15.88 .. 33.14
2020-01-24 16:38:58 |    INFO | Size: 218,244 MB -> 10,382 MB, reduced to 4.8% in 26.0 s

All the values look normal except PB.

logi commented 4 years ago

Reading the file directly shows a similar result, so the problem is further upstream:

import netCDF4 as nc
import numpy as np

ds = nc.Dataset('wrfout_d02_2018-08-28_00_test.nc4')
var = ds.variables['PB']
print(var)

# Print the range of values for each sigma level, across both time steps
for h in range(var.shape[1]):
    v = var[:, h]
    print(f'{h}: {np.min(v):g} .. {np.max(v):g}')
<class 'netCDF4._netCDF4.Variable'>
int16 PB(Time, bottom_top, south_north, west_east)
    description: BASE STATE PRESSURE
    least_significant_digit: 0
    scale_factor: 2
    add_offset: 65536
    FieldType: 104
    MemoryOrder: XYZ
    units: Pa
    stagger:
    coordinates: XLONG XLAT
unlimited dimensions: Time
current shape = (2, 36, 235, 305)
filling on, default _FillValue of -32767 used

0: 32768 .. 98302
1: 32768 .. 98302
2: 32768 .. 98302
3: 32768 .. 98302
4: 32768 .. 98302
5: 77288 .. 98258
6: 77080 .. 97988
7: 76870 .. 97718
8: 76664 .. 97452
9: 76458 .. 97186
10: 76254 .. 96922
11: 76048 .. 96656
12: 75740 .. 96258
13: 75280 .. 95668
14: 74596 .. 94784
15: 73734 .. 93670
16: 72872 .. 92558
17: 72008 .. 91446
18: 70630 .. 89666
19: 68760 .. 87256
20: 66944 .. 84912
21: 65178 .. 82634
22: 63254 .. 80152
23: 61184 .. 77482
24: 59188 .. 74904
25: 57260 .. 72418
26: 55402 .. 70022
27: 53612 .. 67712
28: 51714 .. 65264
29: 49724 .. 62696
30: 47514 .. 59846
31: 44830 .. 56384
32: 42034 .. 52776
33: 39426 .. 49412
34: 37004 .. 46286
35: 34746 .. 43374

logi commented 4 years ago

The points where PB is 32768 look random, but the value 32768 itself is not random: it's 2^15.

import netCDF4 as nc
import numpy as np

ds = nc.Dataset('wrfout_d02_2018-08-28_00_test.nc4')
var = ds.variables['PB']
lats = ds.variables['XLAT'][0]
lons = ds.variables['XLONG'][0]

# Find the grid points in the first time step where PB reads back as exactly 2^15
kk, jj, ii = np.where(var[0] == np.int32(32768))
for k, j, i in zip(kk, jj, ii):
    print(f'({k:02d},{j:03d},{i:03d}) -> ({lats[j,i]:.2f}°, {lons[j,i]:.2f}°)')
(00,119,075) -> (64.99°, -22.24°)
(01,044,129) -> (63.67°, -19.90°)
(01,061,121) -> (63.97°, -20.24°)
(01,087,235) -> (64.40°, -15.51°)
(01,111,264) -> (64.80°, -14.23°)
(01,160,046) -> (65.69°, -23.60°)
(01,196,217) -> (66.38°, -16.05°)
(02,033,157) -> (63.47°, -18.77°)
(02,061,192) -> (63.97°, -17.33°)
(02,073,112) -> (64.18°, -20.62°)
(02,157,244) -> (65.65°, -14.95°)
(02,163,136) -> (65.81°, -19.67°)
(02,166,245) -> (65.81°, -14.88°)
(03,043,172) -> (63.65°, -18.16°)
(03,059,190) -> (63.93°, -17.41°)
(03,059,201) -> (63.93°, -16.96°)
(03,159,036) -> (65.66°, -24.03°)
(03,186,242) -> (66.17°, -14.96°)
(04,031,159) -> (63.44°, -18.69°)
(04,060,103) -> (63.95°, -20.97°)
(04,083,087) -> (64.35°, -21.67°)
(04,100,074) -> (64.64°, -22.24°)
(04,148,072) -> (65.50°, -22.44°)
(04,157,141) -> (65.70°, -19.45°)
(04,168,121) -> (65.90°, -20.33°)
(04,170,075) -> (65.90°, -22.36°)
(04,173,074) -> (65.95°, -22.41°)

Do we still have the original file, so we can see what the exact value is at these coordinates?

logi commented 4 years ago

Even better, can I get a 2-step version of that file to test on?

olafurrognvaldsson commented 4 years ago

Sorry, no. The raw data file has been deleted

olafurrognvaldsson commented 4 years ago

But, we have raw data from domain01 available: sleggjan:/riv/scrooge/RAV2/2018-19/wrfout_d01_2018-08-25_00:00:00

olafurrognvaldsson commented 4 years ago

But if we look at the netCDF file that Elevator.py is reading (sleggjan:/home/or/data/wrfout_d02_2018-08-28_00_test.nc4), we see that the values of PB are in the normal range. Screen Shot 2020-01-24 at 20 25 54 Basically, for the lowest sigma level (as shown in the screenshot above) they range from just above 99,000 Pa (the red areas over the ocean) down to 85,000 Pa (the blue areas over the ice caps).

logi commented 4 years ago

Yes. This is very strange. ncview seems to interpret the numbers correctly, but the netCDF4 python library doesn't. So the information is there, but probably something in the meta-data confuses the :imp: out of the library.
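
One way to confirm that the stored numbers themselves are fine (a diagnostic sketch, not part of Elevator): turn off the library's automatic unpacking with set_auto_scale and apply the scale/offset manually in floating point.

import netCDF4 as nc

ds = nc.Dataset('wrfout_d02_2018-08-28_00_test.nc4')
PB = ds.variables['PB']
PB.set_auto_scale(False)                      # return the stored integers untouched
raw = PB[0, 0].astype('float64')              # first time step, lowest sigma level
pb = raw * float(PB.scale_factor) + float(PB.add_offset)   # manual unpack, in Pa
print(f'{pb.min():.0f} .. {pb.max():.0f} Pa')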

logi commented 4 years ago

Minimal python code to read and plot the bottom level of the first time-step of PB:

#!/usr/bin/env python3
from __future__ import annotations

import matplotlib.pyplot as plt
import netCDF4 as nc
import numpy as np

# Read full PB variable
ds = nc.Dataset('wrfout_d02_2018-08-28_00_test.nc4')
PB = ds.variables['PB']
print("PB:", PB)

# Read PB for t=0, k=0, flip vertically and convert to hPa
pb = PB[0, 0, ::-1, :] / 100
print("pb[0,0]:", pb.shape, np.min(pb), '..', np.max(pb))

# Plot
plt.imshow(pb, interpolation='nearest')
plt.colorbar()
plt.tight_layout()
plt.show()

Result: (image)

logi commented 4 years ago

@olafurrognvaldsson does this look right for the bottom sigma level? (image)

olafurrognvaldsson commented 4 years ago

@logi yes, this is something I would expect to see, i.e. the greatest values of the base pressure over the ocean, becoming smaller as elevation increases.

logi commented 4 years ago

OK, this is a fun one. The add_offset and scale_factor are both integers in that file, so the netCDF4 library does all the scaling and shifting on integer values. These overflow, so higher pressures wrap around to low pressures.
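
To make the wraparound concrete, here is a standalone numpy illustration (it mimics integer packing arithmetic in general, not necessarily the library's exact code path):

import numpy as np

packed = np.array([17000], dtype=np.int16)    # plausible packed near-surface PB value
scale = np.int16(2)                           # integer attributes, as in the 2018-19 files
offset = np.int32(65536)

wrong = packed * scale + offset               # int16 multiply wraps: 34000 instead of 99536
right = packed.astype(np.float64) * 2.0 + 65536.0

print(wrong, right)                           # [34000] [99536.]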

I can fix it for an individual file locally with:

# Force scale_factor and add_offset to be floats to avoid overflows inside netCDF4 library
PB.scale_factor = float(PB.scale_factor)
PB.add_offset = float(PB.add_offset)

logi commented 4 years ago

I will write a utility to fix individual files. I will then force the offsets and scale factors in DropDigits output to be floats so this doesn't happen again.

logi commented 4 years ago

@olafurrognvaldsson You need to run the just-added float_var_scaling.py program on all netCDF files recently created by DropDigits to adjust the variable meta-data. It prints the variables that need to be changed, and running it twice should do nothing.

The files are changed in-place and it's probably worth testing on a couple of files from each series before running them all.
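
For reference, the core of such a fixer presumably looks something like the sketch below (an assumption based on the description above, not the actual float_var_scaling.py code): open each file in append mode and rewrite any integer scale_factor/add_offset attribute as a float.

#!/usr/bin/env python3
# Sketch only: cast integer scale_factor/add_offset attributes to float, in place.
import sys

import netCDF4 as nc
import numpy as np

for path in sys.argv[1:]:
    with nc.Dataset(path, 'r+') as ds:                   # append mode: rewrite attributes
        for name, var in ds.variables.items():
            for attr in ('scale_factor', 'add_offset'):
                value = getattr(var, attr, None)
                if value is not None and isinstance(value, (int, np.integer)):
                    print(f'{path}: {name}.{attr} = {value} -> {float(value)}')
                    var.setncattr(attr, float(value))

Usage would be along the lines of passing the wrfout files as arguments; since nothing is printed on a second run, it is also an easy way to verify a file has already been fixed.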

olafurrognvaldsson commented 4 years ago

@logi nice, I'll add this to my script. I've tested it on a couple of files and the code is doing what it claims to do :)

logi commented 4 years ago

After lunch I'm pushing a change so this doesn't need to be done to any new files; it can just be run once for the older files and then forgotten about.