Closed olafurrognvaldsson closed 4 years ago
@olafurrognvaldsson can you try this again with the latest code and if it still fails (which is likely), run with --verbose
and attach the console output please.
@logi yebb, it is still strange. Pressure level values are negative over land but positive (yet way too low) over the ocean. Temperature values (which rely on pressure data) are also strange (not shown) Also, please find the log file attached elevator_2018-19.log.txt
From discussion at https://github.com/Belgingur/WrfUtils/issues/24#issuecomment-578128494 the values are correct except that they are shifted by 2^17 = 2*65536 = scale_factor * offset
.
The title of this issue is "strange values of PB in the lowest few hundred meters". Is that still a problem? I don't see it. I see a different problem which is an incorrect pressure
variable at all levels.
Which is a direct consequence of incorrect values of PB
(pressure = P + PB
)
Do you see incorrect values for PB in ncview?
And yes, we still have very much wrong values of PB
in the latest batch of RÁV2 data (i.e. the 2018-19 data set)
Try and run Elevator.py
on this test file here
sleggjan:/home/or/data/wrfout_d02_2018-08-28_00_test.nc4
Wrong file, this is from the earlier data set from RÁV2
OK, downloading. It's helpful to press the M X1
button so it becomes M X2
and the file name fits in the window title.
2018-08-21 (the last file from the 2017-18 simulation), not 2018-08-28 (which is the first file from the 2018-19 simulation).
In retrospect I should not have chosen test files that are so close in time (and hence name)
And does adding 2**17 to PB make it look about right, same as with the pressure
variable?
No, this looks different. There is a very sharp discontinuity in the values. Over ocean we have values close to 34.000 Pa, but go one point inland and you see very large negative values (-37.000 Pa)
Both these issues have to be related to the file encoding since if pressure
doesn't have the discontinuities in the output that PB
has, and since we can apparently use PB+P
for correctly interpolating other variables, then the values must have been read correctly and exist correctly in memory in Elevator
. Then they get written in a way that can't be correctly read back.
You see this discontinuity if pressure
as well for the 2018-19 dataset.
In this dataset variables PB
and P
are defined as:
short PB(Time, bottom_top, south_north, west_east) ;
PB:description = "BASE STATE PRESSURE" ;
PB:least_significant_digit = 0 ;
PB:scale_factor = 2 ;
PB:add_offset = 65536 ;
and
short P(Time, bottom_top, south_north, west_east) ;
P:description = "perturbation pressure" ;
P:least_significant_digit = 1 ;
P:scale_factor = 0.2 ;
(no offset defined for P
).
But for files from the older part of the dataset we have
uint PB(Time, bottom_top, south_north, west_east) ;
string PB:description = "BASE STATE PRESSURE" ;
PB:least_significant_digit = 2LL ;
PB:scale_factor = 0.01 ;
and
int P(Time, bottom_top, south_north, west_east) ;
string P:description = "perturbation pressure" ;
P:least_significant_digit = 2LL ;
P:scale_factor = 0.01 ;
i.e. no offset defined for either P
nor PB
(not sure if that matters ...)
OK, at least part of error was extra stupid. I wrote the adjusted add_offset
attribute to the file as offset
and not add_offset
so it was completely ignored. Check again?
Science project goes boink as Kalvin might have said
OK, another stupid error. For now, run it without --verbose
Shall I do git pull
first?
If you do a git pull
now then you can use --verbose
.
I ran without the verbose flag (did not do git pull ...
). We still have discontinuity in the PB
but we no longer have negative values over land as we did before
looking strange
The PB
variable is read incorrectly from the input file. The logs range of values for the lowest sigma-level of the first time step of each variable as it is read from the input file:
2020-01-24 16:38:32 | INFO | Processing Variable
2020-01-24 16:38:33 | INFO | temperature
2020-01-24 16:38:42 | INFO | T[0,0]: -22.3 .. 10.25
2020-01-24 16:38:46 | INFO | P[0,0]: -1061.4 .. 609.6
2020-01-24 16:38:49 | INFO | PB[0,0]: 32768 .. 98302
2020-01-24 16:38:50 | INFO | wind_speed
2020-01-24 16:38:53 | INFO | U[0,0]: -27.39 .. 10.7
2020-01-24 16:38:57 | INFO | V[0,0]: -15.88 .. 33.14
2020-01-24 16:38:58 | INFO | Size: 218,244 MB -> 10,382 MB, reduced to 4.8% in 26.0 s
All the values look normal except PB.
Reading the file directly shows a similar result, so the problem is further upstream:
import netCDF4 as nc
import numpy as np
ds = nc.Dataset('wrfout_d02_2018-08-28_00_test.nc4')
var=ds.variables['PB']
print(var)
for h in range(var.shape[1]):
v = var[:,h]
print(f'{h}: {np.min(v):g} .. {np.max(v):g}')
<class 'netCDF4._netCDF4.Variable'>
int16 PB(Time, bottom_top, south_north, west_east)
description: BASE STATE PRESSURE
least_significant_digit: 0
scale_factor: 2
add_offset: 65536
FieldType: 104
MemoryOrder: XYZ
units: Pa
stagger:
coordinates: XLONG XLAT
unlimited dimensions: Time
current shape = (2, 36, 235, 305)
filling on, default _FillValue of -32767 used
0: 32768 .. 98302
1: 32768 .. 98302
2: 32768 .. 98302
3: 32768 .. 98302
4: 32768 .. 98302
5: 77288 .. 98258
6: 77080 .. 97988
7: 76870 .. 97718
8: 76664 .. 97452
9: 76458 .. 97186
10: 76254 .. 96922
11: 76048 .. 96656
12: 75740 .. 96258
13: 75280 .. 95668
14: 74596 .. 94784
15: 73734 .. 93670
16: 72872 .. 92558
17: 72008 .. 91446
18: 70630 .. 89666
19: 68760 .. 87256
20: 66944 .. 84912
21: 65178 .. 82634
22: 63254 .. 80152
23: 61184 .. 77482
24: 59188 .. 74904
25: 57260 .. 72418
26: 55402 .. 70022
27: 53612 .. 67712
28: 51714 .. 65264
29: 49724 .. 62696
30: 47514 .. 59846
31: 44830 .. 56384
32: 42034 .. 52776
33: 39426 .. 49412
34: 37004 .. 46286
35: 34746 .. 43374
The points where PB is 32768 are very random. But 32768 is not random since that's 2^15.
import netCDF4 as nc
import numpy as np
ds = nc.Dataset('wrfout_d02_2018-08-28_00_test.nc4')
var = ds.variables['PB']
lats = ds.variables['XLAT'][0]
lons = ds.variables['XLONG'][0]
kk,jj,ii = np.where(var[0]==np.int32(32768))
for k,j,i in zip(kk,jj,ii):
print(f'({k:02d},{j:03d},{i:03d}) -> ({lats[j,i]:.2f}°, {lons[j,i]:.2f}°)')
(00,119,075) -> (64.99°, -22.24°)
(01,044,129) -> (63.67°, -19.90°)
(01,061,121) -> (63.97°, -20.24°)
(01,087,235) -> (64.40°, -15.51°)
(01,111,264) -> (64.80°, -14.23°)
(01,160,046) -> (65.69°, -23.60°)
(01,196,217) -> (66.38°, -16.05°)
(02,033,157) -> (63.47°, -18.77°)
(02,061,192) -> (63.97°, -17.33°)
(02,073,112) -> (64.18°, -20.62°)
(02,157,244) -> (65.65°, -14.95°)
(02,163,136) -> (65.81°, -19.67°)
(02,166,245) -> (65.81°, -14.88°)
(03,043,172) -> (63.65°, -18.16°)
(03,059,190) -> (63.93°, -17.41°)
(03,059,201) -> (63.93°, -16.96°)
(03,159,036) -> (65.66°, -24.03°)
(03,186,242) -> (66.17°, -14.96°)
(04,031,159) -> (63.44°, -18.69°)
(04,060,103) -> (63.95°, -20.97°)
(04,083,087) -> (64.35°, -21.67°)
(04,100,074) -> (64.64°, -22.24°)
(04,148,072) -> (65.50°, -22.44°)
(04,157,141) -> (65.70°, -19.45°)
(04,168,121) -> (65.90°, -20.33°)
(04,170,075) -> (65.90°, -22.36°)
(04,173,074) -> (65.95°, -22.41°)
Do we have the original file still to see what the exact value is at these coordinates in that file?
Even better, can I get a 2-step version of that file to test on?
Sorry, no. The raw data file has been deleted
But, we have raw data from domain01 available:
sleggjan:/riv/scrooge/RAV2/2018-19/wrfout_d01_2018-08-25_00:00:00
But if we look at the netCDF file that Elevator.py is reading (sleggjan:/home/or/data/wrfout_d02_2018-08-28_00_test.nc4
), we see that the values of PB
are in the normal range
Basically, for the lowest sigma level (as shown in the screenshot above) they range from just above 99.000 Pa (the red part over the ocean) down to 85.000 Pa (the blue areas over the icecaps)
Yes. This is very strange. ncview
seems to interpret the numbers correctly, but the netCDF4
python library doesn't. So the information is there, but probably something in the meta-data confuses the :imp: out of the library.
Minimal python code to read and plot the bottom level of the first time-step of PB:
#!/usr/bin/env python3
from __future__ import annotations
import matplotlib.pyplot as plt
import netCDF4 as nc
import numpy as np
# Read full PB variable
ds = nc.Dataset('wrfout_d02_2018-08-28_00_test.nc4')
PB = ds.variables['PB']
print("PB:", PB)
# Read PB for t=0, k=0, flit vertically and convert to hPa
pb = PB[0, 0, -1:0:-1, :] / 100
print("pb[0,0]:", pb.shape, np.min(pb), '..', np.max(pb))
# Plot
plt.imshow(pb, interpolation='nearest')
plt.colorbar()
plt.tight_layout()
plt.show()
Result:
@olafurrognvaldsson does this look right for bottom sigma level?
@logi yes, this is sth. I would expect to see, i.e. greatest values of the base pressure over the ocean and then becoming smaller as elevation increases
OK, this is a fun one. The add_offset
and scale_factor
are both integers in that file so the netCDF4 library does all the scaling and shifting on integer values and they overflow so higher pressures wrap around to low pressure.
I can fix it for an individual file locally with:
# Force scale_factor and add_offset to be floats to avoid overflows inside netCDF4 library
PB.scale_factor = float(PB.scale_factor)
PB.add_offset = float(PB.add_offset)
I will write a utility to fix individual files. I will then force the offsets and scale factors in DropDigits output to be floats so this doesn't happen again.
@olafurrognvaldsson You need to run the just added float_var_scaling.py
program on all netdf files created by DropDigits recently to adjust the variable meta-data. It prints the variables that need to be changed and running it twice should do nothing.
The files are changed in-place and it's probably worth testing on a couple of files from each series before running them all.
@logi nice, I'll add this to my script. I've tested it on a couple of files and the code is doing what it claims to do :)
After lunch in pushing a change so it doesn't need to be done to any new files and it can just be run once for the older files and then forget about it.
When the Elevator.py code reads in a "new" WRF file from the RAV2 dataset (i.e. only the latest year, 2018-19) it is unable to handle the highest values (i.e. close to 1000 hPa) of the PB (Base Pressure) variable. An example of input file is here
sleggjan:/home/or/data/wrfout_d02_2018-08-28_00_test.nc4
and an example of an output file is heresleggjan:/home/or/data/wrfout_d02_2018-08-28_00_test_40-200magl.nc4
The difference can also be seen on the screenshot below. Left isinput
file and right isoutput
file