google-deepmind / graphcast

Apache License 2.0

[GraphCast Operational Model] Issue with Negative Precipitation Data in GraphCast Operational Model Output #57

Closed: chengshenlian closed this issue 5 months ago

chengshenlian commented 5 months ago

Background: I have been using the GraphCast Operational model to generate 6-hour accumulated precipitation forecasts and have tried to visualize them with NASA's Panoply software. The authors' Science article describes the model as predicting precipitation without using rainfall as an input, but I could not find a parameter explicitly labeled as precipitation amount in Panoply. Instead, I found a parameter named "Mixed intervals Accumulation," which makes me suspect that Panoply may not be able to display this type of data.

Questions: By programming my way through the GRIB files, I managed to access the data marked as "Total precipitation." However, I've noticed that some of the data includes negative values, which seems counterintuitive for precipitation metrics. As I am not a professional meteorologist, I would like to understand the following:

1. Is it normal to encounter negative values in precipitation data, or could this indicate an error or some other issue?
2. If negative values are normal, what do they signify?
3. Should I apply any special treatment to these negative values when analyzing precipitation data?

Additional Information: The code snippet I used is as follows:

import pygrib  # Library for reading GRIB format data
import datetime  # Library for handling dates and times
import numpy as np  # Library for mathematical operations
import matplotlib.pyplot as plt  # Library for plotting
import cartopy.crs as ccrs  # Library for map projections
import cartopy.feature as cfeature  # Library for map features
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER  # Formatting for map gridline labels

# Open the GRIB2 file
file_path = 'graphcast.grib'  # Replace with your GRIB2 file path
grbs = pygrib.open(file_path)

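# Set the target date and time
target_date = datetime.datetime(2023, 12, 30, 18, 0)  # Example date and time
# Find data matching the specific variable and date
data = None  # Stays None if no matching message is found
for grb in grbs:
    if grb.name == "Total precipitation" and grb.validDate == target_date:
        data = grb.values  # Read the data
        lats, lons = grb.latlons()  # Get the latitudes and longitudes corresponding to the data
        break
grbs.close()
# Fail early with a clear message instead of a NameError further down
if data is None:
    raise ValueError(f'No "Total precipitation" message found for {target_date}')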

# Print out statistical information about the data
print(f'Minimum value: {data.min()}')  # Minimum value
print(f'Maximum value: {data.max()}')  # Maximum value
print(f'Mean value: {data.mean()}')  # Mean value
print(f'Median value: {np.median(data)}')  # Median value
print(f'Standard deviation: {data.std()}')  # Standard deviation

# Create the map
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())
# Create color levels for the contour plot
levels = np.linspace(-0.0002, 0.12, 10)  # Create levels from a bit below the minimum to above the maximum value

# Plot the contour map using the color levels
precipitation = ax.contourf(lons, lats, data, levels=levels, transform=ccrs.PlateCarree(), cmap='viridis')

# Add map features like coastlines and borders
ax.coastlines()
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Add gridlines and labels for longitude and latitude
gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True)
gl.top_labels = False  # Disable top labels
gl.right_labels = False  # Disable right labels
gl.xformatter = LONGITUDE_FORMATTER  # Format for longitude
gl.yformatter = LATITUDE_FORMATTER  # Format for latitude

# Add a colorbar to explain the color encoding of precipitation levels
plt.colorbar(precipitation, ax=ax, orientation='horizontal', pad=0.05, aspect=50, label='Precipitation (m/6hr)', ticks=levels)
plt.title(f'6-Hour Accumulated Precipitation Forecast (Ending on {target_date.strftime("%Y-%m-%d %H:%M")})')

plt.show()  # Display the map

# python3.10 requirements.txt
Cartopy==0.22.0
certifi==2023.11.17
contourpy==1.2.0
cycler==0.12.1
fonttools==4.47.0
kiwisolver==1.4.5
matplotlib==3.8.2
mplcursors==0.5.2
numpy==1.26.2
packaging==23.2
pandas==2.1.4
Pillow==10.1.0
pygrib==2.1.5
pyparsing==3.1.1
pyproj==3.6.1
pyshp==2.3.1
python-dateutil==2.8.2
pytz==2023.3.post1
shapely==2.0.2
six==1.16.0
tzdata==2023.3
xarray==2023.12.0

Screenshot of the precipitation data visualized with the Python code: [image]

Screenshots of the precipitation data visualized with Panoply: [image] [image]

I am grateful for any explanation and advice.

Attachments:
Download link for Panoply software: https://www.giss.nasa.gov/tools/panoply/download/
Download link for the GRIB data file (6.5 GB): https://drive.google.com/file/d/1JrsCXZcRBXgEQg-Xu0rd7EsvR_XLARPU/view?usp=drive_link

alvarosg commented 5 months ago

Thanks for your message. Indeed, the model can sometimes produce negative values for precipitation. This is not surprising for neural networks, especially when an output head produces data with a large dynamic range. The negative values probably do not have any physical meaning. A root-cause fix would have been to design the network so that it cannot output negative precipitation, but we chose not to because we did not want the network to give precipitation any privileged treatment.
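For readers wondering what "impossible to output negative precipitation" could look like, here is a minimal sketch. It is not how GraphCast is built: the softplus transform, the function name, and the sample values are all assumptions chosen for illustration. The idea is that an unconstrained head output is passed through a function whose range is strictly positive.

```python
import numpy as np

def softplus(x):
    """Map any real value to a strictly positive one: log(1 + exp(x))."""
    # logaddexp(0, x) = log(exp(0) + exp(x)), which avoids overflow for large x
    return np.logaddexp(0.0, x)

# Hypothetical raw (unconstrained) outputs of a precipitation head
raw_head_output = np.array([-3.0, -0.1, 0.0, 2.5])

# After the transform, every value is positive, whatever the raw output was
nonnegative = softplus(raw_head_output)
print(nonnegative)
```

A constraint like this singles out one variable for special handling, which is the kind of privileged treatment the comment above says was avoided.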

In practice though, when we looked at this we found that:

So in practice you can treat any negative values as if they were 0.
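Treating negative values as 0 can be done in one line with NumPy. The sketch below uses a small synthetic array as a stand-in for the `grb.values` array from the original post. The helper name and the sample numbers are made up for illustration.

```python
import numpy as np

def clip_negative_precipitation(values):
    """Return a copy of `values` with every negative entry replaced by 0."""
    return np.clip(values, 0.0, None)

# Synthetic stand-in for the 2-D precipitation field (metres per 6 hours)
sample = np.array([[-0.0002, 0.0, 0.003],
                   [0.12, -1e-6, 0.01]])

# Share of grid cells that are negative before clipping
negative_fraction = float((sample < 0).mean())

cleaned = clip_negative_precipitation(sample)
print(f'Negative fraction before clipping: {negative_fraction:.3f}')
print(f'Minimum after clipping: {cleaned.min()}')
```

In the plotting script from the original post, the same call could be applied to `data` right after it is read, and the lowest contour level could then start at 0 instead of -0.0002.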

Hope this helps!

chengshenlian commented 5 months ago

Thank you for your detailed explanation.