NREL / OpenOA

This library provides a framework for assessing wind plant performance using operational assessment (OA) methodologies that consume time series data from wind plants. The goal of the project is to provide an open source implementation of common data structures, analysis methods, and utility functions relevant to wind plant OA.
https://openoa.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Reanalysis downloading module #172

Closed by ejsimley 2 years ago

ejsimley commented 3 years ago

This pull request adds a module for downloading reanalysis data, called reanalysis_downloading, to the toolkits directory. The module currently supports downloading MERRA-2 and ERA5 data through the PlanetOS API, and its documentation also explains how to download data directly from NASA and Copernicus. The documentation further explains how to obtain a PlanetOS API key, which is required to use the module. In addition, a method has been added to the reanalysis data type class that loads data through the PlanetOS API (via the reanalysis_downloading toolkit module), alongside the existing option of loading from CSV files.

Within the reanalysis_downloading module, the main function used to download data is called "download_reanalysis_data_planetos". It returns a DataFrame containing the downloaded data and optionally saves the data as a CSV file. Its arguments include:

Note: before merging this we'll ask the PlanetOS team and NREL to review the documentation as well.

Here are some examples of using the function with the default atmospheric variables and without saving a CSV file, highlighting different ways of specifying the date range:

First, downloading the default date range, which is 20 years up to the end of the most recent full month:

from operational_analysis.toolkits import reanalysis_downloading as rd

lat, lon = (48.452, 5.588)
df = rd.download_reanalysis_data_planetos("era5", lat, lon)

df
                  datetime   u_100   v_100      t_2m     surf_pres
0      2001-08-01 00:00:00 -2.5625 -4.8125  291.7500  98484.851562
1      2001-08-01 01:00:00 -2.8750 -4.5000  290.8750  98490.906250
2      2001-08-01 02:00:00 -3.0625 -4.5625  290.3750  98477.046875
3      2001-08-01 03:00:00 -3.2500 -4.6875  289.7500  98478.187500
4      2001-08-01 04:00:00 -3.5625 -4.3750  289.5625  98478.132812
...                    ...     ...     ...       ...           ...
175315 2021-07-31 19:00:00  4.1250 -0.5000  290.0000  97289.796875
175316 2021-07-31 20:00:00  3.9375 -0.9375  289.1875  97285.867188
175317 2021-07-31 21:00:00  4.6875 -1.8750  287.8125  97288.343750
175318 2021-07-31 22:00:00  5.7500  1.1250  288.8125  97284.703125
175319 2021-07-31 23:00:00  5.9375 -1.8125  288.3125  97308.468750

[175320 rows x 5 columns]
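As a quick sanity check on the output above, the row count should equal the number of hourly steps in the requested window. The sketch below uses only the standard library. The helper name `expected_hourly_rows` is hypothetical and not part of the toolkit, and it assumes one record per hour over a half-open window (start included, end excluded).

```python
from datetime import datetime


def expected_hourly_rows(start, end):
    """Number of hourly timestamps in the half-open window [start, end)."""
    seconds = (end - start).total_seconds()
    return int(seconds // 3600)


# 20 years ending at the end of July 2021, matching the ERA5 example above.
n_rows = expected_hourly_rows(datetime(2001, 8, 1), datetime(2021, 8, 1))
print(n_rows)  # 175320, the row count reported for the DataFrame above
```

Comparing this number with `len(df)` is a cheap way to spot gaps in a download.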

Next, we'll download data for the date range from 2000-01-01 to 2005-07-15:

df = rd.download_reanalysis_data_planetos("era5", lat, lon, st_date="2000-01-01 00:00:00", en_date="2005-07-15 00:00:00")

df
                 datetime     u_100     v_100        t_2m     surf_pres
0     2000-01-01 00:00:00  2.410797  3.399582  275.588928  98387.429688
1     2000-01-01 01:00:00  2.662781  3.404541  275.622375  98394.093750
2     2000-01-01 02:00:00  2.945435  3.149887  275.503174  98405.484375
3     2000-01-01 03:00:00  3.265747  2.691406  275.703827  98422.265625
4     2000-01-01 04:00:00  3.489532  2.327530  275.806824  98435.046875
...                   ...       ...       ...         ...           ...
48524 2005-07-14 20:00:00 -3.000000 -1.000000  294.812500  97920.125000
48525 2005-07-14 21:00:00 -4.062500 -0.187500  293.375000  97928.125000
48526 2005-07-14 22:00:00 -2.687500  4.312500  292.500000  97936.250000
48527 2005-07-14 23:00:00 -1.375000  5.125000  292.500000  97923.625000
48528 2005-07-15 00:00:00 -0.187500  5.312500  292.875000  97903.750000

[48529 rows x 5 columns]

If we only specify the start date, it will download "num_years" of data starting on that date. Note, we're asking for MERRA-2 data now:

df = rd.download_reanalysis_data_planetos("merra2", lat, lon, st_date="2000-01-01 00:00:00", num_years=10)

df

                 datetime      u_50      v_50        t_2m     surf_pres
0     2000-01-01 00:30:00  2.752302  3.408454  273.226990  98509.031250
1     2000-01-01 01:30:00  2.949233  3.312825  273.238403  98532.351562
2     2000-01-01 02:30:00  3.327359  3.315226  273.190216  98559.171875
3     2000-01-01 03:30:00  3.849879  3.222857  273.234039  98568.304688
4     2000-01-01 04:30:00  3.964457  3.219257  273.700317  98577.156250
...                   ...       ...       ...         ...           ...
87667 2009-12-31 19:30:00  5.901794  1.654513  277.443542  95376.351562
87668 2009-12-31 20:30:00  5.220856  0.378700  276.895721  95362.507812
87669 2009-12-31 21:30:00  4.422727 -0.695482  276.098022  95343.648438
87670 2009-12-31 22:30:00  3.526426 -1.546500  275.594177  95334.453125
87671 2009-12-31 23:30:00  2.701854 -1.986580  275.162231  95315.953125

[87672 rows x 5 columns]

Similarly, if we only specify the end date, it will download "num_years" of data ending on that date:

df = rd.download_reanalysis_data_planetos("merra2", lat, lon, en_date="2000-01-01 00:00:00", num_years=10)

df
                 datetime      u_50      v_50        t_2m     surf_pres
0     1990-01-01 01:30:00  0.328658  2.941333  270.924133  97988.281250
1     1990-01-01 02:30:00  0.113014  2.962437  270.800140  97969.343750
2     1990-01-01 03:30:00 -0.067724  2.966995  270.768768  97947.398438
3     1990-01-01 04:30:00 -0.281913  2.952448  270.647156  97941.921875
4     1990-01-01 05:30:00 -0.480738  2.961390  270.590546  97939.265625
...                   ...       ...       ...         ...           ...
87643 1999-12-31 20:30:00  2.025297  3.811939  272.434448  98413.125000
87644 1999-12-31 21:30:00  2.199683  3.815449  272.300385  98435.867188
87645 1999-12-31 22:30:00  2.294813  3.794298  272.419586  98465.812500
87646 1999-12-31 23:30:00  2.517580  3.612560  272.907867  98490.921875
87647 2000-01-01 00:30:00  2.752302  3.408454  273.226990  98509.031250

[87648 rows x 5 columns]

Lastly, requesting 6 years, which by default will be the 6 years up to the end of the most recent full month:

df = rd.download_reanalysis_data_planetos("merra2", lat, lon, num_years=6)

df
                 datetime      u_50      v_50        t_2m     surf_pres
0     2015-08-01 00:30:00 -4.981873 -1.358198  286.007812  97589.085938
1     2015-08-01 01:30:00 -4.847740 -0.381847  285.551300  97576.781250
2     2015-08-01 02:30:00 -4.409152  0.083434  285.012390  97540.007812
3     2015-08-01 03:30:00 -3.736572 -0.024176  284.313782  97510.742188
4     2015-08-01 04:30:00 -2.728259  0.036082  284.508942  97545.765625
...                   ...       ...       ...         ...           ...
52603 2021-07-31 19:30:00  3.461423 -0.915823  288.748260  97289.398438
52604 2021-07-31 20:30:00  3.608768 -1.049836  288.247986  97263.187500
52605 2021-07-31 21:30:00  3.462683 -0.738081  287.779388  97301.031250
52606 2021-07-31 22:30:00  3.287869 -0.459484  287.502136  97329.554688
52607 2021-07-31 23:30:00  3.366580 -0.410493  287.400696  97313.023438

[52608 rows x 5 columns]

codecov-commenter commented 3 years ago

Codecov Report

Merging #172 (9ead7a0) into develop (c038ca5) will decrease coverage by 1.30%. The diff coverage is 52.34%.

@@             Coverage Diff             @@
##           develop     #172      +/-   ##
===========================================
- Coverage    70.77%   69.46%   -1.31%     
===========================================
  Files           23       24       +1     
  Lines         1591     1716     +125     
===========================================
+ Hits          1126     1192      +66     
- Misses         465      524      +59     
Impacted Files                                          Coverage Δ
operational_analysis/types/reanalysis.py                36.36% <28.00%> (-6.07%)
...tional_analysis/toolkits/reanalysis_downloading.py   58.25% <58.25%> (ø)

nbodini commented 3 years ago

From a scientific point of view this looks great, @ejsimley! I have no issues to report.

The only discussion point I have is whether it would make more sense to add a function (or include one in reanalysis_downloading) that calculates wind direction and air density, rather than leaving that as a required step in the project script. It's a very minor point, though, and I am fine either way!

ejsimley commented 3 years ago

@RHammond2, @nbodini the latest commits should address your comments from back in August. I ended up revising quite a bit, so let me know if this looks good or if there are any more changes you'd like to make.

Rob: In addition to addressing your inline comments, I added leading underscores to some of the internal/helper functions in the reanalysis toolkit and made a unit test file for the reanalysis downloading toolkit to test a couple of the helper functions.

Nicola: Based on your suggestion, I added the option to automatically derive wind speed and direction as well as air density from the available reanalysis variables. There's an option for this in the reanalysis_downloading toolkit function download_reanalysis_data_planetos and also a method to compute these variables in the reanalysis data type class, called compute_derived_variables.
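For readers unfamiliar with these derived quantities, here is a minimal sketch of how they are commonly computed from the u and v wind components, temperature, and surface pressure. It is not the code in compute_derived_variables: the function name `derive_wind_and_density` is hypothetical, the direction follows the meteorological "direction the wind blows from" convention, and the density uses the dry-air ideal gas law, which may differ from what the toolkit does.

```python
import math

R_DRY_AIR = 287.058  # J/(kg K), specific gas constant for dry air


def derive_wind_and_density(u, v, temperature_k, pressure_pa):
    """Return (wind speed [m/s], wind direction [deg], air density [kg/m^3]).

    Hypothetical helper; assumes dry air and the meteorological convention
    that direction is where the wind comes from (0 = north, 90 = east).
    """
    wind_speed = math.hypot(u, v)
    wind_direction = (180.0 + math.degrees(math.atan2(u, v))) % 360.0
    air_density = pressure_pa / (R_DRY_AIR * temperature_k)
    return wind_speed, wind_direction, air_density


# First ERA5 row from the default-date-range example in this thread.
ws, wd, rho = derive_wind_and_density(-2.5625, -4.8125, 291.75, 98484.851562)
```

Applied to a whole DataFrame, the same formulas would normally be vectorized (for example with NumPy) rather than called row by row.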

I made another change based on some feedback from Patrick Duffy, who did some "beta testing" of the toolkit. Instead of requiring the user to provide a dictionary mapping the PlanetOS variable names to desired variable names, I added the argument var_names to download_reanalysis_data_planetos so users can just specify the PlanetOS variable names if they don't care about renaming them.
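One way to support both styles of input is to normalize them into a single source-to-output name mapping before requesting data. The sketch below only illustrates that idea: `build_var_mapping` and the `var_dict` parameter name are hypothetical, the variable names in the usage lines are placeholders rather than real PlanetOS names, and the actual signature of download_reanalysis_data_planetos may differ.

```python
def build_var_mapping(var_names=None, var_dict=None):
    """Return {source_name: output_name} from a list or a dict (hypothetical)."""
    if var_names is not None and var_dict is not None:
        raise ValueError("Provide either var_names or var_dict, not both.")
    if var_dict is not None:
        return dict(var_dict)  # user wants to rename the variables
    if var_names is not None:
        return {name: name for name in var_names}  # keep source names as-is
    return {}  # caller falls back to its default variables


# Placeholder names, not real PlanetOS variable names.
keep_names = build_var_mapping(var_names=["source_u", "source_v"])
renamed = build_var_mapping(var_dict={"source_u": "u_100", "source_v": "v_100"})
```

The identity mapping keeps the downstream code path the same whether or not the user renames anything.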

nbodini commented 3 years ago

Thanks for your hard work on this, Eric, this looks great.

Side note: I also got the same small failure in the test for check_simulation_results_gbm_daily while I was working on the AEP filter PR. I'm not sure what caused it, but I don't think it's related to your PR specifically.

ejsimley commented 3 years ago

Hey @RHammond2, I addressed your last comments. Let me know if you think this looks good or if any of the additions should be modified.

jordanperr commented 3 years ago

Triggering CI/CD pipeline