JJguri / bestiapop

Python package to automatically generate gridded climate data to be used as input for crop models

include NASA POWER as an optional weather data service #19

Open JJguri opened 3 years ago

JJguri commented 3 years ago

Currently, bestiapop only operates on the weather data portal service provided by SILO, which is restricted to Australia. The objective is to implement an optional parameter that accesses different weather data services at the global scale and prepares weather input files with bestiapop. One example is NASAPOWER; check the script for regional daily data.

One consideration is the name and number of variables in each source (e.g. SILO vs NASAPOWER). The two sources provide different numbers of variables under different names, so bestiapop needs to detect the selected source and adjust the variable options offered to the user accordingly.

Here is an example of the variables we need to create a MET file from the NASAPOWER data source:

POWER_SinglePoint_Daily_20150101_20150305_43d00S_144d00E_d6fbb90e.xlsx
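For reference, a minimal sketch of how such a variable mapping could look in Python. The dictionary and helper below are illustrative only, not part of bestiapop; the right-hand names are the NASA POWER daily parameter codes for solar radiation, 2 m maximum/minimum temperature and precipitation.

# Illustrative only: a possible mapping from bestiapop/SILO-style variable
# names to NASA POWER daily parameter codes.
CLIMATE_VARIABLE_MAP = {
    "radiation":  "ALLSKY_SFC_SW_DWN",  # all-sky surface shortwave radiation
    "max_temp":   "T2M_MAX",            # maximum air temperature at 2 m
    "min_temp":   "T2M_MIN",            # minimum air temperature at 2 m
    "daily_rain": "PRECTOT",            # total daily precipitation
}

def to_nasapower_parameters(variables):
    # Translate the user's variable names into the comma-separated parameter
    # string expected by the POWER API; unknown names raise a KeyError.
    return ",".join(CLIMATE_VARIABLE_MAP[name] for name in variables)

print(to_nasapower_parameters(["radiation", "max_temp", "min_temp", "daily_rain"]))
# ALLSKY_SFC_SW_DWN,T2M_MAX,T2M_MIN,PRECTOT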

JJguri commented 3 years ago

We need to consider that NASAPOWER provides data at 0.5° spatial resolution and SILO at 0.05°. The spatial resolution of the product needs to be stated somewhere in the header of both MET and WHT files. This also needs to be included in the documentation, so users are aware of the differences they may see due to the spatial resolution of the climate data.

darkquasar commented 3 years ago

So here's the thing: NASAPOWER says that they may ban your application from querying their database if you request data for the same 0.5° x 0.5° grid cell too frequently. I decided to step the grid every 1°, so you still get those 0.5° x 0.5° boxes, but their data is actually different.

This is the NASAPOWER warning message:

If you are going to be using the Daily API to download the entire catalog please remember that with the 0.5° x 0.5° global grid you only need to submit one request per cell. If your application persists in requesting the same relative location it potentially will be blocked.

It would be good if you could validate whether the data is different for every 0.5° jump or whether it is generally the same, in which case we could just keep 1° jumps for data extracted from NASAPOWER.

For example, try these three URLs in your browser and check the data; it looks identical (as far as I could see with the naked eye):

  1. https://power.larc.nasa.gov/cgi-bin/v1/DataAccess.py?&request=execute&tempAverage=DAILY&identifier=SinglePoint&parameters=ALLSKY_SFC_SW_DWN,T2M_MAX,T2M_MIN,PRECTOT&userCommunity=AG&lat=-48&lon=146&startDate=20160101&endDate=20161231&outputList=JSON&user=DOCUMENTATION
  2. https://power.larc.nasa.gov/cgi-bin/v1/DataAccess.py?&request=execute&tempAverage=DAILY&identifier=SinglePoint&parameters=ALLSKY_SFC_SW_DWN,T2M_MAX,T2M_MIN,PRECTOT&userCommunity=AG&lat=-48.5&lon=146&startDate=20160101&endDate=20161231&outputList=JSON&user=DOCUMENTATION
  3. https://power.larc.nasa.gov/cgi-bin/v1/DataAccess.py?&request=execute&tempAverage=DAILY&identifier=SinglePoint&parameters=ALLSKY_SFC_SW_DWN,T2M_MAX,T2M_MIN,PRECTOT&userCommunity=AG&lat=-48&lon=146.5&startDate=20160101&endDate=20161231&outputList=JSON&user=DOCUMENTATION

What I'm saying is, we can make BestiaPop request 0.5° x 0.5° boxes every 0.5° instead of every 1°, but we run the risk of requesting the same data over and over because of the way NASAPOWER averages values.
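A rough sketch of how that check could be scripted, assuming the v1 endpoint shown in the URLs above is still reachable (this is not part of BestiaPop):

# Sketch only: fetch the three responses above and check whether the payloads
# are identical for the three nearby grid points.
import hashlib
import requests

BASE = ("https://power.larc.nasa.gov/cgi-bin/v1/DataAccess.py?&request=execute"
        "&tempAverage=DAILY&identifier=SinglePoint"
        "&parameters=ALLSKY_SFC_SW_DWN,T2M_MAX,T2M_MIN,PRECTOT"
        "&userCommunity=AG&startDate=20160101&endDate=20161231"
        "&outputList=JSON&user=DOCUMENTATION")

points = [(-48, 146), (-48.5, 146), (-48, 146.5)]

for lat, lon in points:
    response = requests.get(f"{BASE}&lat={lat}&lon={lon}", timeout=60)
    # Hashing the raw payload is a crude check: the response also echoes the
    # requested coordinates, so a stricter comparison would parse the JSON
    # and compare only the daily parameter values.
    print(lat, lon, hashlib.md5(response.content).hexdigest())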

JJguri commented 3 years ago

I checked the files and they are different. Surprisingly, 1 and 3 are the same, and 2 is different from 1 and 3. This means we should keep the 0.5° x 0.5° resolution if it is not a problem. Looking at the data I found -99 values; please look at the date 20160726 in files 1 and 3. It is a big problem with a simple solution. Crop models will not run with these values, which are actually NaN values; the SILO team already fixed this issue in their API for the same reason. For the cases where -99 values appear, I recommend adding a line to the code that fills them with the mean of the previous and the following value of the variable. We need to fix this, otherwise the files generated from NASAPOWER are not functional. These calculations need to be explained in the documentation in the NASAPOWER section.

JJguri commented 3 years ago

Bestiapop is working well when you want to generate a series of years for a given lat-lon combination, i.e. one grid cell, using nasapower as the data source. However, I found that when you want to generate more than one grid cell, the code only generates 1 file. I ran the following command and got only one file when there should be 36:

python bestiapop.py -a generate-climate-file -s nasapower -y "2019" -c "radiation max_temp min_temp daily_rain" -lat "-41.25 -41" -lon "145 145.25" -o D:\TEST -ot met -m
darkquasar commented 3 years ago

The reason why this command python bestiapop.py -a generate-climate-file -s nasapower -y "2019" -c "radiation max_temp min_temp daily_rain" -lat "-41.25 -41" -lon "145 145.25" -o D:\TEST -ot met -m generates only one file is that when BestiaPop builds the array of latitudes and longitudes, it does so in steps of 1 and not 0.05.

When the granularity is 0.05 you would get 36 combinations, but even if we set the granularity to 0.5 (the minimum for the 0.5° x 0.5° resolution offered by NASAPOWER) you would still get just a single combination, since "-41.25" and "-41" are less than 0.5 degrees apart. The same logic applies to the longitude.

I've now fixed the granularity to generate arrays of values every 0.5° for NASAPOWER.

NOTE: even under this configuration NASAPOWER will always generate fewer files than SILO, since SILO creates data points every 0.05°.

Example

For SILO, lat -41.05 to -42, BestiaPop will generate the following array with 20 values: [-42. , -41.95, -41.9 , -41.85, -41.8 , -41.75, -41.7 , -41.65, -41.6 , -41.55, -41.5 , -41.45, -41.4 , -41.35, -41.3 , -41.25, -41.2 , -41.15, -41.1 , -41.05]

For the same lat range with NASAPOWER, it will generate the following array: [-42.0, -41.5, -41.0]

The quantity of data points for NASAPOWER is (n/10)+1 compared to those from SILO.
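For reference, a minimal sketch of how those coordinate arrays can be built with numpy; the function name is illustrative only, not the actual BestiaPop implementation.

import numpy as np

def coordinate_range(start, stop, source):
    # SILO grid cells are spaced every 0.05 degrees, NASA POWER cells every 0.5.
    step = 0.05 if source == "silo" else 0.5
    lo, hi = sorted((start, stop))
    # Round to the nearest number of whole steps, which reproduces the
    # arrays shown in the example above.
    n_steps = int(round((hi - lo) / step))
    return np.round(lo + step * np.arange(n_steps + 1), 2)

print(coordinate_range(-41.05, -42, "silo"))       # 20 values, -42.0 ... -41.05
print(coordinate_range(-41.05, -42, "nasapower"))  # [-42.0, -41.5, -41.0]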

darkquasar commented 3 years ago

I checked the files and they are different. Surprisingly, 1 and 3 are the same, and 2 is different from 1 and 3. This means we should keep the 0.5° x 0.5° resolution if it is not a problem. Looking at the data I found -99 values; please look at the date 20160726 in files 1 and 3. It is a big problem with a simple solution. Crop models will not run with these values, which are actually NaN values; the SILO team already fixed this issue in their API for the same reason. For the cases where -99 values appear, I recommend adding a line to the code that fills them with the mean of the previous and the following value of the variable. We need to fix this, otherwise the files generated from NASAPOWER are not functional. These calculations need to be explained in the documentation in the NASAPOWER section.

So I've fixed the resolution, but calculating the mean between the previous and the following value for variables that contain -99 seems to be more complicated than I thought. I believe you would be able to figure out a way of doing it with pandas faster than me, so perhaps you can import one CSV file created from NASAPOWER into a dataframe and test it. Once you find the lines of code that do the job, please share them and I will integrate them into the code.

JJguri commented 3 years ago

The reason why this command python bestiapop.py -a generate-climate-file -s nasapower -y "2019" -c "radiation max_temp min_temp daily_rain" -lat "-41.25 -41" -lon "145 145.25" -o D:\TEST -ot met -m generates only one file is that when BestiaPop builds the array of latitudes and longitudes, it does so in steps of 1 and not 0.05.

When the granularity is 0.05 you would get 36 combinations, but even if we set the granularity to 0.5 (the minimum for the 0.5° x 0.5° resolution offered by NASAPOWER) you would still get just a single combination, since "-41.25" and "-41" are less than 0.5 degrees apart. The same logic applies to the longitude.

I've now fixed the granularity to generate arrays of values every 0.5° for NASAPOWER.

NOTE: even under this configuration NASAPOWER will always generate fewer files than SILO, since SILO creates data points every 0.05°.

Example

For SILO, lat -41.05 to -42, BestiaPop will generate the following array with 20 values: [-42. , -41.95, -41.9 , -41.85, -41.8 , -41.75, -41.7 , -41.65, -41.6 , -41.55, -41.5 , -41.45, -41.4 , -41.35, -41.3 , -41.25, -41.2 , -41.15, -41.1 , -41.05]

For the same lat range with NASAPOWER, it will generate the following array: [-42.0, -41.5, -41.0]

The quantity of data points for NASAPOWER is (n/10)+1 compared to those from SILO.

@darkquasar thanks for the detailed explanation. It would be good if this explanation could also be added to the documentation, in the NASAPOWER section, as a note about data resolution.

JJguri commented 3 years ago

I checked the files and they are different. Surprisingly, 1 and 3 are the same, and 2 is different from 1 and 3. This means we should keep the 0.5° x 0.5° resolution if it is not a problem. Looking at the data I found -99 values; please look at the date 20160726 in files 1 and 3. It is a big problem with a simple solution. Crop models will not run with these values, which are actually NaN values; the SILO team already fixed this issue in their API for the same reason. For the cases where -99 values appear, I recommend adding a line to the code that fills them with the mean of the previous and the following value of the variable. We need to fix this, otherwise the files generated from NASAPOWER are not functional. These calculations need to be explained in the documentation in the NASAPOWER section.

So I've fixed the resolution, but calculating the mean between the previous and the following value for variables that contain -99 seems to be more complicated than I thought. I believe you would be able to figure out a way of doing it with pandas faster than me, so perhaps you can import one CSV file created from NASAPOWER into a dataframe and test it. Once you find the lines of code that do the job, please share them and I will integrate them into the code.

@darkquasar I developed a simple piece of code to fill -99 values with the average of the previous and the following rows. If there are two or more consecutive -99 values, the code uses the nearest value that is different from -99. code and dataframe. Note that I used a single dataframe; to be embedded in bestiapop it needs to take into account the headers and the particular names of the variables. It would also be good to add a short explanation in the documentation about this particularity of the NASAPOWER data.
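Since the actual snippet lives in the external link, here is a minimal pandas sketch of the same idea. The column and function names are hypothetical, and the fill strategy is an approximation of what is described above (runs of consecutive missing days are linearly interpolated across the gap rather than copied from the nearest value).

import numpy as np
import pandas as pd

def fill_sentinel_values(df, climate_columns, sentinel=-99):
    # Replace the sentinel with NaN and interpolate each climate column:
    # a single missing day becomes the mean of the previous and following
    # values, while runs of missing days are interpolated across the gap.
    df = df.copy()
    cleaned = df[climate_columns].replace(sentinel, np.nan)
    # bfill() covers missing values at the very start of the series, which
    # take the first valid value.
    df[climate_columns] = cleaned.interpolate(method="linear").bfill()
    return df

# Example with a hypothetical radiation column
frame = pd.DataFrame({"radiation": [8.2, 14.7, -99.0, 20.5, 6.6]})
print(fill_sentinel_values(frame, ["radiation"]))  # -99.0 becomes 17.6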

darkquasar commented 3 years ago

Thanks for the code, it works like a charm :) I knew you could pull it off in a very simple way using pandas; I would have ended up needing 5 to 10 lines of code to achieve the same result.

This has now been fixed in commit d31567e.

Test Example

This command python bestiapop.py -a generate-climate-file -s nasapower -y "2019" -c "radiation max_temp min_temp daily_rain" -lat "-41 -42" -lon "145.5 145" -o ..\test -ot met used to generate data for file -41.5-145.0.met with an invalid value of -99 at Julian day 289:

...
2019 287 8.2 14.2 7.6 0.6
2019 288 14.7 13.3 6.1 0.7
2019 289 -99.0 12.6 4.1 1.3
2019 290 20.5 13.1 5.7 1.9
2019 291 6.6 11.0 6.2 6.0
...

It now replaces that value with the mean of the previous and the following values, as expected:

2019 287 8.2 14.2 7.6 0.6
2019 288 14.7 13.3 6.1 0.7
2019 289 17.6 12.6 4.1 1.3
2019 290 20.5 13.1 5.7 1.9
2019 291 6.6 11.0 6.2 6.0

If this is all that's required from a NASAPOWER integration point of view, please close the issue; otherwise let me know what else needs fixing.

darkquasar commented 3 years ago

Can you have a go at updating the documentation too? A few lines explaining that we do this mean calculation for -99 values should suffice.

JJguri commented 3 years ago

Can you have a go at updating the documentation too? A few lines explaining that we do this mean calculation for -99 values should suffice.

Done in the README; it will be added to the index file soon.