EcoExtreML / STEMMUS_SCOPE

Integrated code of SCOPE and STEMMUS
GNU General Public License v3.0

Schaap dataset is uncompressed #150

Open BSchilperoort opened 1 year ago

BSchilperoort commented 1 year ago

I noticed that the Schaap soil data is not compressed. The netCDF-4 file format supports compression, which could save a very large amount of disk space with little impact on performance. Moderate compression is in fact more likely to speed things up, because less data has to be read from disk.
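To get a feel for why deflate helps on gridded soil data, here is a minimal, stdlib-only sketch. It builds a synthetic float32 grid (hypothetical data, not the Schaap files) in which most cells hold a fill value, as masked ocean cells do in a land-only product, and compresses it with zlib, which implements the same deflate algorithm that netCDF-4 applies per chunk. Real ratios depend on chunking, the shuffle filter and the data itself, so treat the printed numbers as illustrative only.

```python
import struct
import zlib

# Hypothetical 360 x 720 float32 grid (not real Schaap data).
# The first 70% of each row is a fill value, mimicking masked ocean cells;
# the rest is a smooth field, as soil parameters often vary gradually.
ROWS, COLS, FILL = 360, 720, -9999.0
values = [
    FILL if col < COLS * 7 // 10 else 0.01 * row + 0.001 * col
    for row in range(ROWS)
    for col in range(COLS)
]
raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32 bytes

# Compare a fast, a moderate and the maximum deflate level
for level in (1, 5, 9):
    packed = zlib.compress(raw, level)
    print(f"level {level}: {len(raw):,} -> {len(packed):,} bytes "
          f"({len(raw) / len(packed):.1f}x smaller)")
```

The large runs of identical fill values are what deflate removes most effectively; higher levels spend more CPU time for usually small additional gains.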

The nccopy tool (included in the netCDF software, just like ncdump) makes it easy to copy and compress the data.

For example:

```shell
nccopy -d 5 PTF_SoilGrids_Schaap_sl1_alpha.nc PTF_SoilGrids_Schaap_sl1_alpha_COMPRESSED.nc
```

This copies the file, compressing it with deflate level 5 (levels range from 0, no compression, to 9, maximum compression).
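When the command is driven from Python, it can help to build the argument list in one place and validate the deflate level before calling nccopy. The helper below is a hypothetical sketch, not part of STEMMUS_SCOPE; it only assembles the nccopy -d LEVEL SRC DST call shown above. Passing such a list to subprocess.run, rather than one command string, avoids platform-dependent string splitting between Windows and Unix.

```python
from pathlib import Path

def nccopy_args(src, dst, level=5):
    """Return the argument list for `nccopy -d LEVEL SRC DST` (hypothetical helper).

    Deflate levels run from 0 (no compression) to 9 (maximum compression).
    """
    if not isinstance(level, int) or not 0 <= level <= 9:
        raise ValueError(f"deflate level must be an integer from 0 to 9, got {level!r}")
    return ["nccopy", "-d", str(level), str(src), str(dst)]

args = nccopy_args(Path("PTF_SoilGrids_Schaap_sl1_alpha.nc"),
                   Path("PTF_SoilGrids_Schaap_sl1_alpha_COMPRESSED.nc"))
print(" ".join(args))
# prints: nccopy -d 5 PTF_SoilGrids_Schaap_sl1_alpha.nc PTF_SoilGrids_Schaap_sl1_alpha_COMPRESSED.nc
```

To actually run it (with nccopy on the PATH), pass the list to subprocess.run(args, check=True), which raises an error if nccopy exits with a failure.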

Compressing the Schaap data can save 100 GB of disk space.
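The 100 GB figure will vary with the local copy of the data, so it may be worth measuring it after compressing. Below is a minimal sketch with hypothetical helper names (tree_size, report_savings); it simply sums the sizes of the .nc files in the original and compressed folder trees. GB here means 10^9 bytes.

```python
from pathlib import Path

def tree_size(root, pattern="*.nc"):
    """Total size in bytes of the files under root (recursively) matching pattern."""
    return sum(f.stat().st_size for f in Path(root).rglob(pattern) if f.is_file())

def report_savings(original_root, compressed_root, pattern="*.nc"):
    """Print and return the bytes saved by the compressed copy (hypothetical helper)."""
    before = tree_size(original_root, pattern)
    after = tree_size(compressed_root, pattern)
    saved = before - after
    # Decimal gigabytes (1 GB = 10**9 bytes)
    print(f"{before / 1e9:.2f} GB -> {after / 1e9:.2f} GB (saved {saved / 1e9:.2f} GB)")
    return saved

# Example, with the folder paths used by the script in this issue:
# report_savings("C:/STEMMUS_SCOPE_data/soil_property",
#                "C:/STEMMUS_SCOPE_data/soil_property_compressed")
```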

BSchilperoort commented 1 year ago

The following script can be used if nccopy is installed on the system:

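```python
# Compress all soil_property data (requires nccopy on the PATH):
from pathlib import Path
import subprocess

in_root = Path("C:/STEMMUS_SCOPE_data/soil_property")
out_root = in_root.with_name("soil_property_compressed")

for infile in in_root.rglob("*.nc"):
    # Mirror the subfolder structure under the output folder
    outfile = out_root / infile.relative_to(in_root)
    outfile.parent.mkdir(parents=True, exist_ok=True)  # output folders may not exist yet
    # A list of arguments works on both Windows and Unix (no shell string splitting)
    subprocess.run(["nccopy", "-d", "4", str(infile), str(outfile)], check=True)
```
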
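Compressing hundreds of gigabytes takes a while, so a run may get interrupted. A resumable variant of the script is sketched below; compress_tree and its run parameter are hypothetical names, not part of STEMMUS_SCOPE. It skips files whose output already exists and writes each output under a temporary .part name first, renaming it only after nccopy succeeds, so a half-written file is never mistaken for a finished one. This assumes nccopy does not infer the output format from the file extension.

```python
from pathlib import Path
import os
import subprocess

def compress_tree(in_root, out_root, level=4, run=subprocess.run):
    """Compress every .nc file under in_root into out_root with nccopy (sketch).

    Outputs that already exist are skipped, so the function can be re-run after
    an interruption. Returns the list of output files written by this call.
    """
    in_root, out_root = Path(in_root), Path(out_root)
    written = []
    for infile in sorted(in_root.rglob("*.nc")):
        outfile = out_root / infile.relative_to(in_root)  # mirror subfolders
        if outfile.exists():
            continue  # finished in an earlier run
        outfile.parent.mkdir(parents=True, exist_ok=True)
        partfile = outfile.with_name(outfile.name + ".part")
        # check=True raises CalledProcessError if nccopy fails; only the .part file remains
        run(["nccopy", "-d", str(level), str(infile), str(partfile)], check=True)
        os.replace(partfile, outfile)  # only complete files get the final name
        written.append(outfile)
    return written

# compress_tree("C:/STEMMUS_SCOPE_data/soil_property",
#               "C:/STEMMUS_SCOPE_data/soil_property_compressed")
```

The run parameter defaults to subprocess.run but can be swapped for a stand-in, for example a function that only prints the command, to do a dry run first. The rename is atomic on POSIX systems; on Windows os.replace still overwrites, but without that guarantee.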
SarahAlidoost commented 1 year ago

Nice finding, thanks. At the beginning of the project, the data was copied from CRIB to Snellius. I wonder what format the data has in its original source. @Yunfei-Wang1993 explains the data sources in his paper. We could suggest your solution to the data provider.

BSchilperoort commented 1 year ago

@Yunfei-Wang1993, @yijianzeng in the PLUMBER2 paper it is stated that the "SoilGrids" dataset is used. However, I am unable to find exactly where the netCDF files come from.

Also, there is the Shangguan 2014 dataset ("GSDE"), which links to https://globalchange.bnu.edu.cn/; however, that website seems to be down.

Could you tell me where the files in the soil_property folder came from (including subfolders)?

Folder screenshot. ![image](https://user-images.githubusercontent.com/12114825/229469545-7c7bbbf7-9c19-4678-801a-785325cb7d1b.png)
yijianzeng commented 1 year ago

Hi Bart, these files come from the following paper:

Montzka, C., Herbst, M., Weihermüller, L., Verhoef, A., and Vereecken, H.: A global data set of soil hydraulic properties and sub-grid variability of soil water retention and hydraulic conductivity curves, Earth Syst. Sci. Data, 9, 529–543, https://doi.org/10.5194/essd-9-529-2017, 2017.

Although the data set is described at 0.25° resolution, the original product was generated at 1 km resolution. That version can be obtained by contacting the author of this ESSD paper, and it is also available here: https://fz-juelich.sciebo.de/s/xILqOr9hxlEzM7c

I hope the above is OK.

Cheers, Yijian

BSchilperoort commented 1 year ago

Thanks for your reply, @yijianzeng. However, this is only part of the data. There are also files such as SAND1.nc and CLAY1.nc, as well as files like PTF_SoilGrids_Schaap_sl1_alpha.nc (etc.).

Yunfei-Wang1993 commented 1 year ago

Hi Bart, the soil hydraulic parameters (the Schaap files) come from Montzka's dataset, and the other soil properties (including CLAY, OC, POR, SAND, SILT, lambda, etc.) come from Shangguan's dataset. I have checked the link and it cannot be opened at the moment. I will try to find a new link from which these data can be downloaded.

Shangguan, W., Dai, Y., Duan, Q., Liu, B., and Yuan, H.: A global soil data set for earth system modeling, Journal of Advances in Modeling Earth Systems, 6, 249–263, https://doi.org/10.1002/2013MS000293, 2014.

Yunfei-Wang1993 commented 1 year ago

@BSchilperoort Hi Bart, I checked the link again and it can be opened now. Please use the new link: http://globalchange.bnu.edu.cn/research/soilw