TUW-GEO / ismn

Readers for the data from the International Soil Moisture Network
https://ismn.earth/en/
MIT License

No 'variable' in station '3.09' and min_depth/max_depth don't work #51

Open xushanthu-2014 opened 1 year ago

xushanthu-2014 commented 1 year ago

I am trying to extract the variable 'soil_moisture' at station 3.09 for depths from 0.01 to 0.04 m. As I understand it, the default way to do this is command 1:

min_depth,max_depth=0.01, 0.04
ids = ismn_data.get_dataset_ids(variable='soil_moisture',
                                        min_depth=min_depth,
                                        max_depth=max_depth,
                                        filter_meta_dict={'station': '3.09',
                                                          'lc_2000': [10,11,12,20,60,130],
                                                          'lc_2005': [10,11,12,20,60,130],
                                                          'lc_2010': [10,11,12,20,60,130],})

But this returns an empty ids list. So I printed the metadata for 3.09 with ismn_data.read(1098, return_meta=True) and found that soil moisture is indeed in the metadata, but the key variable seems to have no value:

ismn_data.read(1098, return_meta=True)
Out[117]: 
(                     soil_moisture soil_moisture_flag soil_moisture_orig_flag
 date_time                                                                    
 2017-01-01 00:00:00          0.192                  G                       M
                           ...                ...                     ...
 2019-02-22 09:00:00          0.155                  G                       M

 [12745 rows x 3 columns],
 variable        key       
 clay_fraction   val                           5.2
                 depth_from                    0.0
                 depth_to                     0.05
 climate_KG      val                           Dfb
 climate_insitu  val                       unknown
 elevation       val                         104.0
 instrument      val                 Decagon-5TE-B
                 depth_from                    0.0
                 depth_to                     0.05
 latitude        val                       55.8609
 lc_2000         val                            10
 lc_2005         val                            10
 lc_2010         val                            10
 lc_insitu       val                          None
 longitude       val                        9.2945
 network         val                          HOBE
 organic_carbon  val                           0.5
                 depth_from                    0.0
                 depth_to                      0.3
 sand_fraction   val                          85.1
                 depth_from                    0.0
                 depth_to                     0.05
 saturation      val                          0.41
                 depth_from                    0.0
                 depth_to                      0.3
 silt_fraction   val                           5.7
                 depth_from                    0.0
                 depth_to                     0.05
 station         val                          3.09
 timerange_from  val           2017-01-01 00:00:00
 timerange_to    val           2019-02-22 09:00:00
 variable        val                 soil_moisture
                 depth_from                    0.0
                 depth_to                     0.05
 Name: data, dtype: object)

As you can see, right above clay_fraction, the key variable has no value. So I have to use command 2:

ids = ismn_data.get_dataset_ids(variable=None,min_depth=min_depth,
                                        max_depth=max_depth,
                                        filter_meta_dict={'station': '3.09',
                                                          'variable': 'soil_moisture',
                                                          'lc_2000': [10,11,12,20,60,130],
                                                          'lc_2005': [10,11,12,20,60,130],
                                                          'lc_2010': [10,11,12,20,60,130],})

But I still get nothing in ids. I found that this is because I set min_depth and max_depth. If I delete min_depth and max_depth from command 2, I get ids as [1098, 1104]. But I do want to extract values between 0.01 and 0.04. So is there anything wrong with the data for 3.09? I am also confused about the difference between command 1 and command 2.

xushanthu-2014 commented 1 year ago

Is there anybody who can help me?

wpreimes commented 1 year ago

Hi, sorry for the late reply. The problem is that at station 3.09, soil moisture sensors are operating between 0 and 5 cm, while your query looks for sensors between 1 and 4 cm (which do not exist).

wpreimes commented 1 year ago

You can see this by selecting the station and listing all sensor names (the numbers in each name are the depths in meters). Please note that your dataset might look different; I am using an older snapshot of the ISMN here.

>> ismn_data['HOBE']['3.09']

Out[22]: Sensors at '3.09': ['Decagon-5TE-A_soil_moisture_0.000000_0.050000', 'Decagon-5TE-B_soil_moisture_0.000000_0.050000', 'Decagon-5TE-A_soil_moisture_0.200000_0.250000', 'Decagon-5TE-B_soil_moisture_0.200000_0.250000', 'Decagon-5TE_soil_moisture_0.500000_0.550000', 'Decagon-5TE-A_soil_temperature_0.000000_0.050000', 'Decagon-5TE-B_soil_temperature_0.000000_0.050000', 'Decagon-5TE-A_soil_temperature_0.200000_0.250000', 'Decagon-5TE-B_soil_temperature_0.200000_0.250000', 'Decagon-5TE_soil_temperature_0.500000_0.550000']

wpreimes commented 1 year ago

My suggestion is to be less restrictive and allow sensors from e.g. 0 to 5 cm instead of 1 to 4 cm.
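As a rough illustration of why the wider range helps, the sketch below parses the sensor names listed for station 3.09 and keeps those whose whole depth range lies inside a requested range. It assumes that the depth filter works by full containment (which is what the empty result for 0.01 to 0.04 m suggests) and that instrument names contain no underscore. `parse_sensor_name` and `sensors_within` are helpers written for this example, not calls into the ismn package.

```python
# Sensor names as listed for station 3.09 earlier in this thread.
SENSOR_NAMES = [
    'Decagon-5TE-A_soil_moisture_0.000000_0.050000',
    'Decagon-5TE-B_soil_moisture_0.000000_0.050000',
    'Decagon-5TE-A_soil_moisture_0.200000_0.250000',
    'Decagon-5TE-B_soil_moisture_0.200000_0.250000',
    'Decagon-5TE_soil_moisture_0.500000_0.550000',
    'Decagon-5TE-A_soil_temperature_0.000000_0.050000',
    'Decagon-5TE-B_soil_temperature_0.000000_0.050000',
    'Decagon-5TE-A_soil_temperature_0.200000_0.250000',
    'Decagon-5TE-B_soil_temperature_0.200000_0.250000',
    'Decagon-5TE_soil_temperature_0.500000_0.550000',
]


def parse_sensor_name(name):
    """Split '<instrument>_<variable>_<depth_from>_<depth_to>' into its parts.

    Assumption: the instrument part contains no underscore.
    """
    head, depth_from, depth_to = name.rsplit("_", 2)
    instrument, variable = head.split("_", 1)
    return instrument, variable, float(depth_from), float(depth_to)


def sensors_within(names, variable, min_depth, max_depth):
    """Keep sensors of `variable` whose depth range lies inside the query range."""
    selected = []
    for name in names:
        _, var, d_from, d_to = parse_sensor_name(name)
        if var == variable and min_depth <= d_from and d_to <= max_depth:
            selected.append(name)
    return selected


# The original query (1 to 4 cm) selects nothing, the wider one finds two sensors.
print(sensors_within(SENSOR_NAMES, "soil_moisture", 0.01, 0.04))  # []
print(sensors_within(SENSOR_NAMES, "soil_moisture", 0.0, 0.05))
```

With the 0 to 5 cm range, the two Decagon-5TE sensors at 0.0 to 0.05 m are selected.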

xushanthu-2014 commented 1 year ago

Thanks for your reply @wpreimes! But I want to loop over all European stations, so I cannot print all sensors and then select them one by one. Besides, I am comparing the observations to my model simulations of soil moisture at each layer ([0, 0.01, 0.04, 0.1, 0.2, 0.4, 0.6, 0.8, 1] meters), so I am looking for a way to match the depths of the ISMN stations to my model layers on the same lat/lon grid. For example, my model simulations for the grid cell containing 3.09, from 0.01 to 0.04 m, are matched to the observations from 0 to 5 cm at station 3.09. In general, model simulations for the grid cell containing station X, from depth Y_1 to Y_2, are matched to observations from depth Z_1 to Z_2 at station X, where [Z_1, Z_2] contains [Y_1, Y_2] or [Y_1, Y_2] contains [Z_1, Z_2], as long as the observations 'match' the model layers. The other problem is that the depth configurations differ across the European ISMN stations. For example, other stations might have depths like 0 to 8 cm, so I cannot just loop the code written for 3.09 over the other stations. Is there any way to solve these problems? Thanks!
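The matching rule described above can be sketched in plain Python, independent of the ismn package. This assumes that the listed model depths are layer boundaries, so that consecutive pairs form the layers; `layers_from_levels`, `contains` and `match_layers` are names made up for this illustration.

```python
# Model depths from the comment above, read here as layer boundaries (assumption).
MODEL_LEVELS = [0, 0.01, 0.04, 0.1, 0.2, 0.4, 0.6, 0.8, 1]


def layers_from_levels(levels):
    """Turn boundaries [a, b, c, ...] into layers [(a, b), (b, c), ...]."""
    return list(zip(levels[:-1], levels[1:]))


def contains(outer, inner):
    """True if the depth range `inner` lies fully inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def match_layers(sensor_depth, layers):
    """Model layers that contain the sensor range or are contained by it."""
    return [
        layer for layer in layers
        if contains(sensor_depth, layer) or contains(layer, sensor_depth)
    ]


layers = layers_from_levels(MODEL_LEVELS)

# Sensor depth ranges (depth_from, depth_to) in meters, as found at station 3.09.
print(match_layers((0.0, 0.05), layers))   # [(0, 0.01), (0.01, 0.04)]
print(match_layers((0.2, 0.25), layers))   # [(0.2, 0.4)]
print(match_layers((0.5, 0.55), layers))   # [(0.4, 0.6)]
```

Because the rule only needs the sensor's depth_from and depth_to, the same function can be applied to every sensor of every station in a loop, whatever the depth configuration of the network. A sensor that only partly overlaps a layer (for example 0 to 8 cm against the 0.04 to 0.1 m layer) is not matched by this rule.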

wpreimes commented 1 year ago

Printing the names was only meant as an example to explain the problem for that specific station. Matching the different layers between model and in situ data is not straightforward, as you noticed. Some tradeoffs will be necessary, especially for sensors that cover a wide range of depths.

Here are some suggestions:

wpreimes commented 1 year ago

Also, I'm not sure if there are even any sensors that measure SM e.g. between 1 and 4 cm depth in ISMN at all. Just to strengthen my point about making some compromises in your approach. @daberer might know that.

xushanthu-2014 commented 1 year ago

Thanks! @wpreimes, let me try your suggestions first

daberer commented 1 year ago

Hi, I think the majority of soil moisture sensors in the ISMN are oriented horizontally (depth_from = depth_to). I checked: there are 271 soil moisture sensors within the 1-4 cm bracket if the margin values (1 and 4 cm) are included, mostly from the networks HiWATER_EHWSN and SMN-SDR. Networks often have a similar setup at all locations (the same sensors at the same depths), but overall the depths are quite diverse, as you noticed.
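For horizontally installed sensors the depth range collapses to a single value, so whether the margins are included changes what falls into a bracket. A small sketch of that difference follows; the depths in `point_depths` are made up for illustration and are not taken from the ISMN, and `in_bracket` is a hypothetical helper.

```python
def in_bracket(depth, lower, upper, inclusive=True):
    """Check whether a point depth lies in [lower, upper] or (lower, upper)."""
    if inclusive:
        return lower <= depth <= upper
    return lower < depth < upper


# Hypothetical point depths in meters (depth_from == depth_to).
point_depths = [0.01, 0.02, 0.04, 0.05]

with_margins = [d for d in point_depths if in_bracket(d, 0.01, 0.04)]
without_margins = [d for d in point_depths if in_bracket(d, 0.01, 0.04, inclusive=False)]

print(with_margins)     # [0.01, 0.02, 0.04]
print(without_margins)  # [0.02]
```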

xushanthu-2014 commented 1 year ago

Quoting @wpreimes: "Hi, sorry for the late reply. The problem is that at station 3.09, soil moisture sensors are operating between 0 and 5 cm, while your query looks for sensors between 1 and 4 cm (which do not exist)."

Hi @wpreimes, thanks for your comment, but there is another problem. If I try:

ids = ismn_data.get_dataset_ids(variable='soil_moisture',
                                        filter_meta_dict={'station': '3.09',
                                                          'lc_2000': [10,11,12,20,60,130],
                                                          'lc_2005': [10,11,12,20,60,130],
                                                          'lc_2010': [10,11,12,20,60,130],})

I get nothing. That's because in the metadata of 3.09, the value of the key variable is None. So I have to use:

ids = ismn_data.get_dataset_ids(variable=None,
                                        filter_meta_dict={'station': '3.09',
                                                          'variable': 'soil_moisture',
                                                          'lc_2000': [10,11,12,20,60,130],
                                                          'lc_2005': [10,11,12,20,60,130],
                                                          'lc_2010': [10,11,12,20,60,130],})

I have to put the 'variable' entry inside filter_meta_dict. Is that normal? I can use the first command for other stations, just not for 3.09. Does this mean there is a bug in the metadata of 3.09? And is it OK to use the second command for other stations? For other details, please refer to the description of the issue at the top of this page. Thanks!

wpreimes commented 1 year ago

Hi, I just downloaded the ISMN data for HOBE and tried the 2 function calls you posted, and I got the same IDs for both of them. About the metadata, I don't understand what you mean by "metadata of 3.09, the value of key variable is None.". You posted the metadata table in your initial comment, and there you can see that the "variable" is "soil_moisture" for the selected sensor (see the last 4 lines; the first line only holds the labels of the data frame):

 variable        key       
 clay_fraction   val                           5.2
                 depth_from                    0.0
                 depth_to                     0.05
 climate_KG      val                           Dfb
 climate_insitu  val                       unknown
 elevation       val                         104.0
 instrument      val                 Decagon-5TE-B
                 depth_from                    0.0
                 depth_to                     0.05
 latitude        val                       55.8609
 lc_2000         val                            10
 lc_2005         val                            10
 lc_2010         val                            10
 lc_insitu       val                          None
 longitude       val                        9.2945
 network         val                          HOBE
 organic_carbon  val                           0.5
                 depth_from                    0.0
                 depth_to                      0.3
 sand_fraction   val                          85.1
                 depth_from                    0.0
                 depth_to                     0.05
 saturation      val                          0.41
                 depth_from                    0.0
                 depth_to                      0.3
 silt_fraction   val                           5.7
                 depth_from                    0.0
                 depth_to                     0.05
 station         val                          3.09
 timerange_from  val           2017-01-01 00:00:00
 timerange_to    val           2019-02-22 09:00:00
 variable        val                 soil_moisture
                 depth_from                    0.0
                 depth_to                     0.05
 Name: data, dtype: object)

and you can access it e.g. via

>> ismn_data.read_metadata(1098)['variable']

key
val           soil_moisture
depth_from              0.0
depth_to               0.05
Name: data, dtype: object

Maybe you want to re-generate the python metadata if you feel that something is wrong there: removing or renaming the folder python_metadata in the ISMN data path should lead to the metadata being re-collected the next time you initialize the reader. Make sure you have the latest version of this package installed. In case the data is erroneous, you can try to download the HOBE data separately again and replace the files in your collection with the new ones (make sure to re-collect the metadata whenever you change your local data collection).
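Renaming the folder can be done by hand or with a few lines of standard-library Python. The sketch below is only an illustration: `set_aside_python_metadata` and the `_old` suffix are made up for this example, and the demo runs on a temporary directory standing in for the real ISMN data path.

```python
import tempfile
from pathlib import Path


def set_aside_python_metadata(data_path, suffix="_old"):
    """Rename <data_path>/python_metadata so the reader no longer finds it.

    Returns the new folder path, or None if there was nothing to rename.
    """
    meta_dir = Path(data_path) / "python_metadata"
    if not meta_dir.is_dir():
        return None
    backup = meta_dir.with_name(meta_dir.name + suffix)
    if backup.exists():
        raise FileExistsError(f"{backup} already exists, remove it first")
    meta_dir.rename(backup)
    return backup


# Demo on a throwaway directory standing in for the ISMN data path.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "python_metadata").mkdir()
    print(set_aside_python_metadata(tmp).name)  # python_metadata_old
```

After the rename, initializing the reader on the real data path again should trigger the metadata collection described above.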

xushanthu-2014 commented 1 year ago

Hi @wpreimes, thanks for your reply. ismn.__version__ gives me '1.1.0'. Is that the latest one? I also found that I can use variable='soil_moisture' outside the filter_meta_dict. By the way, if I try ismn_data['HOBE']['3.09'], the list contains 'Decagon-5TE-B_soil_moisture_0.200000_0.250000', which means soil moisture from 0.2 to 0.25 m. But ismn_data.read_metadata(1098)['variable'] alone doesn't show this...

wpreimes commented 1 year ago

v1.2.0 would be the latest. You can try pip install -U ismn to upgrade. The commands ISMN_Interface.read_metadata() (and ISMN_Interface.read_ts()) read data for a certain ID. The ID refers to a specific sensor (as indicated by the contents of the metadata). At a station such as HOBE 3.09 there can be multiple sensors. In your case, 1098 is the ID of the soil moisture sensor at this station at 0-5 cm depth, and your command reads the metadata for that sensor. The sensor at 0.2-0.25 m depth is a different one and therefore has a different ID.

xushanthu-2014 commented 1 year ago

Thanks for your reply @wpreimes! But when I tried v1.2.0, I got an error when reading the data Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911 (this is a dataset I downloaded from the ISMN, covering latitudes from 36N to 58N and longitudes from 11.75W to 29.5E):

ismn_data = ISMN_Interface(data_path)
Files Processed: 100%|██████████| 321/321 [00:00<00:00, 4521.32it/s]
Processing metadata for all ismn stations into folder /Users/xushan/research/TUD/ISMN_westEurope/Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911.
This may take a few minutes, but is only done once...
Hint: Use `parallel=True` to speed up metadata generation for large datasets
Metadata generation finished after 0 Seconds.
Metadata and Log stored in /Users/xushan/research/TUD/ISMN_westEurope/Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911/python_metadata

Traceback (most recent call last):

  File "<ipython-input-23-84af3e3a7ed0>", line 1, in <module>
    ismn_data = ISMN_Interface(data_path)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/interface.py", line 135, in __init__
    self.activate_network(network=network, meta_path=meta_path, temp_root=temp_root)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/interface.py", line 166, in activate_network
    self.__file_collection.to_metadata_csv(meta_csv_file)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/filecollection.py", line 403, in to_metadata_csv
    dfs = pd.concat(dfs, axis=0, sort=True)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 304, in concat
    sort=sort,

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 351, in __init__
    raise ValueError("No objects to concatenate")

ValueError: No objects to concatenate

Can you please help me with this? Thanks!