c-scale-community / use-case-aquamonitor

Sprint 1: 11-15 October 2021 #19

Closed backeb closed 2 years ago

backeb commented 2 years ago

Sprint 1: 11-15 October 2021

Switch to INCD OpenEO backend and get Sentinel-2 L1C data

Current status: Have been developing a Notebook for the full workflow using the VITO backend for OpenEO.

Objective: Switch to INCD OpenEO backend (requires that data is available on INCD)

Actions:

Additional notes: https://confluence.egi.eu/x/pRCxBg

cc @avgils @gdonvito @sebastian-luna-valero

zbenta commented 2 years ago

Since we talked about EODAG in the meeting, I gather that we can create an instance inside our infrastructure and use it as a proxy for data access. I have already created an account at SciHub. Would this be a good first step?

sustr4 commented 2 years ago

Sorry I could not be at today's meeting. At least I was able to agree with @backeb yesterday that we need to get synced.

Would this be a good first step?

What would you like to achieve? SciHub is relatively slow and holds only the last year's worth of archives anyway. We can provide a fast source of fresh Sentinel data, and probably think of a more efficient way to fill up your archive if you want historic data.

zbenta commented 2 years ago

What would you like to achieve? SciHub is relatively slow and holds only the last year's worth of archives anyway. We can provide a fast source of fresh Sentinel data, and probably think of a more efficient way to fill up your archive if you want historic data.

What we discussed in the meeting was that EODAG could be used to test whether it can act as a proxy for accessing the datasets we need. It will be used as a proof of concept, and then we build from there.
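
For reference, a minimal sketch of that "proxy" idea, assuming eodag's available_providers() and set_preferred_provider() methods (the latter is used later in this thread); provider names shown are only examples:

from eodag import EODataAccessGateway

# One gateway object can front several providers (scihub, creodias, ...),
# which is what makes it usable as a proxy in front of the actual archives.
dag = EODataAccessGateway()

# List the providers eodag knows about in its default configuration.
print(dag.available_providers())

# Point all subsequent searches and downloads at a single preferred provider.
dag.set_preferred_provider("scihub")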

zbenta commented 2 years ago

Hi everyone,

While trying to install eodag we get the following error messages:

WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting eodag
  Using cached https://files.pythonhosted.org/packages/cc/33/2569d07329aaabe3fb41ff13e16584a429eaf6e18b576a26db8725a6697a/eodag-2.3.3-py3-none-any.whl
Collecting owslib (from eodag)
  Using cached https://files.pythonhosted.org/packages/00/be/a0288a3d7bcea2038c9111497d905bc998eaa84fcba4f2a7f904c962ecf0/OWSLib-0.25.0-py2.py3-none-any.whl
Collecting boto3 (from eodag)
  Using cached https://files.pythonhosted.org/packages/b6/91/3ea4bc175ca0135a7d4f5f32c5f3fb3a7a87be6a743792b4aeb47e9794e4/boto3-1.18.51-py3-none-any.whl
Collecting lxml (from eodag)
  Using cached https://files.pythonhosted.org/packages/1f/1d/a4485412268b38043a6c0f873245b5d9315c6615bcf44776759a2605dca5/lxml-4.6.3-cp36-cp36m-manylinux1_x86_64.whl
Collecting tqdm (from eodag)
  Using cached https://files.pythonhosted.org/packages/63/f3/b7a1b8e40fd1bd049a34566eb353527bb9b8e9b98f8b6cf803bb64d8ce95/tqdm-4.62.3-py2.py3-none-any.whl
Collecting requests (from eodag)
  Using cached https://files.pythonhosted.org/packages/92/96/144f70b972a9c0eabbd4391ef93ccd49d0f2747f4f6a2a2738e99e5adc65/requests-2.26.0-py2.py3-none-any.whl
Collecting flasgger (from eodag)
  Using cached https://files.pythonhosted.org/packages/00/25/9f353c72fd2bf37908bf30509e7dfbb051c96bc08619226807de71ec9150/flasgger-0.9.5-py2.py3-none-any.whl
Requirement already satisfied: PyYAML in /usr/local/lib64/python3.6/site-packages (from eodag)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.6/site-packages (from eodag)
Collecting pystac>=1.0.0b1 (from eodag)
  Using cached https://files.pythonhosted.org/packages/bc/fa/4b35e76847250aff850e61adbc7fabde72a4ca584589d8762277984397ab/pystac-1.0.0rc2-py3-none-any.whl
Collecting geojson (from eodag)
  Using cached https://files.pythonhosted.org/packages/e4/8d/9e28e9af95739e6d2d2f8d4bef0b3432da40b7c3588fbad4298c1be09e48/geojson-2.5.0-py2.py3-none-any.whl
Collecting pyproj (from eodag)
  Using cached https://files.pythonhosted.org/packages/2c/12/7a8cca32506747c05ffd5c6ba556cf8435754af0939906cbcc7fa5802ea3/pyproj-3.0.1.tar.gz
    Complete output from command python setup.py egg_info:
    proj executable not found. Please set the PROJ_DIR variable. For more information see: https://pyproj4.github.io/pyproj/stable/installation.html

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-4aqdxsxm/pyproj/

We then tried to install pyproj on its own, as follows:

WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting pyproj
  Using cached https://files.pythonhosted.org/packages/2c/12/7a8cca32506747c05ffd5c6ba556cf8435754af0939906cbcc7fa5802ea3/pyproj-3.0.1.tar.gz
    Complete output from command python setup.py egg_info:
    proj executable not found. Please set the PROJ_DIR variable. For more information see: https://pyproj4.github.io/pyproj/stable/installation.html

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-vbrg52re/pyproj/

We then went to the pyproj documentation page and defined the PROJ_DIR environment variable as follows:

export PROJ_DIR=/usr/local

The process now fails because the proj executable does not exist. "What The Frak"!!!

WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting pyproj
  Using cached https://files.pythonhosted.org/packages/2c/12/7a8cca32506747c05ffd5c6ba556cf8435754af0939906cbcc7fa5802ea3/pyproj-3.0.1.tar.gz
    Complete output from command python setup.py egg_info:
    PROJ_DIR is set, using existing PROJ installation..

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-1wg3tp49/pyproj/setup.py", line 224, in <module>
        ext_modules=get_extension_modules(),
      File "/tmp/pip-build-1wg3tp49/pyproj/setup.py", line 155, in get_extension_modules
        proj_version = get_proj_version(proj_dir)
      File "/tmp/pip-build-1wg3tp49/pyproj/setup.py", line 22, in get_proj_version
        proj_ver = subprocess.check_output(str(proj), stderr=subprocess.STDOUT).decode(
      File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
        **kwargs).stdout
      File "/usr/lib64/python3.6/subprocess.py", line 423, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
        restore_signals, start_new_session)
      File "/usr/lib64/python3.6/subprocess.py", line 1364, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/bin/proj': '/usr/local/bin/proj'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-1wg3tp49/pyproj/

Any ideas?

jdries commented 2 years ago

Python libraries often have dependencies on C libraries, in this case the proj library. Most distributions allow installing that through your package manager, for instance on Ubuntu: https://packages.ubuntu.com/bionic/libproj-dev

Note that often the 'dev' variant of the library is required, as sometimes Python packages compile wrappers at installation time.
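
For completeness, a small sketch (not from the thread) of how one could check for the proj executable before attempting the pyproj build; the package names in the error message are only indicative and differ per distribution:

import os
import shutil

# pyproj's build runs the proj executable (see the traceback above, where it
# calls $PROJ_DIR/bin/proj), so check that it exists before invoking pip
# instead of letting the build fail half-way through.
proj_exe = shutil.which("proj")
if proj_exe is None:
    raise SystemExit(
        "proj executable not found on PATH; install the system package first "
        "(for example libproj-dev / proj-bin on Ubuntu, proj-devel on CentOS)"
    )

# Derive PROJ_DIR from the executable location, e.g. /usr/bin/proj -> /usr,
# so that a subsequent 'pip3 install --user pyproj' finds the installation.
os.environ["PROJ_DIR"] = os.path.dirname(os.path.dirname(proj_exe))
print("proj found at", proj_exe, "; PROJ_DIR =", os.environ["PROJ_DIR"])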

sebastian-luna-valero commented 2 years ago

Hi,

Provided eodag is available in https://anaconda.org/conda-forge/eodag

Here is what I tried and worked for me:

mktemp -d
cd /tmp/tmp.wQJG03ncEu/
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p conda-install
source conda-install/etc/profile.d/conda.sh 
conda create -n eodag -c conda-forge eodag
conda activate eodag
python -c "import eodag"

I hope that helps.

zbenta commented 2 years ago

Hi,

Provided eodag is available in https://anaconda.org/conda-forge/eodag

Here is what I tried and worked for me:

mktemp -d
cd /tmp/tmp.wQJG03ncEu/
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p conda-install
source conda-install/etc/profile.d/conda.sh 
conda create -n eodag -c conda-forge eodag
conda activate eodag
python -c "import eodag"

I hope that helps.

Perfect, thanks @sebastian-luna-valero, the conda env is working like a charm. We'll do some experiments and give you guys some feedback.

zbenta commented 2 years ago

Some preliminary results. We've created the following Python script with the help of @jdries and @Jaapel :+1:

from eodag import EODataAccessGateway
from shapely.geometry import shape

geom =  {
  "type": "Polygon",
  "coordinates": [
    [
      [
        -7.682155041704681,
        38.620982842287496
      ],
      [
        -7.682155041704681,
        36.18203953636458
      ],
      [
        -5.083888440142181,
        36.18203953636458
      ],
      [
        -5.083888440142181,
        38.620982842287496
      ],
      [
        -7.682155041704681,
        38.620982842287496
      ]
    ]
  ]
}

aquamonitor_aoi = shape(geom)
#aquamonitor_aoi

dag = EODataAccessGateway()
dag.set_preferred_provider("scihub")
search_results, total_count = dag.search(
    productType="S2_MSI_L1C",
    start="2018-01-01",
    #end="2021-01-01",
    end="2018-01-05",
    geom=aquamonitor_aoi,
    cloudCover=80
)

print("Nr of products",total_count)

if total_count > 0 :
        product_paths = dag.download_all(search_results)
        print("Downloaded Data:",product_paths)

We tried to download some data from SciHub and got a message saying we had exceeded our quota.

Nr of products 16
Fetching archival status: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.0/16.0 [00:03<00:00, 4.62product/s]
LTA retrieval:   0%|                                                                                                                                                                                         | 0.00/16.0 [00:00<?, ?product/s]User quota exceeded: ExpectedException : User 'incd' offline products retrieval quota exceeded (20 fetches max)  trying to fetch product S2B_MSIL1C_20180102T111439_N0206_R137_T30STG_20180102T131121 (843854083 bytes compressed)
User quota exceeded: ExpectedException : User 'incd' offline products retrieval quota exceeded (20 fetches max)  trying to fetch product S2A_MSIL1C_20180104T110431_N0206_R094_T29SQA_20180104T130839 (62463436 bytes compressed)

Does anyone have an unlimited account :smile:?

jdries commented 2 years ago

Nope, this is exactly the problem with scihub that we tried to explain. The workaround that some people use is to set up multiple accounts, but that also makes the script more complex. That's why setting creodias as provider seems more reasonable. (DIAS's are considered a key element in the distribution of products, so that's also the generally recommended solution.)
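
As a rough illustration of that switch, a minimal sketch assuming eodag's environment-variable credential scheme (EODAG__<PROVIDER>__AUTH__CREDENTIALS__*); the variable names and account details below are illustrative, not taken from this thread:

import os
from eodag import EODataAccessGateway

# Credentials for the creodias provider, passed via eodag's environment
# variable convention (double underscores map to nested configuration keys).
os.environ["EODAG__CREODIAS__AUTH__CREDENTIALS__USERNAME"] = "someone@example.org"
os.environ["EODAG__CREODIAS__AUTH__CREDENTIALS__PASSWORD"] = "***"

dag = EODataAccessGateway()
# Same search/download code as above; only the preferred provider changes.
dag.set_preferred_provider("creodias")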

zbenta commented 2 years ago

Nope, this is exactly the problem with scihub that we tried to explain. The workaround that some people use is to set up multiple accounts, but that also makes the script more complex. That's why setting creodias as provider seems more reasonable. (DIAS's are considered a key element in the distribution of products, so that's also the generally recommended solution.)

We'll try to create a free trial account on creodias so we can test it out. What do you guys think? But if anyone would be so kind as to purchase us a few credits, we'll be much obliged :money_mouth_face:

backeb commented 2 years ago

@sustr4 can you weigh in here? Do you have Sentinel-2 L1C data for Spain? @jkonarski could you perhaps assist with access to CREODIAS in the context of this project?

sustr4 commented 2 years ago

GRNET does have the data. We just need to set up access. @kkoumantaros already offered that, so I'd say he can bring the consumer and provider together. As I explained a few times, this personal intervention will no longer be necessary when we are done with our work in WP2, but for now the right people have to find each other. Please, @kkoumantaros, which endpoint should be used at GRNET, and -- if it's not self-registered -- who can set up accounts?

kkoumantaros commented 2 years ago

Hi all, you seem to be using SciHub already. We can provide direct access to our local mirror, but that covers only Greece and our neighbours. If we need something extra, I guess we need to send a formal request to ESA for an account in one of the other hubs hosted by GRNET but under ESA's control. (I've asked my contacts about this and I'm waiting for a formal answer.)

mariojmdavid commented 2 years ago

@zbenta please provide download performance numbers when you can.

zbenta commented 2 years ago

@zbenta please provide download performance numbers when you can.

Still downloading, but the current stats we have are the following: Screenshot from 2021-09-30 15-03-21

jkonarski commented 2 years ago

@sustr4 can you weigh in here? Do you have Sentinel-2 L1C data for Spain? @jkonarski could you perhaps assist with access to CREODIAS in the context of this project?

@backeb we have the data in CREODIAS, but I am not sure what assistance and access you need. Do you want an account in CREODIAS or information about how to download the data? Please give me a clue.

backeb commented 2 years ago

@sustr4 can you weigh in here? Do you have Sentinel-2 L1C data for Spain? @jkonarski could you perhaps assist with access to CREODIAS in the context of this project?

@backeb we have the data in CREODIAS, but I am not sure what assistance and access you need. Do you want an account in CREODIAS or information about how to download the data? Please give me a clue.

@jkonarski the intent is to download the data to INCD so that it is available there for processing using the OpenEO backend. What does INCD need to do to make that happen? cc @zbenta

zbenta commented 2 years ago

Some final results:

(eodag) [root@eodag ~]# python3 copernicus_data.py 
Nr of products 12
Fetching archival status: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12.0/12.0 [00:02<00:00, 4.61product/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T30STH_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 843M/843M [14:11<00:00, 991kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T30SUG_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 608M/608M [15:46<00:00, 643kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T29SQA_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 752M/752M [15:56<00:00, 787kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T30STF_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 812M/812M [18:02<00:00, 750kB/s]
LTA retrieval: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.00/1.00 [18:06<00:00, 1.09ks/product]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T29SPA_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 560M/560M [17:27<00:00, 535kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T29SPB_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 785M/785M [25:03<00:00, 522kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T30SUH_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 791M/791M [25:14<00:00, 522kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T29SQB_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 845M/845M [27:09<00:00, 519kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T30SUF_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 361M/361M [13:00<00:00, 463kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T30STG_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 844M/844M [27:19<00:00, 515kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T29SPC_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 688M/688M [24:33<00:00, 467kB/s]
Downloading S2B_MSIL1C_20180102T111439_N0206_R137_T29SQC_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 842M/842M [25:38<00:00, 547kB/s]
Downloading products: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12.0/12.0 [1:08:55<00:00, 345s/product]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30STF_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 46.24file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30SUG_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:01<00:00, 64.45file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30SUH_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 45.59file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30SUF_20180102T131121.zip: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:01<00:00, 105.79file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30STH_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 41.57file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30STG_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 41.61file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SQC_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 48.79file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SQB_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 40.20file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SQA_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 46.91file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SPB_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 42.06file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SPC_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:02<00:00, 46.11file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SPA_20180102T131121.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:01<00:00, 60.68file/s]
Downloaded Data: ['/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30STF_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T30STF_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUG_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUG_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUH_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUH_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUF_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUF_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30STH_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T30STH_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30STG_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T30STG_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQC_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQC_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQB_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQB_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQA_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQA_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPB_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPB_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPC_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPC_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPA_20180102T131121/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPA_20180102T131121.SAFE']

jkonarski commented 2 years ago

We have 3 modes to access the EO data in CREODIAS repository:

  1. Directly from the cloud with no limitation, the only cost is the cost of IaaS / VMs.
  2. Through the EO Finder - tool for searching and downloading the EO products. Products are available for free, downloading is enabled for registered users. The limitation of this solution is that each product needs to be downloaded separately, so it's suitable for a smaller scale project or test-bed.
  3. Through the s3 port, there is some cost related to data transfer.

To access the Finder you can go to the following link: https://finder.creodias.eu/

backeb commented 2 years ago

We have 3 modes to access the EO data in CREODIAS repository:

  1. Directly from the cloud with no limitation, the only cost is the cost of IaaS / VMs.
  2. Through the EO Finder - tool for searching and downloading the EO products. Products are available for free, downloading is enabled for registered users. The limitation of this solution is that each product needs to be downloaded separately, so it's suitable for a smaller scale project or test-bed.
  3. Through the s3 port, there is some cost related to data transfer.

To access the Finder you can go to the following link: https://finder.creodias.eu/

Thanks @jkonarski

We need to check how to pay for costs here using the project's VA allocation for CloudFerro.

zbenta commented 2 years ago

We have already applied for free credit at creodias. Since I'll be on vacation next week, @tiagofglip and @miguelviana95 will do some more tests with creodias.

tiagofglip commented 2 years ago

Hi, we have already tested the creodias provider with an account we created ourselves, and it works nicely and much faster. You can see the results below. With creodias we downloaded exactly the same data in 11 minutes, compared with 1h08 from SciHub.

(eodag) [root@eodag ~]# python copernicus_data.py 
Nr of products 12
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30SUG_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 95.91file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30STG_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 61.12file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SPA_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 99.85file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SPC_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 82.21file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SQA_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 73.73file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30STF_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 73.72file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30STH_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 63.94file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30SUH_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 70.28file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T30SUF_20180102T131121.SAFE.zip: 100%|████| 116/116 [00:00<00:00, 160.07file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SPB_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 75.73file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SQB_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 77.91file/s]
Extracting files from S2B_MSIL1C_20180102T111439_N0206_R137_T29SQC_20180102T131121.SAFE.zip: 100%|█████| 116/116 [00:01<00:00, 65.35file/s]
Downloaded products: 100%|████████████████████████████████████████████████████████████████████████████| 12/12 [11:02<00:00, 55.22s/product]
Downloaded Data: ['/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUG_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30STG_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPA_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPC_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQA_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30STF_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30STH_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUH_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T30SUF_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SPB_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQB_20180102T131121.SAFE', '/tmp/S2B_MSIL1C_20180102T111439_N0206_R137_T29SQC_20180102T131121.SAFE']

jdries commented 2 years ago

That also corresponds with our experience. @jkonarski I think this is using option 2, right? I do not yet fully understand the limitation: we are indeed downloading each product separately, but in what sense is that different or more limiting compared to using the S3 port? (Because it seems to work quite well.)

tiagofglip commented 2 years ago

Just to clarify, we don't need credits at creodias to download the data; it's apparently free as long as you have login credentials, but, like you said, products are downloaded sequentially.

backeb commented 2 years ago

We have 3 modes to access the EO data in CREODIAS repository:

  1. Directly from the cloud with no limitation, the only cost is the cost of IaaS / VMs.
  2. Through the EO Finder - tool for searching and downloading the EO products. Products are available for free, downloading is enabled for registered users. The limitation of this solution is that each product needs to be downloaded separately, so it's suitable for a smaller scale project or test-bed.
  3. Through the s3 port, there is some cost related to data transfer.

To access the Finder you can go to the following link: https://finder.creodias.eu/

@tiagofglip @zbenta @jdries @mariojmdavid I just had a discussion with @MZICloudferro from CloudFerro, and he indicated the following regarding options 2 and 3:

Option 2

Option 3

I propose moving ahead with Option 2 for now. @Jaapel / @gena please provide the bounding box and period for the Sentinel-2 L1C data needed, so that INCD can make sure the data is ready for testing during next week's sprint.

cc @sustr4 @jkonarski

Jaapel commented 2 years ago

Yes, we provided it by mail: https://code.earthengine.google.com/95e289c079b295acef633837d261788d

date range: start: 2018-01-01, stop: 2021-01-01

bbox_json:

{
  "geodesic": false,
  "type": "Polygon",
  "coordinates": [
    [
      [ -7.682155041704681, 36.18203953636458 ],
      [ -5.083888440142181, 36.18203953636458 ],
      [ -5.083888440142181, 38.620982842287496 ],
      [ -7.682155041704681, 38.620982842287496 ],
      [ -7.682155041704681, 36.18203953636458 ]
    ]
  ]
}

tiagofglip commented 2 years ago

Hi @backeb, thank you for the straightforward instructions; we will proceed with option 2 for now.

@Jaapel we will update the script with your time window and coordinates (see the sketch below) and let you know when the data is available.
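
A minimal sketch of that change, reusing the script from earlier in this thread with the time window and polygon provided by @Jaapel (only the parameters differ; nothing else is new):

from eodag import EODataAccessGateway
from shapely.geometry import shape

# Bounding box provided by mail / in the comment above.
aoi = shape({
    "type": "Polygon",
    "coordinates": [[
        [-7.682155041704681, 36.18203953636458],
        [-5.083888440142181, 36.18203953636458],
        [-5.083888440142181, 38.620982842287496],
        [-7.682155041704681, 38.620982842287496],
        [-7.682155041704681, 36.18203953636458],
    ]],
})

dag = EODataAccessGateway()
dag.set_preferred_provider("creodias")
search_results, total_count = dag.search(
    productType="S2_MSI_L1C",
    start="2018-01-01",   # requested time window
    end="2021-01-01",
    geom=aoi,
    cloudCover=80,
)
print("Nr of products", total_count)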

Jaapel commented 2 years ago

Hi @tiagofglip , thanks for the effort! Let us know if there are difficulties.

tiagofglip commented 2 years ago

Hello, we have already downloaded the files. They are in the NFS volume that is mounted in the containers at /opt/spark/work-dir/data_sets. Is this the correct place for the data? If not, just tell us.

Jaapel commented 2 years ago

Hi @tiagofglip, I cannot see the dataset within OpenEO when using openeo.rest.connect.list_collections() documented here. @jdries what steps need to be performed to register the data with the OpenEO PySpark backend?

jdries commented 2 years ago

We'll need to configure a layer in layercatalog.json. For this, we'll need a 'glob' that points to the data, something like "/data/MTDA/AgERA5///AgERA5dewpoint-temperature*.tif". In this case I assume it will be jpeg2000 files, not tiff, so we'll need to make sure that they are read correctly. Then we'll also need a regex that parses the date from the filename, like '".+_(\d{4})(\d{2})(\d{2})\.tif"'.

I would suggest to copy paste the SENTINEL2_L1C config from this file: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/creo_layercatalog.json and then we'll need to update the settings so that data is actually found.

The alternative option was to have a STAC catalog provided by this eodag tool, but I guess that's perhaps not entirely trivial either?
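
To make the glob-plus-regex idea concrete, a small sketch (not from the thread) that lists downloaded products under the NFS mount mentioned above and parses the sensing date out of the SAFE filenames; the exact glob and regex for the real layercatalog.json entry would still have to match how the data is actually laid out on disk:

import glob
import re
from datetime import datetime

# Sentinel-2 L1C products extracted as .SAFE directories on the NFS volume
# mounted into the containers (path taken from the comment above); the
# recursive glob is layout-agnostic about intermediate directories.
bands = glob.glob(
    "/opt/spark/work-dir/data_sets/**/IMG_DATA/*_B0[2348].jp2",
    recursive=True,
)

# Product names look like S2B_MSIL1C_20180102T111439_..., so the sensing date
# can be parsed from the first datetime field in the name.
date_re = re.compile(r"MSIL1C_(\d{4})(\d{2})(\d{2})T\d{6}")

for path in sorted(bands):
    m = date_re.search(path)
    if m:
        date = datetime(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        print(date.date(), path)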

zbenta commented 2 years ago

Where should we place the catalog file? I can view the available catalogs in our instance by going to the endpoint URL, but I'm unable to find the catalog file within the pods. @jdries, do you have any idea where it is located? We could edit it locally and test access to the data that was downloaded.

jdries commented 2 years ago

I believe it's added to the docker image: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/docker/Debian/Dockerfile#L31 Not sure if there's an easy way to override that, perhaps creating a new image that starts from ours?

tiagofglip commented 2 years ago

But do we have to change layercatalog.json by hand?

jdries commented 2 years ago

I can also do it for you, I will only need the glob path and date regex. Or can I ssh into your machines to inspect the NFS volume myself?

jdries commented 2 years ago

Maybe a quick update here: I've got access and will prepare a layer config. That will allow us to test whether jpeg2000 files are also picked up correctly (like geotiff).

Jaapel commented 2 years ago

Hi @jdries how are you doing? Is there something we can help with?

jdries commented 2 years ago

Unfortunately not, I'm just working through my open questions and hope to get to this one asap. Maybe just in terms of planning: how stuck are you on this, given that you can in any case test the use case end-to-end on the Terrascope backend?

jdries commented 2 years ago

I was able to analyze this a bit further. The current codepath for discovering data on disk based on glob patterns does not yet support jpeg2000. Next to that, it is also less advanced than the catalog-based codepath. So a few options:

  1. hack jpeg2000 support into the globbing based codepath.
  2. gdal_translate the jpeg2000 files into COGs (geotiff), to make this dataset work without too many code changes
  3. set up a STAC catalog that exposes the jp2 files, allowing us to use the more well tested and performant code paths
  4. I implement support for static STAC catalogs. This would be a compromise between the limited things we can do with globbing stuff, and having to set up a full catalog. Would require some work on our side, so also some patience :-).

So unless there is really a burning need to have something in the very short term, I would go for option 3 and maybe 4. In fact, with this eodag tool, setting up such a catalog might not even be that hard.
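
A minimal sketch of what option 3 could look like with pystac (already pulled in as an eodag dependency); the item id comes from the downloads above, but the geometry, datetime, and asset path are illustrative placeholders, and a real catalog would read them from the product metadata:

from datetime import datetime
import pystac

catalog = pystac.Catalog(
    id="sentinel2-l1c-incd",
    description="Sentinel-2 L1C products downloaded to the INCD NFS volume",
)

# One item per downloaded product; bbox/geometry and datetime would normally
# come from the product metadata instead of being hard-coded like this.
bbox = [-7.68, 36.18, -5.08, 38.62]
geometry = {
    "type": "Polygon",
    "coordinates": [[
        [bbox[0], bbox[1]], [bbox[2], bbox[1]],
        [bbox[2], bbox[3]], [bbox[0], bbox[3]],
        [bbox[0], bbox[1]],
    ]],
}
item = pystac.Item(
    id="S2B_MSIL1C_20180102T111439_N0206_R137_T29SPA_20180102T131121",
    geometry=geometry,
    bbox=bbox,
    datetime=datetime(2018, 1, 2, 11, 14, 39),
    properties={},
)
item.add_asset(
    "B04",
    pystac.Asset(
        href="/opt/spark/work-dir/data_sets/.../IMG_DATA/..._B04.jp2",
        media_type="image/jp2",
    ),
)
catalog.add_item(item)

# Write a self-contained static catalog next to the data.
catalog.normalize_hrefs("/opt/spark/work-dir/data_sets/stac")
catalog.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)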

backeb commented 2 years ago

Sprint 1: Retro

Top: What worked well?

Tip: What to improve...

Sprint progress

https://confluence.egi.eu/display/CSCALE/2021-09-28+Aquamonitor+sprint+planning+meeting

Switch to INCD OpenEO backend (requires that data is available on INCD)

Objective for next sprint

Will be determined at planning meeting

Date of next sprint

Will be determined at planning meeting

backeb commented 2 years ago

Sprint 2 planning

Discussion

Understand the effort, roles and responsibilities associated with implementing the OpenEO architecture options (https://github.com/c-scale-community/use-case-aquamonitor/issues/19#issuecomment-948345548)

Sprint 2 Objectives

Data objectives for Aquamonitor:

Sprint activities

Sprint dates

15-19 November