backeb commented 2 years ago

Notes from sprint planning meeting can be found here: https://confluence.egi.eu/display/CSCALE/2022-01-20+Planning+the+next+Aquamonitor+sprint

Objectives

The overarching goal is to work towards running the Aquamonitor workflow on

INCD compute, accessing data available at INCD
INCD compute, accessing data remotely on CREODIAS
Report on performance differences.

Make the data available for the use case

[ ] @jdries @MZICloudferro arrange access to CREODIAS object storage, ensure that cost can be reimbursed from CREODIAS VA allocation
[ ] @jdries configure CREODIAS layer for INCD instance so that the OpenEO workflow can access the data remotely from INCD
[x] @zbenta @tiagofglip raise issue on EODAG Github re download and unzip issue
[x] @zbenta @tiagofglip @mariojmdavid provide object storage (swift/S3 interfaces), check integration with EGI Checkin.

Progress on Notebook (MVP)

[ ] @Jaapel Continue testing and improving Notebook using data for whole of Spain on Terrascope
[ ] @Jaapel Switch to CREODIAS backend and test.
[ ] @Jaapel Switch to INCD backend and test local data access performance.

cc @gena @Jaapel @gdonvito @mariojmdavid @jopina @jorge-lip

jdries commented 2 years ago

@Jaapel I was finally able to add an experimental feature to our backend so that overviews are used when you work at lower resolutions. This is a very basic example. Note the line where I set the 'experimental' feature flag, which is required to make it work. Everything runs against openeo-dev.vito.be:

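    import openeo
    from openeo.processes import eq

    # Assumption: connect and authenticate the way you normally do for this backend.
    connection = openeo.connect("openeo-dev.vito.be").authenticate_oidc()

    rgb = connection.load_collection(
        "TERRASCOPE_S2_TOC_V2",
        spatial_extent={"west": 3.758216409030558, "east": 4.087806252, "south": 51.291835566, "north": 51.3927399},
        temporal_extent=["2020-03-11", "2020-03-15"], bands=["B04"],
        properties={"eo:cloud_cover": lambda cc: eq(cc, 50)},
    )

    # Set the experimental feature flag so that overviews are used at lower resolutions.
    rgb._pg.arguments["featureflags"] = {"experimental": True}
    # Specify the process graph and download the result.
    rgb.min_time().resample_spatial(resolution=80, projection=3857).download("/tmp/openeo-rgb-sen2cor-manyclouds-resampled.tiff")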

Can you integrate this in your code and do a test run on a larger scale?

By the way, can you confirm that the full processing of spain will also work against a lower resolution? This is quite important for getting a view on data needs.
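
For a rough view on data needs, the sketch below estimates how many native pixels collapse into one pixel at a coarser resolution. It assumes the 10 m native resolution of band B04 and treats the `resolution=80` from the example as metres. The function name is hypothetical, and the estimate ignores compression, overview layout and reprojection effects.

```python
def pixel_reduction_factor(native_res_m: float, target_res_m: float) -> float:
    """Return how many native pixels fall into one target pixel."""
    if native_res_m <= 0 or target_res_m <= 0:
        raise ValueError("resolutions must be positive")
    # Resolution changes along both axes, so the pixel count scales with the square.
    return (target_res_m / native_res_m) ** 2


factor = pixel_reduction_factor(10, 80)
print(f"One 80 m pixel covers {factor:.0f} native 10 m pixels")
```

This only bounds the number of pixels to process. How much less data is actually read depends on whether the backend can use the overviews stored in the products.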

Jaapel commented 2 years ago

@jdries I can try it tomorrow, worked today on an example with all the data using the .resample method.

Do you know how both resampling and this new experimental feature work with masks or missing data? When upsampling, do NaN values affect the result?

Jaapel commented 2 years ago

Also, caching DataCubes causes missing metadata errors, as described here, which makes quick iteration on larger datasets difficult. Today's run took ~4 hours to complete. If you have some time this sprint, I can guide you through how I set it up!

jdries commented 2 years ago

@Jaapel upsampling can indeed take various approaches to NaN values, but when we speed things up by using the overviews in the native products, we can't control that anymore. Also, for the scene classification, I don't really know what was used to generate them. It will perhaps be interesting to compare results.

We clearly need to work on this load_result to simplify the caching, but this experimental use of overviews also has the potential to drastically reduce that 4-hour job duration.

Jaapel commented 2 years ago

@jdries is there any place where I can find information about how resampling / upsampling methods work with NaN / filtered values?

jdries commented 2 years ago

It seems that both openEO and GDAL explicitly mention how NODATA/valid pixels are treated, per resampling method: https://gdal.org/programs/gdalwarp.html#cmdoption-gdalwarp-r https://processes.openeo.org/#resample_spatial

I have been searching through Sentinel-2 docs, but unfortunately cannot find which resampling method is used to generate overviews.
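
To make the NaN question concrete, here is a minimal NumPy sketch of two possible policies when aggregating to a coarser grid by block averaging. It is not a reproduction of what GDAL, openEO or the Sentinel-2 overviews do; `block_reduce` and the sample tile are made up for illustration.

```python
import numpy as np


def block_reduce(data: np.ndarray, factor: int, ignore_nan: bool) -> np.ndarray:
    """Downsample a 2D array by averaging non-overlapping factor x factor blocks."""
    rows, cols = data.shape
    if rows % factor or cols % factor:
        raise ValueError("array shape must be divisible by factor")
    # Axes 1 and 3 index the pixels inside each block.
    blocks = data.reshape(rows // factor, factor, cols // factor, factor)
    reducer = np.nanmean if ignore_nan else np.mean
    return reducer(blocks, axis=(1, 3))


# One masked (NaN) pixel in the top-left block.
tile = np.array([
    [1.0, 3.0, 5.0, 7.0],
    [1.0, np.nan, 5.0, 7.0],
    [2.0, 2.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
])

strict = block_reduce(tile, 2, ignore_nan=False)   # NaN propagates to the coarse pixel
lenient = block_reduce(tile, 2, ignore_nan=True)   # NaN is skipped, valid pixels are averaged
print(strict)
print(lenient)
```

With the strict policy a single masked pixel invalidates the whole coarse pixel, while the lenient policy averages the remaining valid pixels. Which of the two a given resampling method follows is what the linked documentation describes per method.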

Jaapel commented 2 years ago

This is great @jdries ! Let me try to see if I can improve the masking in the algorithm.

backeb commented 2 years ago

Retrospective

Tops

@Jaapel connected to OpenEO discourse page (https://discuss.eodc.eu/) - helps with troubleshooting

Tips

...

Review objectives

The overarching goal is to work towards running the Aquamonitor workflow on

INCD compute, accessing data available at INCD
INCD compute, accessing data remotely on CREODIAS
Report on performance differences.

Make the data available for the use case

provide object storage (swift/S3 interfaces), check integration with EGI Checkin
- didn't have time to test it yet
- in principle should work
- need access token to authenticate via check-in with the S3/swift endpoint
- [ ] check how to do this from python with EGI Check-in etc
raise issue on EODAG Github re download and unzip issue
- response: https://github.com/CS-SI/eodag/issues/391
- EODAG STAC implementation not ideally standardised
- Data providers are going to standardise around STAC
  - can offer VITO catalogue as alternative (has STAC and open search API)
CREODIAS has files unzipped on object storage
- then for downloads and transfers over http they zip them
- ❗ If we can sort out the VA amendment for CREODIAS then would be much easier to get the data
arrange access to CREODIAS object storage, ensure that cost can be reimbursed from CREODIAS VA allocation
- meeting took place and planning to test
configure CREODIAS layer for INCD instance so that the OpenEO workflow can access the data remotely from INCD
- in progress
in terms of optimisation can work with lower resolution data, has performance impacts for transfer, analysis etc.
- outstanding questions about resampling
- [ ] @Jaapel consider mailing Copernicus helpdesk

Progress on Notebook (MVP)

Continue testing and improving Notebook using data for whole of Spain on Terrascope
- have results for half of spain
- updated visualisation
- notebook in good state to test full dataset - main analysis ready for testing
- working on optimisation
Switch to CREODIAS backend and test.
- dependent on @jdries input
- [ ] @Jaapel follow up with @jdries
Switch to INCD backend and test local data access performance.
- dependent on data available on INCD

Current objective = make the data available

[x] @zbenta follow up with EODAG people and figure out how long it would take to implement solution (https://github.com/CS-SI/eodag/issues/391)
[x] @zbenta if EODAG too long unzip ourselves, e.g. python script / cronjob to extract automatically or work with @jdries to implement VITO catalogue
[ ] @backeb follow up with @cchatzikyriakou about VA amendment for CREODIAS - this would solve a bunch of issues and fast-track deployment of the use case
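
For the "unzip ourselves" option, a script run from a cronjob could look roughly like the sketch below. It uses only the Python standard library. The function name and directory layout are assumptions, not the script that was actually deployed.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_new_archives(download_dir: Path, target_dir: Path) -> List[Path]:
    """Extract every .zip in download_dir that has not been extracted yet."""
    extracted = []
    target_dir.mkdir(parents=True, exist_ok=True)
    for archive in sorted(download_dir.glob("*.zip")):
        destination = target_dir / archive.stem
        if destination.exists():
            continue  # already extracted on an earlier run
        if not zipfile.is_zipfile(archive):
            continue  # skip partial or corrupt downloads
        # Extract into a temporary folder first so that an interrupted run
        # never leaves a half-filled product folder under its final name.
        partial = target_dir / (archive.stem + ".partial")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(partial)
        partial.rename(destination)
        extracted.append(destination)
    return extracted
```

Because already extracted products are skipped, the function can be called repeatedly while downloads are still arriving.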

other actions

[ ] @all comment on https://confluence.egi.eu/display/CSCALE/C-SCALE+Data+Logistics Reflect on: As a user do I want to do everything myself?

follow up progress meeting

18 Feb 15h00 CET

[x] @backeb schedule

zbenta commented 2 years ago

We have recreated the STAC server to use an NFS-mounted PVC. We have also recreated the spark-executor/driver to mount the same NFS-enabled PVC at the /opt/workdir/ path. The Python script we created to download the data is running on the STAC server and is currently downloading the data into that PVC. We believe this is the best way to give the spark-executor/driver access to the downloaded products. The remaining work is to make the data accessible inside the spark-executor/driver pod as an existing collection, so that @Jaapel can use his Jupyter notebook to process it.

backeb commented 2 years ago

Progress meeting: 18 Feb

provide object storage (swift/S3 interfaces), check integration with EGI Checkin

[ ] Action for @mariojmdavid

Make data available on INCD

Update https://github.com/c-scale-community/use-case-aquamonitor/issues/23#issuecomment-1044437607

recreated the STAC server to use an NFS-mounted PVC.
recreated the spark-executor/driver to also mount the same NFS-enabled PVC at the /opt/workdir/ path.
Python script to download the data is running on the STAC server and is currently downloading the data into that NFS-enabled PVC.
We believe that this solution is the best one to allow the spark-executor/driver access to the downloaded products.
Downloading from CREODIAS provider - the free access service (sequential downloads of zip files)

Next steps

enable access to the data inside the spark-executor/driver pod as an existing collection, so that @Jaapel can use his Jupyter notebook for processing the data.
- So far only data is downloaded - not connected to the STAC server
- Still have issues with the EODAG STAC server implementation (can download data but not extract zip files)
- Need STAC catalog to point OpenEO towards the data
  - Workaround discussed and dismissed as an option
- ❗ The bottleneck is not setting up the STAC catalog, but ingesting and indexing the data is difficult 🤯
- Options for STAC catalog to explore
  1. RESTO: https://github.com/jjrom/resto/blob/master/INSTALLATION.md
    - Deployment based on docker (easy to convert to K8s deployment)
    - RESTO will also need to index data
    - [ ] @sustr4 do you know how to ingest and index data in RESTO STAC catalog deployment?
    - [ ] @sustr4 check with CREODIAS who have a RESTO STAC catalog running in production
  2. Follow up with EODAG: https://github.com/CS-SI/eodag/issues/391
    - [ ] @zbenta carefully 😄 ask how long it might take for them to implement a solution for the unzipping
  3. Redeploy VITO catalog
    - [ ] @jdries to ask if possible and estimate effort required to make VITO catalog available

sort out the VA amendment for CREODIAS to get access to object storage

[x] @backeb to follow up with @cchatzikyriakou

configure CREODIAS layer for INCD instance so that the OpenEO workflow can access the data remotely from INCD

In progress
[ ] @jdries to update at next meeting

in terms of optimisation can work with lower resolution data, has performance impacts for transfer, analysis etc.

[ ] @Jaapel to check how resampling is done, i.e. how data is saved so we know at what scale we can pull data out.
Working on experimental approach developed by @jdries

Continue testing and improving Notebook using data for whole of Spain on Terrascope

2 outstanding issues for @Jaapel to solve:
- [ ] 1. Caching data in background
- [ ] 2. Resampling and loading coarser resolution Sentinel-2 for optimisation

Switch to CREODIAS backend and test.

See above dependency

Switch to INCD backend and test local data access performance.

See above dependency

Next meeting

9 March 4-5pm

[ ] @backeb to schedule

backeb commented 2 years ago

Hi all,

I have been able to test the remote S3 access to CreoDIAS in openEO, and gotten it to work.

The main next step is for INCD to get an S3 access key and secret key for use in the use case, but I guess we need to wait for the amendment to the VA?

After that, to go further: INCD (Zacarias) will have to update openEO to latest versions. Quite a lot has changed since we did the initial deploy, and I still needed a small change to get it working. Then we'll need to add a few environment variables for the connection to CreoDIAS:

    AWS_S3_ENDPOINT: "s3.cloudferro.com"
    AWS_DIRECT: "TRUE"
    AWS_ACCESS_KEY_ID: "THE KEY ID"
    AWS_SECRET_ACCESS_KEY: "SECRET"
    AWS_DEFAULT_REGION: "RegionOne"
    AWS_REGION: "RegionOne"
    AWS_HTTPS: "YES"
    AWS_VIRTUAL_HOSTING: "FALSE"

This will need to happen in a yaml file similar to this one: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/kubernetes/openeo.yaml
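
As a hedged illustration, the settings listed in this comment can be turned into the name/value list that a Kubernetes container `env` field expects. Where exactly the entries belong in the linked yaml file is not confirmed here, the helper name is hypothetical, and the credentials are placeholders.

```python
import json

# Values copied from the comment; the key ID and secret are placeholders.
creodias_s3_settings = {
    "AWS_S3_ENDPOINT": "s3.cloudferro.com",
    "AWS_DIRECT": "TRUE",
    "AWS_ACCESS_KEY_ID": "THE KEY ID",
    "AWS_SECRET_ACCESS_KEY": "SECRET",
    "AWS_DEFAULT_REGION": "RegionOne",
    "AWS_REGION": "RegionOne",
    "AWS_HTTPS": "YES",
    "AWS_VIRTUAL_HOSTING": "FALSE",
}


def to_container_env(settings: dict) -> list:
    """Convert a mapping into a list of {"name": ..., "value": ...} entries."""
    return [{"name": key, "value": str(value)} for key, value in settings.items()]


# JSON is valid YAML, so the output can be pasted into a manifest for inspection.
print(json.dumps(to_container_env(creodias_s3_settings), indent=2))
```

In a real deployment the access key and secret would more likely come from a Kubernetes Secret than from plain values in the manifest.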

After that, we should be able to use layers from CreoDIAS on INCD.

best regards, Jeroen

backeb commented 2 years ago

provide object storage (swift/S3 interfaces), check integration with EGI Checkin

more or less documented in EOSC Synergy
[ ] @mariojmdavid to try https://github.com/EOSC-synergy/documentation/tree/master/users/rclone-swift

poster for Portugal Copernicus meeting (first national copernicus conferences) on 22/23 March

@mariojmdavid preparing poster
aim to share 10/11 March for people to comment

Make data available on INCD

WP2 working on implementing centralised STAC catalogue where providers can register their data
How it works
- Central webservice
- REST interface allowing to add new collections and metadata
- Requires an indexing job to be run locally to index data at provider and create the STAC metadata
- Information sent to central STAC catalogue
- The STAC catalogue then points to data on provider
- @sustr4: CESNET and INFN still have hours available in WP3 - can they help with the development of an "indexing script"?
[ ] @jdries coordinate a brainstorming session about the indexing script
EODAG: https://github.com/CS-SI/eodag/issues/391
- issue changed to feature enhancement
- implementation time unknown

sort out the VA amendment for CREODIAS to get access to object storage

will be added to amendment but aren't sure
❗ is a blocker for credentials. Need project credentials for scaling.

blockers

centralised STAC catalogue service
project credentials from CloudFerro related to VA amendment
❗ ❗ cannot progress on use case with these blockers
[ ] @sustr4 please update on progress related to centralised STAC catalogue service
[ ] @cchatzikyriakou please advise on way forward re VA amendment and project credentials from CloudFerro
VA allocation system not particularly agile... maybe something to report on for the EC

configure CREODIAS layer for INCD instance so that the OpenEO workflow can access the data remotely from INCD

configured with own credentials and tested locally
need project credentials to scale - see above blocker

in terms of optimisation can work with lower resolution data, has performance impacts for transfer, analysis etc. & Continue testing and improving Notebook using data for whole of Spain on Terrascope

load_results() bug blocking progress: https://github.com/Open-EO/openeo-geopyspark-driver/issues/127
- https://github.com/Open-EO/openeo-geopyspark-driver/issues/126 - resolved.
[ ] @Jaapel work on resampling and loading coarser resolution Sentinel-2 for optimisation

reminder of objectives

Switch to CREODIAS backend and test remote access from OpenEO.
Switch to INCD backend and test local data access performance.
report on performance differences

next steps

[ ] @jdries coordinate meeting with WP2 about requirements for centralised STAC catalogue. include @mariojmdavid @zbenta ++
[ ] @backeb discuss how to get project credentials from CloudFerro in the mean time (so we don't have to wait for the VA amendment)

Follow up meeting: 25 March, 12h00 CET

sustr4 commented 2 years ago

Requires an indexing job to be run locally to index data at provider and create the STAC metadata

I know we don't get a fresh start, but shouldn't the data be registered in STAC by the downloader? I guess it knows what it just downloaded, right? We can think of a one-time solution to register what has already been downloaded before, but that would be a one-time hack.

please update on progress related to centralised STAC catalogue service

User/Access management seems to be the greatest issue now. Would it be possible to set up an IP filter before we have proper access control? That would mean someone (INCD?) specifying IP addresses (ranges) that can access the catalogue. Just asking: It may not be needed in the end.
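
On the idea that the downloader registers what it has just downloaded: a minimal sketch of the STAC Item such a downloader could build is shown below. The field layout follows the STAC Item specification. The function name, product ID and asset path are hypothetical, and how the item is sent to the central catalogue is left open because that interface is not described here.

```python
from datetime import datetime, timezone


def make_stac_item(product_id: str, bbox: tuple, acquired: datetime, asset_href: str) -> dict:
    """Build a minimal STAC Item for one downloaded product.

    bbox is (west, south, east, north); acquired must be timezone-aware.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": product_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south],
            ]],
        },
        "properties": {
            "datetime": acquired.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "links": [],
        "assets": {"data": {"href": asset_href}},
    }


# Hypothetical product, stored under the shared /opt/workdir/ mount.
item = make_stac_item(
    "EXAMPLE_S2_PRODUCT",
    (3.0, 51.0, 4.0, 52.0),
    datetime(2020, 3, 12, 10, 50, tzinfo=timezone.utc),
    "/opt/workdir/EXAMPLE_S2_PRODUCT",
)
print(item["id"], item["properties"]["datetime"])
```

The same function could be run once over the products that were downloaded earlier, which would cover the one-time registration mentioned above.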

mariojmdavid commented 2 years ago

hi Zdenek

for INCD that would be 194.210.120.0/23

best

Mario
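
For reference, checking a client address against that range takes a few lines with Python's standard `ipaddress` module. This is only a sketch of the check itself. Where the filter would actually be enforced (web server, firewall, ingress) is not decided in this thread, and the names below are hypothetical.

```python
import ipaddress

# Range given for INCD in this thread.
ALLOWED_NETWORKS = [ipaddress.ip_network("194.210.120.0/23")]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(is_allowed("194.210.121.10"))
print(is_allowed("194.210.122.10"))
```

A /23 spans two consecutive /24 blocks, so both 194.210.120.x and 194.210.121.x addresses pass the check.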


sebastian-luna-valero commented 1 year ago

Should we close this one?

backeb commented 1 year ago

Yes!


sebastian-luna-valero commented 1 year ago

Thanks!

c-scale-community / use-case-aquamonitor

Sprint 3: 31 Jan - 4 Feb #23
