c-scale-community / use-case-aquamonitor

Apache License 2.0

Sprint 3: 31 Jan - 4 Feb #23

Closed backeb closed 2 years ago

backeb commented 2 years ago

Notes from sprint planning meeting can be found here: https://confluence.egi.eu/display/CSCALE/2022-01-20+Planning+the+next+Aquamonitor+sprint

Objectives

The overarching goal is to work towards running the Aquamonitor workflow on

  1. INCD compute, accessing data available at INCD
  2. INCD compute, accessing data remotely on CREODIAS

Report on performance differences.

Make the data available for the use case

Progress on Notebook (MVP)

cc @gena @Jaapel @gdonvito @mariojmdavid @jopina @jorge-lip

jdries commented 2 years ago

@Jaapel I was finally able to add an experimental feature to our backend so that overviews are used when you work at lower resolutions. Below is a very basic example. Note the line where I set the 'experimental' feature flag, which is required to make it work. Everything runs against openeo-dev.vito.be:

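    import openeo
    from openeo.processes import eq

    # Development backend with the experimental overview support; authenticate first if required.
    connection = openeo.connect("https://openeo-dev.vito.be")
    rgb = connection.load_collection(
        "TERRASCOPE_S2_TOC_V2",
        spatial_extent={"west": 3.758216409030558, "east": 4.087806252, "south": 51.291835566, "north": 51.3927399},
        temporal_extent=["2020-03-11", "2020-03-15"], bands=["B04"],
        properties={"eo:cloud_cover": lambda cc: eq(cc, 50)},  # keep products with cloud cover equal to 50
    )

    # Set the experimental feature flag so that overviews are used at lower resolutions.
    rgb._pg.arguments["featureflags"] = {"experimental": True}

    # Specify the process graph: minimum over time, resample to 80 m in EPSG:3857, then download.
    rgb.min_time().resample_spatial(resolution=80, projection=3857).download("/tmp/openeo-rgb-sen2cor-manyclouds-resampled.tiff")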

Can you integrate this in your code and do a test run on a larger scale?

By the way, can you confirm that the full processing of spain will also work against a lower resolution? This is quite important for getting a view on data needs.
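As a rough illustration of why the resolution matters for data needs, here is a minimal sketch of the pixel-count arithmetic. It assumes the 10 m native resolution of Sentinel-2 band B04 and the 80 m target from the example above, and it ignores projection distortion, compression and which overview levels actually exist in the files, so treat the result as an order-of-magnitude estimate only.

```python
# Rough estimate of how the pixel count shrinks when reading at a coarser resolution.
NATIVE_RESOLUTION_M = 10   # Sentinel-2 band B04 native resolution
TARGET_RESOLUTION_M = 80   # resolution requested in the example above


def pixel_reduction_factor(native_m: float, target_m: float) -> float:
    """Ratio of native pixels to target pixels covering the same area."""
    return (target_m / native_m) ** 2


factor = pixel_reduction_factor(NATIVE_RESOLUTION_M, TARGET_RESOLUTION_M)
print(f"{factor:.0f}x fewer pixels per band")  # 64x fewer pixels per band
```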

Jaapel commented 2 years ago

@jdries I can try it tomorrow, worked today on an example with all the data using the .resample method.

Do you know how both resampling and this new experimental feature work with masks or missing data? When upsampling, do NaN values affect the result?

Jaapel commented 2 years ago

Also, caching DataCubes causes missing metadata errors, as described here, which makes quick iteration on larger datasets difficult. Today's run took ~4 hours to complete. If you have some time this sprint, I can guide you through how I set it up!

jdries commented 2 years ago

@Jaapel upsampling can indeed have various approaches to NaN values, but when we speed things up by using the overviews in the native products, we can't control that anymore. Also, for the scene classification, I don't really know what was used to generate them. It will perhaps be interesting to compare results.

We clearly need to work on this load_result to simplify the caching, but this experimental use of overviews also has the potential to drastically reduce that 4-hour job duration.

Jaapel commented 2 years ago

@jdries is there any place where I can find information about how resampling / upsampling methods work with NaN / filtered values?

jdries commented 2 years ago

It seems that both openEO and GDAL explicitly mention how NODATA/valid pixels are treated, per resampling method: https://gdal.org/programs/gdalwarp.html#cmdoption-gdalwarp-r https://processes.openeo.org/#resample_spatial

I have been searching through Sentinel-2 docs, but unfortunately cannot find which resampling method is used to generate overviews.
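To make the question concrete, here is a small pure-Python sketch of two possible conventions when averaging blocks of pixels that contain NaN: skipping the missing values, or letting them propagate. The function `block_average` and its `skip_nan` switch are hypothetical illustrations, not the behaviour of GDAL, openEO or the Sentinel-2 overview generation; the linked documentation remains the reference for what each resampling method actually does.

```python
import math


def block_average(grid, factor, skip_nan=True):
    """Downsample a 2D list by averaging factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            if skip_nan:
                block = [v for v in block if not math.isnan(v)]
            # An all-NaN block gives NaN; without skipping, any NaN propagates through sum().
            out_row.append(sum(block) / len(block) if block else math.nan)
        out.append(out_row)
    return out


nan = math.nan
grid = [
    [1.0, 3.0, nan, nan],
    [5.0, nan, nan, nan],
]
ignoring = block_average(grid, 2, skip_nan=True)      # first block averages 1, 3 and 5
propagating = block_average(grid, 2, skip_nan=False)  # first block becomes NaN
```

With masked water or cloud pixels, the two conventions can give quite different low-resolution values near mask edges, which is why comparing results at both resolutions seems worthwhile.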

Jaapel commented 2 years ago

This is great @jdries ! Let me try to see if I can improve the masking in the algorithm.

backeb commented 2 years ago

Retrospective

Tops

Tips

Review objectives

The overarching goal is to work towards running the Aquamonitor workflow on

  1. INCD compute, accessing data available at INCD
  2. INCD compute, accessing data remotely on CREODIAS

Report on performance differences.

Make the data available for the use case

Progress on Notebook (MVP)

Current objective = make the data available

other actions

follow up progress meeting

18 Feb 15h00 CET

zbenta commented 2 years ago

We have recreated the STAC server to use an NFS-mounted PVC. We have also recreated the spark-executor/driver to mount the same NFS-enabled PVC at the /opt/workdir/ path. The Python script we created to download the data is running on the STAC server and is currently downloading the data into that NFS-enabled PVC. We believe this is the best solution for giving the spark-executor/driver access to the downloaded products. The remaining work is to expose the data inside the spark-executor/driver pod as an existing collection, so that @Jaapel can use his Jupyter notebook to process the data.

backeb commented 2 years ago

Progress meeting: 18 Feb

provide object storage (swift/S3 interfaces), check integration with EGI Checkin

Make data available on INCD

Update https://github.com/c-scale-community/use-case-aquamonitor/issues/23#issuecomment-1044437607

Next steps

sort out the VA amendment for CREODIAS to get access to object storage

configure CREODIAS layer for INCD instance so that the OpenEO workflow can access the data remotely from INCD

in terms of optimisation can work with lower resolution data, has performance impacts for transfer, analysis etc.

Continue testing and improving Notebook using data for whole of Spain on Terrascope

Switch to CREODIAS backend and test.

See above dependency

Switch to INCD backend and test local data access performance.

See above dependency

Next meeting

9 March 4-5pm

backeb commented 2 years ago

Hi all,

I have been able to test the remote S3 access to CreoDIAS in openEO, and gotten it to work.

The main next step is for INCD to get an S3 access key and secret key for use in the use case, but I guess we need to wait for the amendment to the VA?

After that, to go further: INCD (Zacarias) will have to update openEO to latest versions. Quite a lot has changed since we did the initial deploy, and I still needed a small change to get it working. Then we'll need to add a few environment variables for the connection to CreoDIAS:

    AWS_S3_ENDPOINT: "s3.cloudferro.com"
    AWS_DIRECT: "TRUE"
    AWS_ACCESS_KEY_ID: "THE KEY ID"
    AWS_SECRET_ACCESS_KEY: "SECRET"
    AWS_DEFAULT_REGION: "RegionOne"
    AWS_REGION: "RegionOne"
    AWS_HTTPS: "YES"
    AWS_VIRTUAL_HOSTING: "FALSE"

This will need to happen in a yaml file similar to this one: https://github.com/Open-EO/openeo-geotrellis-kubernetes/blob/master/kubernetes/openeo.yaml

After that, we should be able to use layers from CreoDIAS on INCD.
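Before testing a CreoDIAS layer, it may help to confirm that all of these variables actually reached the running container. The following is a minimal sketch of such a check; the helper name `missing_s3_settings` is hypothetical, and the variable names are copied from the list above.

```python
import os

# Variable names taken from the list above; the values are deployment-specific.
REQUIRED_S3_SETTINGS = (
    "AWS_S3_ENDPOINT",
    "AWS_DIRECT",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_REGION",
    "AWS_HTTPS",
    "AWS_VIRTUAL_HOSTING",
)


def missing_s3_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_S3_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_s3_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All CreoDIAS S3 settings are present.")
```

Running this inside the driver or executor pod only shows that the variables are set, not that the key pair is valid; a real read from the object store is still needed for that.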

best regards, Jeroen

backeb commented 2 years ago

provide object storage (swift/S3 interfaces), check integration with EGI Checkin

poster for Portugal Copernicus meeting (first national copernicus conferences) on 22/23 March

Make data available on INCD

sort out the VA amendment for CREODIAS to get access to object storage

blockers

configure CREODIAS layer for INCD instance so that the OpenEO workflow can access the data remotely from INCD

in terms of optimisation can work with lower resolution data, has performance impacts for transfer, analysis etc. & Continue testing and improving Notebook using data for whole of Spain on Terrascope

reminder of objectives

  1. Switch to CREODIAS backend and test remote access from OpenEO.
  2. Switch to INCD backend and test local data access performance.
  3. report on performance differences
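For the performance report, one simple option is to time the same workflow against each backend and compare wall-clock durations. The sketch below is only an assumption about how that could be organised: `time_run`, `compare` and the job names are hypothetical, and the placeholder jobs merely sleep where a real run would submit the notebook workflow to a backend.

```python
import time


def time_run(job, repeats=1):
    """Return the best wall-clock duration in seconds over `repeats` calls of job()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return min(durations)


def compare(jobs, repeats=1):
    """Time each named job; return (seconds, ratio to the fastest job) per name."""
    timings = {name: time_run(job, repeats) for name, job in jobs.items()}
    fastest = min(timings.values())
    return {name: (seconds, seconds / fastest) for name, seconds in timings.items()}


# Placeholder jobs standing in for the real backend runs.
jobs = {
    "incd_local_data": lambda: time.sleep(0.01),
    "incd_remote_creodias": lambda: time.sleep(0.05),
}
for name, (seconds, ratio) in compare(jobs).items():
    print(f"{name}: {seconds:.3f} s ({ratio:.1f}x the fastest)")
```

For jobs that take hours, a single run per backend is probably all that is practical, so the numbers should be read with run-to-run variation in mind.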

next steps

Follow up meeting: 25 March, 12h00 CET

sustr4 commented 2 years ago

Requires an indexing job to be run locally to index data at provider and create the STAC metadata

I know we don't get a fresh start, but shouldn't the data be registered in STAC by the downloader? I guess it knows what it just downloaded, right? We can think of a one-time solution to register what has already been downloaded before, but that would be a one-time hack.
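As a sketch of what registering at download time could look like, the downloader might build a minimal STAC Item for each product it has just written. Everything specific here is an assumption for illustration: the function `make_stac_item`, the product id, the bounding box and the file name are hypothetical, and submitting the item to the STAC server is left out.

```python
import json
from datetime import datetime, timezone


def make_stac_item(product_id, bbox, acquired, asset_href):
    """Build a minimal STAC Item dictionary for one downloaded product."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": product_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            # Closed ring: the first and last coordinates are identical.
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {
            "datetime": acquired.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "links": [],
        "assets": {"data": {"href": asset_href}},
    }


# Hypothetical product; only the /opt/workdir/ mount path comes from this thread.
item = make_stac_item(
    "example-product",
    (-9.5, 36.9, -6.2, 42.2),
    datetime(2022, 2, 1, 11, 0, tzinfo=timezone.utc),
    "/opt/workdir/example-product.tif",
)
print(json.dumps(item, indent=2))
```

The same function could be run once over the files already on disk, which would cover the one-time registration of earlier downloads.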

please update on progress related to centralised STAC catalogue service

User/Access management seems to be the greatest issue now. Would it be possible to set up an IP filter before we have proper access control? That would mean someone (INCD?) specifying IP addresses (ranges) that can access the catalogue. Just asking: It may not be needed in the end.

mariojmdavid commented 2 years ago

hi Zdenek

for INCD that would be 194.210.120.0/23

best

Mario
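For reference, the range given above covers 194.210.120.0 to 194.210.121.255. If the filter ends up being implemented in application code rather than in a firewall or ingress rule, a check along these lines could work. This is a sketch using Python's standard `ipaddress` module, and `ALLOWED_NETWORKS` and `is_allowed` are hypothetical names.

```python
import ipaddress

# Networks allowed to reach the catalogue; the INCD range is taken from this thread.
ALLOWED_NETWORKS = [ipaddress.ip_network("194.210.120.0/23")]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(is_allowed("194.210.121.17"))  # True: inside the /23
print(is_allowed("194.210.122.1"))   # False: just outside the /23
```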


sebastian-luna-valero commented 2 years ago

Should we close this one?

backeb commented 2 years ago

Yes!



sebastian-luna-valero commented 2 years ago

Thanks!