CS-SI / eodag

Earth Observation Data Access Gateway
https://eodag.readthedocs.io
Apache License 2.0

Token mechanism not handled for Copernicus Dataspace provider #858

Closed chiarch84 closed 11 months ago

chiarch84 commented 1 year ago

Describe the bug

EODAG does not handle token refresh for Copernicus Dataspace, so it cannot be used as a plugin for massive downloads: they lead to authentication errors and then to a temporary ban by the provider, because too many sessions are open at the same time (1 token = 1 session). Data Space treats every new token request as a new session, so if the previous token has not expired, two parallel sessions are open, and the limit on concurrent sessions is reached very quickly. The maximum quotas for general users are listed at https://documentation.dataspace.copernicus.eu/Quotas.html#copernicus-general-users

We (JRC - EC) have the quotas for Service providers, but we reached them anyhow since the refresh token is not handled.

We use the single-product download because our workflow downloads each product individually, but it seems that for the same authenticated user a new token is generated every time, instead of the existing one being reused and refreshed. When an active token already exists, the mechanism should refresh it every 10 minutes, for up to 60 minutes. After 60 minutes the token is no longer valid and the session expires, so a new token has to be requested. Since there is no "logout" mechanism, we also cannot kill the existing sessions ourselves.
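The refresh behaviour described above could be sketched roughly as follows. This is not EODAG code: `TokenManager` and `request_token` are hypothetical names, the HTTP call is injected as a plain function so the sketch stays self-contained, and it assumes a Keycloak-style token response with `access_token`, `refresh_token`, `expires_in` and `refresh_expires_in` fields. The `cdse-public` client id is likewise an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical: a function that POSTs the form payload to the provider's
# token endpoint and returns the decoded JSON response.
TokenRequest = Callable[[Dict[str, str]], Dict[str, object]]


@dataclass
class TokenManager:
    request_token: TokenRequest
    username: str
    password: str
    client_id: str = "cdse-public"  # assumption, not taken from the thread
    margin: float = 30.0  # renew this many seconds before the stated expiry
    clock: Callable[[], float] = time.monotonic
    _access: Optional[str] = field(default=None, init=False, repr=False)
    _refresh: Optional[str] = field(default=None, init=False, repr=False)
    _access_deadline: float = field(default=0.0, init=False, repr=False)
    _refresh_deadline: float = field(default=0.0, init=False, repr=False)

    def get_token(self) -> str:
        now = self.clock()
        if self._access is not None and now < self._access_deadline:
            # Still valid: reuse it, so no request and no new session.
            return self._access
        if self._refresh is not None and now < self._refresh_deadline:
            # Access token expired but the session is alive: refresh it.
            payload = {
                "grant_type": "refresh_token",
                "refresh_token": self._refresh,
                "client_id": self.client_id,
            }
        else:
            # No usable session left: log in again (opens a new session).
            payload = {
                "grant_type": "password",
                "username": self.username,
                "password": self.password,
                "client_id": self.client_id,
            }
        response = self.request_token(payload)
        self._access = str(response["access_token"])
        self._refresh = str(response["refresh_token"])
        self._access_deadline = now + float(response["expires_in"]) - self.margin
        self._refresh_deadline = (
            now + float(response["refresh_expires_in"]) - self.margin
        )
        return self._access
```

With a single shared `TokenManager` per user, every download would call `get_token()` and only the first call, or a call made after the session has expired, would open a new session. The sketch is not thread-safe; parallel downloads would need a lock around `get_token()`.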

We need to understand whether EODAG is going to implement this, and with what priority. If not, we might have to write the download code ourselves and stop using EODAG for downloads from Dataspace. Thanks for your answer.

Code To Reproduce

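import pathlib

from eodag.utils.exceptions import AuthenticationError

# Fragment of our per-product download routine: dag, eodag_eoproduct,
# scratch_dir, logger and the other names are defined elsewhere.
try:
    product_scratch_path = pathlib.Path(
        dag.download(eodag_eoproduct, outputs_prefix=scratch_dir)
    )
except AuthenticationError as e:
    logger.exception(e)
    # Log in again and retry the same product.
    dag = login_to_eodag_provider(provider=provider_name)
    process_single_eodag_product(
        product_uuid,
        base_path,
        collection_name,
        metadata,
        product_type,
        elastic,
        provider_name,
        eodag_eoproduct,
        dag,
    )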

Environment:

Python version: 3.9.5
EODAG version: 2.9.1

sbrunato commented 1 year ago

Hello @chiarch84, and thanks for this issue. Yes, this is something that should be handled. We plan to update the token retrieval mechanism to support it.

chiarch84 commented 1 year ago

Thanks a lot for your answer @sbrunato ! Keep us updated on the roadmap!

koleckt commented 1 year ago

Hello @chiarch84, while waiting for the EODAG upgrade, did you find (or develop) a homemade download function for COP_DATASPACE?

chiarch84 commented 1 year ago

Dear @koleckt, no. For the moment we have just set our download rate so that we do not exceed a certain number of "invalid user credentials" messages per minute and per hour, so that we do not get "banned". So we keep downloading while receiving errors, since we were hoping that EODAG would solve this soon. I guess we are not the only ones expecting this feature, since CDSE is now the new official provider of Copernicus data.
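A stop-gap of this kind could be sketched as a sliding-window error budget. `ErrorBudget` and its thresholds are hypothetical: the thread does not say how the rate was actually limited or which limits were used.

```python
import time
from collections import deque
from typing import Callable, Deque


class ErrorBudget:
    """Tracks authentication errors and says how long to pause downloads."""

    def __init__(
        self,
        max_per_minute: int,
        max_per_hour: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_per_minute = max_per_minute
        self.max_per_hour = max_per_hour
        self._clock = clock
        self._errors: Deque[float] = deque()  # timestamps, oldest first

    def record_error(self) -> None:
        self._errors.append(self._clock())

    def wait_time(self) -> float:
        """Seconds to wait until both windows are below their limits."""
        now = self._clock()
        # Forget errors that are older than one hour.
        while self._errors and now - self._errors[0] >= 3600:
            self._errors.popleft()
        wait = 0.0
        last_minute = [t for t in self._errors if now - t < 60]
        if len(last_minute) >= self.max_per_minute:
            # Wait until enough of the oldest errors leave the 60 s window.
            idx = len(last_minute) - self.max_per_minute
            wait = max(wait, last_minute[idx] + 60 - now)
        if len(self._errors) >= self.max_per_hour:
            idx = len(self._errors) - self.max_per_hour
            wait = max(wait, self._errors[idx] + 3600 - now)
        return wait
```

A download loop would call `record_error()` in its `except AuthenticationError` branch and `time.sleep(budget.wait_time())` before each new attempt.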

gasparakos commented 1 year ago

Hello,

Based on https://documentation.dataspace.copernicus.eu/APIs/OData.html#product-download , I changed the keycloak authentication code a bit: https://github.com/gasparakos/eodag/blob/develop/eodag/plugins/authentication/keycloak.py#L94

Our downloads run stably. They stopped twice: once at 3 a.m., when the summer-to-winter clock change took place and the clock went back to 2 a.m., and once when the session expired (after 10 hours). I adjusted the code a bit and ran it again. I hope it can start a new session after 10 hours.
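The thread does not say why the clock change stopped the download. One plausible cause, stated here only as an assumption, is an expiry check based on local wall-clock time, which jumps back an hour at the change. A minimal sketch of an expiry check that is immune to such jumps uses `time.monotonic()`. `Deadline` is a hypothetical helper, not part of EODAG.

```python
import time
from typing import Callable


class Deadline:
    """Expiry tracking based on a monotonic clock instead of local time."""

    def __init__(
        self, lifetime_s: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._clock = clock
        # time.monotonic() never goes backwards, so DST changes or manual
        # clock adjustments cannot move this deadline.
        self._expires_at = clock() + lifetime_s

    def expired(self) -> bool:
        return self._clock() >= self._expires_at

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())
```

Storing `Deadline(expires_in)` next to each token, rather than a local `datetime`, would keep the comparison valid across the clock change.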

I put it in https://github.com/gasparakos/eodag. You can try it with: pip install --upgrade git+https://github.com/gasparakos/eodag.git@develop

gasparakos commented 1 year ago

> Our downloads run stably. They stopped twice: once at 3 a.m., when the summer-to-winter clock change took place and the clock went back to 2 a.m., and once when the session expired (after 10 hours). I adjusted the code a bit and ran it again. I hope it can start a new session after 10 hours.

I left a breakpoint() in the try/except block of authenticate() in eodag/plugins/authentication/keycloak.py (line 117), and got another response. After continuing, a new session started and the download continued.

gasparakos commented 1 year ago

> I hope it can start a new session after 10 hours.

Sorry, it cannot handle session expiration. Maybe some time-based explicit renewal is needed.

gasparakos commented 1 year ago

> Sorry, it cannot handle session expiration. Maybe some time-based explicit renewal is needed.

This problem was something else. The script was restarted and has been downloading images since Tuesday. The session (10 h expiration) has been renewed a few times since then.