cognitedata / cognite-sdk-python

Cognite Python SDK
https://cognite-sdk-python.readthedocs-hosted.com/
Apache License 2.0

Timeout while iterating over a chunk #1862

Open Erivan3000 opened 1 month ago

Erivan3000 commented 1 month ago

Describe the bug: After just over one hour, the token expires.

A fictitious example, but one that reproduces my problem, is the following code:

for event_list in client.events(chunk_size=500_000, data_set_ids=ids_sites):
    events = event_list.to_pandas()

Error message after just over an hour: CogniteAPIError: Unauthorized | code: 401 | X-Request-ID: 9c8702fc-59bf-9f7a-be47-4876d6b433f3

In fact, if I iterate in another way (for example, filtering by date), I can run code for more than 8 hours (for as long as I want, actually, because I re-authenticate between iterations, preventing the token from expiring). However, I don't want to filter by date because it doesn't provide consistent data volume like iterating by chunks.

And apparently, I can't authenticate between iterations in the example I provided here, as it seems the chunk persists the initial authentication, which initially makes sense. Does anyone know how to solve this?

To Reproduce: Runnable code reproducing the error.

import polars as pl  # the code below uses the pl namespace; pandas must still be installed for to_pandas()
from cognite.client import CogniteClient

client = CogniteClient()

# Note: ids_sites, events_columns and events_column_types are not defined in the report
# Empty DataFrame used to accumulate the results
all_events = pl.DataFrame({col: pl.Series([], dtype=dt) for col, dt in zip(events_columns, events_column_types)})

# Iterate over the events returned by the client, one chunk at a time
for event_list in client.events(chunk_size=250_000, data_set_ids=ids_sites):
    events = event_list.to_pandas()  # Convert the chunk of events to a pandas DataFrame
    events = pl.from_pandas(events)  # Convert the pandas DataFrame to polars

    events = events.select(events_columns)

    all_events = pl.concat([all_events, events])  # Append to the accumulated DataFrame
    print(len(all_events))

Expected behavior: I expected it to iterate through all the chunks without the token expiring.

Screenshots: [three images]

haakonvt commented 1 month ago

Hi @Erivan3000 and thanks for the bug report. Could you share a code snippet showing how authentication is set up for your CogniteClient?

Erivan3000 commented 1 month ago

Of course, I have added it now.

haakonvt commented 2 weeks ago

@Erivan3000 From what I can tell, you pass in a single token as a string (that will eventually expire as you observe). You need to pass in a function that returns a valid token, or better, use one of the CredentialProviders that ship with the SDK for simplicity. These will refresh automatically in the background for you.

For instance, check out OAuthInteractive:

>>> from cognite.client.credentials import OAuthInteractive
>>> oauth_provider = OAuthInteractive(
...     authority_url="https://login.microsoftonline.com/xyz",
...     client_id="abcd",
...     scopes=["https://greenfield.cognitedata.com/.default"],
... )
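
The other option mentioned above, passing in a function that returns a valid token, could look roughly like the sketch below. It is only an illustration under stated assumptions: `RefreshingToken` and its `fetch` argument are hypothetical names, not part of the SDK, and `fetch` stands in for whatever call your identity provider needs to obtain a new token and its lifetime in seconds.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingToken:
    """Hypothetical helper: a callable that returns a cached token and
    fetches a new one shortly before the cached one expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # assumed to return (token, lifetime in seconds)
        leeway: float = 60.0,                    # refresh this many seconds before expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._leeway = leeway
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def __call__(self) -> str:
        now = self._clock()
        # Fetch on first use, or once we are inside the leeway window before expiry.
        if self._token is None or now >= self._expires_at - self._leeway:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

As far as I know, `cognite.client.credentials.Token` accepts either a string or a zero-argument callable returning a string, so an instance such as `RefreshingToken(my_fetch)` could be passed to it in place of the fixed token string (please check this against the SDK version you use). Because the callable is consulted again on later requests, a long chunked iteration would no longer be tied to the token that was valid when the client was created.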