FREVA-CLINT / freva-nextgen

The next generation Freva
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Stream data #46

Closed antarcticrainforest closed 3 months ago

antarcticrainforest commented 4 months ago

Feature: Introducing Zarr Data Streams and Enhanced Catalogue Creation

This PR introduces the functionality of our data search capabilities. With this update, users now can:

Authentication with Keycloak

To ensure secure access to Zarr endpoints, authentication is now implemented using Keycloak. This integration simplifies OAuth2 authentication, making it straightforward for users to authenticate and access data securely.

Additional Improvements

In addition to these new features, this update includes some refactoring and addresses performance issues related to Solr queries. Specifically, I've tried to optimize Solr queries to avoid deep pagination, which previously impacted performance when generating large Intake catalogues. Let's see if how this goes.

gitguardian[bot] commented 4 months ago

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

antarcticrainforest commented 3 months ago

I'm done with reviewing, there are two other general comments that I think it would be better to be considered:

1. as a minor comment currently logs look like:
❯ python run_server.py -c api_config.toml --debug --dev -p 7777 -f
Trying http://localhost:8080/realms/freva/.well-known/openid-configuration...200
INFO:     Will watch for changes in these directories: ['/Users/mo/dev/20240708/freva-nextgen']
INFO:     Loading environment from '/var/folders/99/_rg5v0pn0sq81l0rr5g0z20h0000gn/T/envjneumnzv.conf'
INFO:     Uvicorn running on http://0.0.0.0:7777 (Press CTRL+C to quit)
INFO:     Started reloader process [2401] using WatchFiles
INFO:     Started server process [2406]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - DEBUG: Loading cluster config from /var/folders/99/_rg5v0pn0sq81l0rr5g0z20h0000gn/T/freva-nextgen/data-portal-cluster-config.json
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - DEBUG: Starting data-loader process
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - DEBUG: recycling dask cluster.
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - INFO: Starting data-loading deamon
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - INFO: Broker will listen for messages now

Which uvicorn and freva-data-loder and freva-client logs are not in same format, which i think if it would be constructed in same format, to make the way of bug fixing in the future easier.

2. When we run server via `python run_server.py -c api_config.toml --debug --dev -p 7777 -f` and afterward in another terminal, we run the `python run_server.py` without any port dedication, first it stops the first server running perfectly but then we get the following in the later terminal:
❯ python run_server.py 
Trying http://localhost:8080/realms/freva/.well-known/openid-configuration...200

freva-nextgen on  stream-data via 🐳 desktop-linux via 🐍 v3.12.2 (venv) took 4s 
❯ INFO:     Loading environment from '/var/folders/99/_rg5v0pn0sq81l0rr5g0z20h0000gn/T/env2quej7hj.conf'
ERROR:    [Errno 48] Address already in use
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - DEBUG: Loading cluster config from /var/folders/99/_rg5v0pn0sq81l0rr5g0z20h0000gn/T/freva-nextgen/data-portal-cluster-config.json
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - DEBUG: Starting data-loader process
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - DEBUG: recycling dask cluster.
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - INFO: Starting data-loading deamon
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - INFO: Broker will listen for messages now

As it shows, if we get port already used and exit like FastiAPI-behavior, in data-loader and data-loader doesn't allow to go to Starting data-loader process level, i think it makes the structure more clear.

Also i didn't check typos, but as a suggestion i usually use type-precommit to avoid getting typo in my written docs. with high probability it reduces the large amount of typos Thanks for great job

I am not sure about the loggers. In practice, these loggers are independent, Because they run as different services on different machines. You will not even see them together, except for here in dev mode on your laptop.

As a personal preference, I think it is more clear to see these logging messages separated because then I can immediately see which message belongs to which service. So if you don't insist, I'd say we keep the logs as they are.

About the typ checker: what are you using for type checking?

antarcticrainforest commented 3 months ago

Ok thanks for the first round of reviews. Here is a TOD list:

mo-dkrz commented 3 months ago

I'm done with reviewing, there are two other general comments that I think it would be better to be considered:

1. as a minor comment currently logs look like:
❯ python run_server.py -c api_config.toml --debug --dev -p 7777 -f
Trying http://localhost:8080/realms/freva/.well-known/openid-configuration...200
INFO:     Will watch for changes in these directories: ['/Users/mo/dev/20240708/freva-nextgen']
INFO:     Loading environment from '/var/folders/99/_rg5v0pn0sq81l0rr5g0z20h0000gn/T/envjneumnzv.conf'
INFO:     Uvicorn running on http://0.0.0.0:7777 (Press CTRL+C to quit)
INFO:     Started reloader process [2401] using WatchFiles
INFO:     Started server process [2406]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - DEBUG: Loading cluster config from /var/folders/99/_rg5v0pn0sq81l0rr5g0z20h0000gn/T/freva-nextgen/data-portal-cluster-config.json
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - DEBUG: Starting data-loader process
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - DEBUG: recycling dask cluster.
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - INFO: Starting data-loading deamon
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:10:44 - INFO: Broker will listen for messages now

Which uvicorn and freva-data-loder and freva-client logs are not in same format, which i think if it would be constructed in same format, to make the way of bug fixing in the future easier.

2. When we run server via `python run_server.py -c api_config.toml --debug --dev -p 7777 -f` and afterward in another terminal, we run the `python run_server.py` without any port dedication, first it stops the first server running perfectly but then we get the following in the later terminal:
❯ python run_server.py 
Trying http://localhost:8080/realms/freva/.well-known/openid-configuration...200

freva-nextgen on  stream-data via 🐳 desktop-linux via 🐍 v3.12.2 (venv) took 4s 
❯ INFO:     Loading environment from '/var/folders/99/_rg5v0pn0sq81l0rr5g0z20h0000gn/T/env2quej7hj.conf'
ERROR:    [Errno 48] Address already in use
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - DEBUG: Loading cluster config from /var/folders/99/_rg5v0pn0sq81l0rr5g0z20h0000gn/T/freva-nextgen/data-portal-cluster-config.json
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - DEBUG: Starting data-loader process
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - DEBUG: recycling dask cluster.
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - INFO: Starting data-loading deamon
data-loader @ Mos-MacBook-Pro.d.dkrz.de - 2024-07-09T08:13:49 - INFO: Broker will listen for messages now

As it shows, if we get port already used and exit like FastiAPI-behavior, in data-loader and data-loader doesn't allow to go to Starting data-loader process level, i think it makes the structure more clear. Also i didn't check typos, but as a suggestion i usually use type-precommit to avoid getting typo in my written docs. with high probability it reduces the large amount of typos Thanks for great job

I am not sure about the loggers. In practice, these loggers are independent, Because they run as different services on different machines. You will not even see them together, except for here in dev mode on your laptop.

As a personal preference, I think it is more clear to see these logging messages separated because then I can immediately see which message belongs to which service. So if you don't insist, I'd say we keep the logs as they are.

About the typ checker: what are you using for type checking?

For typo you can use as we do in freva typo pre-commit: https://github.com/FREVA-CLINT/freva/blob/main/.pre-commit-config.yaml

antarcticrainforest commented 3 months ago

@MoSHad91, I've unified the loggers as much as possible. I hope the result is to your liking.

antarcticrainforest commented 3 months ago

The last commits are about adding the reload functionality for the data-loader. When the --dev mode is active the watchfiles lib takes care of reloading file changes. I've also merged the changes from the other PR.