bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform
https://birdhouse-deploy.readthedocs.io/en/latest/
Apache License 2.0

THREDDS: add more options to configure catalog.xml #472

Closed mishaschwartz closed 3 weeks ago

mishaschwartz commented 1 month ago

CI Operations

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false

tlvu commented 1 month ago

Oh wait, can you do the same changes for optional-components/testthredds as well?

mishaschwartz commented 1 month ago

@tlvu

> Oh wait, can you do the same changes for optional-components/testthredds as well?

Isn't this set up so that we can run tests against a different THREDDS server? If the tests don't require a different configuration why do we need to change this as well?

tlvu commented 1 month ago

> Isn't this set up so that we can run tests against a different THREDDS server?

To test a different version, yes.

> If the tests don't require a different configuration why do we need to change this as well?

I meant to allow the same customizations for the test. Currently we are testing THREDDS v5 using this testthredds on our production host. By the same token, we could test additional configs at the same time, so the same customizations would be useful.

So I meant to add TESTTHREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS and TESTTHREDDS_DATASET_DATASETSCAN_BODY.

mishaschwartz commented 1 month ago

@fmigneault

The THREDDS definitions need to align with Magpie definitions to protect the contents accordingly.

What was the intention for how the Magpie permissions were supposed to interact with additional catalogs introduced by setting the THREDDS_ADDITIONAL_CATALOG variable?

Is there a solution in place for that already or do I have to come up with a solution that accounts for arbitrary catalog definitions as well?

fmigneault commented 1 month ago

@mishaschwartz

Magpie permissions do not really care about how the catalogs (the datasetScan blocks) are defined. They work only with the resolved URL paths. The "catalog" is the default "browsing" service that THREDDS uses when navigating its hierarchy, and the URLs are all formed as /thredds/{service}/{nested-dirs...}. So, on the Magpie side, the resources are defined as {thredds-service-type}/{nested-dirs-resources...}/{file-resource} (note that {service} is omitted). Adding more catalogs would only mean reflecting these new directories, as they are resolved by URL, under the Magpie THREDDS service.

The service combinations are defined directly in the Magpie THREDDS service configuration. Depending on the "prefixes" (i.e., the service), it handles GET requests as either browse or read permissions. All services and file extensions that are classified as providing metadata are typically browse, and actual data access is read. In some edge cases, a service can be both (e.g., WMS and WCS, which can either describe or get the data depending on the request parameters), and is therefore placed in the read category.
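The mapping described above can be sketched roughly as follows. This is only an illustration: the prefix lists, the function name `classify_thredds_path`, and the sample paths are assumptions made for the example, not Magpie's actual configuration or code.

```python
from typing import Optional, Tuple

# Hypothetical prefix lists for illustration only; the real ones live in the
# Magpie THREDDS service configuration and may differ.
BROWSE_PREFIXES = ("catalog", "ncml", "uddc", "iso")
READ_PREFIXES = ("fileServer", "dodsC", "dap4", "wcs", "wms", "ncss")


def classify_thredds_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (permission, resource_path) for a /thredds/{service}/... path.

    The {service} segment only selects the permission; it is dropped from
    the resource path. Returns None when the path is not under /thredds/
    or when the service prefix is not in either list.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "thredds":
        return None
    service, nested = parts[1], parts[2:]
    if service in BROWSE_PREFIXES:
        permission = "browse"
    elif service in READ_PREFIXES:
        # Services that can both describe and fetch data (e.g. WMS, WCS)
        # are kept in the stricter "read" category.
        permission = "read"
    else:
        return None
    return permission, "/".join(nested)
```

In this sketch, `/thredds/catalog/birdhouse/a.nc` and `/thredds/fileServer/birdhouse/a.nc` resolve to the same resource path `birdhouse/a.nc`, with browse and read permissions respectively. A custom service whose prefix is in neither list is not classified at all, which is why new services would need to be added to the Magpie configuration.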

mishaschwartz commented 1 month ago

@fmigneault

Thanks for the detailed explanation. I understand the Magpie configuration and how its "prefix" definitions relate to the URLs in THREDDS.

My concern is more about whether we need to be able to customize the "file_patterns" definitions in the Magpie configuration files to handle duplicate file extensions other than .nc and .ncml.

The other concern is that users can define custom services other than the ones listed here:

https://github.com/bird-house/birdhouse-deploy/blob/3d7c8d64230c0798e4362239ff1064ef005ec681/birdhouse/components/thredds/catalog.xml.template#L6-L15

Or they could potentially modify the base attribute so that the URL path no longer matches the prefix defined in Magpie.

I propose we either do:

I'm working on a solution, but if you have any insight into this issue, let me know.

fmigneault commented 1 month ago

If custom service types are added, they must be provided in the browse/read section accordingly for Magpie to grant/deny access to them as expected. Similarly, additional file patterns (or extensions) must also be provided.

If a location other than /twitcher/ows/proxy/thredds/{service}/ is used, it will not be managed by Magpie/Twitcher. However, if the service uses the same /twitcher/ows/proxy/thredds/ prefix (it should, since it is under this THREDDS docker service), it will default to DENY access, unless "full-access" was granted on the THREDDS service to an anonymous group. Therefore, I don't think modifying base is the right approach. (BTW, they should probably also inherit from TWITCHER_PROTECTED_PATH rather than being hard-coded.)

Rather than setting THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS directly with the XML, maybe we should have a THREDDS_SERVICE_DATA_EXTRA_FILE_BROWSE_EXTENSIONS and a THREDDS_SERVICE_DATA_EXTRA_FILE_READ_EXTENSIONS, and generate THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS from them? There are actually already some discrepancies (e.g., .md, .rst, and .csv are missing in Magpie but listed in THREDDS). A few warnings/descriptions would help explain how to customize them.
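A minimal sketch of that generation step, assuming the filters are expressed as THREDDS `<include wildcard="..."/>` elements and that the two proposed variables hold lists of extensions. The function name `build_file_filters` is hypothetical and is not part of birdhouse-deploy.

```python
from xml.sax.saxutils import quoteattr


def build_file_filters(browse_extensions, read_extensions):
    """Build <include> filter lines from two extension lists.

    Duplicates are dropped while keeping first-seen order, so an extension
    listed under both categories yields a single filter entry.
    """
    seen = []
    for ext in list(browse_extensions) + list(read_extensions):
        # Accept both "csv" and ".csv" as input.
        ext = ext if ext.startswith(".") else "." + ext
        if ext not in seen:
            seen.append(ext)
    return "\n".join(
        "<include wildcard={} />".format(quoteattr("*" + ext)) for ext in seen
    )
```

For example, `build_file_filters([".md", ".rst"], ["csv"])` yields one `<include wildcard="*.md" />` style line per extension. The same two lists could then be reused to derive the matching Magpie patterns, which is what would keep the two configurations from drifting apart.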

mishaschwartz commented 1 month ago

After several iterations, I don't think there is an easy way to get the flexibility we want by defining these variables while also enforcing the Magpie settings. So the compromise I went with is to add some defaults and make the Magpie settings configurable as well, so that they can be updated as needed. I also added some instructions/warnings about how to configure Magpie to match changes to THREDDS.

crim-jenkins-bot commented 1 month ago

E2E Test Results

DACCS-iac Pipeline Results

Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2845/
Result :white_check_mark: SUCCESS

BIRDHOUSE_DEPLOY_BRANCH : thredds-more-configuration
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master

DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-216.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results

Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1716/

NOTEBOOK TEST RESULTS
    
[2024-10-15T15:04:46.597Z] ============================= test session starts ==============================
[2024-10-15T15:04:46.597Z] platform linux -- Python 3.11.6, pytest-8.2.0, pluggy-1.5.0
[2024-10-15T15:04:46.597Z] rootdir: /home/jenkins/agent/workspace/PAVICS-e2e-workflow-tests_master
[2024-10-15T15:04:46.597Z] plugins: anyio-4.3.0, dash-2.17.0, nbval-0.11.0, tornasync-0.6.0.post2, xdist-3.5.0
[2024-10-15T15:04:46.597Z] collected 301 items
[2024-10-15T15:04:46.597Z] 
[2024-10-15T15:04:56.278Z] notebooks-auth/geoserver.ipynb ..................                        [  5%]
[2024-10-15T15:06:12.447Z] notebooks-auth/test_cowbird_jupyter.ipynb ..........                     [  9%]
[2024-10-15T15:06:12.709Z] notebooks-auth/test_thredds.ipynb ...........                            [ 12%]
[2024-10-15T15:06:59.509Z] pavics-sdi-master/docs/source/notebooks/CaSR_basic.ipynb ......          [ 14%]
[2024-10-15T15:07:09.383Z] pavics-sdi-master/docs/source/notebooks/WCS_example.ipynb .......        [ 17%]
[2024-10-15T15:07:18.554Z] pavics-sdi-master/docs/source/notebooks/WFS_example.ipynb ......         [ 19%]
[2024-10-15T15:14:37.025Z] pavics-sdi-master/docs/source/notebooks/climex.ipynb ............        [ 23%]
[2024-10-15T15:14:37.025Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-climate-stations.ipynb . [ 23%]
[2024-10-15T15:14:43.469Z] ...............                                                          [ 28%]
[2024-10-15T15:14:51.365Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-xclim.ipynb .....    [ 30%]
[2024-10-15T15:14:58.357Z] pavics-sdi-master/docs/source/notebooks/esgf-dap.ipynb .......           [ 32%]
[2024-10-15T15:15:13.478Z] pavics-sdi-master/docs/source/notebooks/forecasts.ipynb ......           [ 34%]
[2024-10-15T15:15:37.282Z] pavics-sdi-master/docs/source/notebooks/opendap.ipynb .......            [ 36%]
[2024-10-15T15:15:41.808Z] pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb .....       [ 38%]
[2024-10-15T15:20:10.909Z] pavics-sdi-master/docs/source/notebooks/regridding.ipynb ............... [ 43%]
[2024-10-15T15:21:20.639Z] .............                                                            [ 47%]
[2024-10-15T15:21:22.576Z] pavics-sdi-master/docs/source/notebooks/rendering.ipynb ....             [ 49%]
[2024-10-15T15:21:24.359Z] pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb ........ [ 51%]
[2024-10-15T15:21:39.983Z] .................                                                        [ 57%]
[2024-10-15T15:21:47.753Z] pavics-sdi-master/docs/source/notebooks/subsetting.ipynb ......          [ 59%]
[2024-10-15T15:21:49.137Z] pavics-sdi-master/docs/source/notebook-components/weaver_example.ipynb . [ 59%]
[2024-10-15T15:22:12.317Z] .........                                                                [ 62%]
[2024-10-15T15:22:21.423Z] finch-master/docs/source/notebooks/dap_subset.ipynb ...........          [ 66%]
[2024-10-15T15:22:30.278Z] finch-master/docs/source/notebooks/finch-usage.ipynb ......              [ 68%]
[2024-10-15T15:22:31.672Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb . [ 68%]
[2024-10-15T15:22:34.736Z] .....                                                                    [ 70%]
[2024-10-15T15:22:49.854Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-2Subsetting.ipynb . [ 70%]
[2024-10-15T15:23:06.031Z] ............                                                             [ 74%]
[2024-10-15T15:23:20.940Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb . [ 75%]
[2024-10-15T15:23:43.719Z] .....s.                                                                  [ 77%]
[2024-10-15T15:23:51.883Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb . [ 77%]
[2024-10-15T15:24:08.192Z] ..                                                                       [ 78%]
[2024-10-15T15:24:16.345Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb . [ 78%]
[2024-10-15T15:25:14.907Z] .........                                                                [ 81%]
[2024-10-15T15:25:24.916Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-6Regridding_Conversion.ipynb . [ 82%]
[2024-10-15T15:30:01.878Z] ....                                                                     [ 83%]
[2024-10-15T15:30:01.878Z] PAVICS-landing-master/content/notebooks/hydrology/PAVICStutorial_Hydrology-01_Intro.ipynb . [ 83%]
[2024-10-15T15:30:01.878Z] ....                                                                     [ 85%]
[2024-10-15T15:30:05.180Z] PAVICS-landing-master/content/notebooks/hydrology/PAVICStutorial_Hydrology-02_Calibration.ipynb . [ 85%]
[2024-10-15T15:30:11.022Z] .....                                                                    [ 87%]
[2024-10-15T15:30:16.314Z] PAVICS-landing-master/content/notebooks/hydrology/PAVICStutorial_Hydrology-03_Watershed_properties.ipynb . [ 87%]
[2024-10-15T15:30:33.895Z] .............                                                            [ 91%]
[2024-10-15T15:30:38.115Z] PAVICS-landing-master/content/notebooks/hydrology/PAVICStutorial_Hydrology-04_Time_series_analysis.ipynb . [ 92%]
[2024-10-15T15:30:39.499Z] ......                                                                   [ 94%]
[2024-10-15T15:30:41.790Z] notebooks/hummingbird.ipynb ............                                 [ 98%]
[2024-10-15T15:33:15.937Z] notebooks/stress-tests.ipynb ......                                      [100%]
[2024-10-15T15:33:15.937Z] 
[2024-10-15T15:33:15.937Z] =============================== warnings summary ===============================
    
  
mishaschwartz commented 1 month ago

> I think the configuration is overcomplicated

I think I agree that this has gotten out of hand.

The main issue is that I don't want to make it possible to break the service catalog which would break other things internally for the rest of the components in the stack. But I still don't fully understand how that is used...

> For example, if I wanted to configure THREDDS with only fileServer for .nc and .txt

I think that your setup here is actually overly complicated (which highlights your point). I don't think you'd need to set THREDDS_MAGPIE_EXTRA_DATA_PREFIXES, and I don't think we ever want people to modify THREDDS_DEFAULT_FILE_FILTERS (it's not provided in env.local.example as a settable option).

> Is there really any advantage of having duplicate sets...

I think I agree with this. But the defaults vs. the extras were requested in the discussion here https://github.com/bird-house/birdhouse-deploy/pull/472#discussion_r1790748137 Do you no longer think that's a concern?

fmigneault commented 1 month ago

I think it might be worthwhile to remove the defaults vs. extras duplication to make things easier for users in general. If one wants to preserve the defaults, it is easy to copy-paste their values and add the desired "extra" within a single variable.

crim-jenkins-bot commented 1 month ago

E2E Test Results

DACCS-iac Pipeline Results

Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2847/
Result :x: FAILURE

BIRDHOUSE_DEPLOY_BRANCH : thredds-more-configuration
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master

DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-216.rdext.crim.ca

:warning: Infrastructure deployment failed. :warning:
Instance destroyed due to CI execution.
To debug, launch an instance manually with PR reference
thredds-more-configuration.

crim-jenkins-bot commented 1 month ago

E2E Test Results

DACCS-iac Pipeline Results

Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2848/
Result :white_check_mark: SUCCESS

BIRDHOUSE_DEPLOY_BRANCH : thredds-more-configuration
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master

DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-216.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results

Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1719/

NOTEBOOK TEST RESULTS
    
[2024-10-16T18:01:20.587Z] ============================= test session starts ==============================
[2024-10-16T18:01:20.587Z] platform linux -- Python 3.11.6, pytest-8.2.0, pluggy-1.5.0
[2024-10-16T18:01:20.587Z] rootdir: /home/jenkins/agent/workspace/PAVICS-e2e-workflow-tests_master
[2024-10-16T18:01:20.587Z] plugins: anyio-4.3.0, dash-2.17.0, nbval-0.11.0, tornasync-0.6.0.post2, xdist-3.5.0
[2024-10-16T18:01:20.587Z] collected 301 items
[2024-10-16T18:01:20.587Z] 
[2024-10-16T18:01:29.949Z] notebooks-auth/geoserver.ipynb ..................                        [  5%]
[2024-10-16T18:02:34.894Z] notebooks-auth/test_cowbird_jupyter.ipynb ..........                     [  9%]
[2024-10-16T18:02:40.584Z] notebooks-auth/test_thredds.ipynb ...........                            [ 12%]
[2024-10-16T18:03:27.629Z] pavics-sdi-master/docs/source/notebooks/CaSR_basic.ipynb ......          [ 14%]
[2024-10-16T18:03:36.576Z] pavics-sdi-master/docs/source/notebooks/WCS_example.ipynb .......        [ 17%]
[2024-10-16T18:03:46.309Z] pavics-sdi-master/docs/source/notebooks/WFS_example.ipynb ......         [ 19%]
[2024-10-16T18:11:22.227Z] pavics-sdi-master/docs/source/notebooks/climex.ipynb ............        [ 23%]
[2024-10-16T18:11:22.227Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-climate-stations.ipynb . [ 23%]
[2024-10-16T18:11:27.795Z] ...............                                                          [ 28%]
[2024-10-16T18:11:35.513Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-xclim.ipynb .....    [ 30%]
[2024-10-16T18:11:42.251Z] pavics-sdi-master/docs/source/notebooks/esgf-dap.ipynb .......           [ 32%]
[2024-10-16T18:11:57.194Z] pavics-sdi-master/docs/source/notebooks/forecasts.ipynb ......           [ 34%]
[2024-10-16T18:12:02.965Z] pavics-sdi-master/docs/source/notebooks/opendap.ipynb .......            [ 36%]
[2024-10-16T18:12:07.409Z] pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb .....       [ 38%]
[2024-10-16T18:15:29.558Z] pavics-sdi-master/docs/source/notebooks/regridding.ipynb ............... [ 43%]
[2024-10-16T18:16:27.634Z] .............                                                            [ 47%]
[2024-10-16T18:16:32.092Z] pavics-sdi-master/docs/source/notebooks/rendering.ipynb ....             [ 49%]
[2024-10-16T18:16:34.018Z] pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb ........ [ 51%]
[2024-10-16T18:16:50.274Z] .................                                                        [ 57%]
[2024-10-16T18:16:58.051Z] pavics-sdi-master/docs/source/notebooks/subsetting.ipynb ......          [ 59%]
[2024-10-16T18:16:58.999Z] pavics-sdi-master/docs/source/notebook-components/weaver_example.ipynb . [ 59%]
[2024-10-16T18:17:16.806Z] .........                                                                [ 62%]
[2024-10-16T18:17:26.310Z] finch-master/docs/source/notebooks/dap_subset.ipynb ...........          [ 66%]
[2024-10-16T18:17:35.354Z] finch-master/docs/source/notebooks/finch-usage.ipynb ......              [ 68%]
[2024-10-16T18:17:36.744Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb . [ 68%]
[2024-10-16T18:17:39.764Z] .....                                                                    [ 70%]
[2024-10-16T18:17:54.688Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-2Subsetting.ipynb . [ 70%]
[2024-10-16T18:18:12.766Z] ............                                                             [ 74%]
[2024-10-16T18:18:27.686Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb . [ 75%]
[2024-10-16T18:18:50.859Z] .....s.                                                                  [ 77%]
[2024-10-16T18:18:59.014Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb . [ 77%]
[2024-10-16T18:19:15.301Z] ..                                                                       [ 78%]
[2024-10-16T18:19:21.923Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb . [ 78%]
[2024-10-16T18:20:23.213Z] .........                                                                [ 81%]
[2024-10-16T18:20:33.219Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-6Regridding_Conversion.ipynb . [ 82%]
[2024-10-16T18:25:23.925Z] ....                                                                     [ 83%]
[2024-10-16T18:25:23.925Z] PAVICS-landing-master/content/notebooks/hydrology/PAVICStutorial_Hydrology-01_Intro.ipynb . [ 83%]
[2024-10-16T18:25:23.925Z] ....                                                                     [ 85%]
[2024-10-16T18:25:23.925Z] PAVICS-landing-master/content/notebooks/hydrology/PAVICStutorial_Hydrology-02_Calibration.ipynb . [ 85%]
[2024-10-16T18:25:29.145Z] .....                                                                    [ 87%]
[2024-10-16T18:25:34.453Z] PAVICS-landing-master/content/notebooks/hydrology/PAVICStutorial_Hydrology-03_Watershed_properties.ipynb . [ 87%]
[2024-10-16T18:26:02.631Z] .............                                                            [ 91%]
[2024-10-16T18:26:05.951Z] PAVICS-landing-master/content/notebooks/hydrology/PAVICStutorial_Hydrology-04_Time_series_analysis.ipynb . [ 92%]
[2024-10-16T18:26:07.684Z] ......                                                                   [ 94%]
[2024-10-16T18:26:10.184Z] notebooks/hummingbird.ipynb ............                                 [ 98%]
[2024-10-16T18:28:49.817Z] notebooks/stress-tests.ipynb ......                                      [100%]
[2024-10-16T18:28:49.817Z] 
[2024-10-16T18:28:49.817Z] =============================== warnings summary ===============================