crim-ca / stac-populator

Workflow logic to populate STAC catalog with demo datasets.
MIT License
2 stars 2 forks source link

Problems running the populator #62

Open huard opened 1 day ago

huard commented 1 day ago

Running the command found in the README

    "http://localhost:8880/stac/" \
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/xclim/cmip6/catalog.html" \
    "STACpopulator/implementations/CMIP6_UofT/collection_config.yml"

fails with

usage: add_CMIP6.py [-h] [--update] [--mode {full,single}] [--config CONFIG] [--no-verify] [--cert CERT] [--auth-handler {basic,digest,bearer,proxy,cookie}] [--auth-identity AUTH_IDENTITY]
                    stac_host href
add_CMIP6.py: error: unrecognized arguments: STACpopulator/implementations/CMIP6_UofT/collection_config.yml

If I try to run this without the collection path, I get:

 python STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py     "http://localhost:8880/stac/"     "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/xclim/cmip6/catalog.html"
2024-10-11 14:48:27.908713 [INFO] :: PYESSV :: Loading vocabularies from /home/david/.esdoc/pyessv-archive ... please wait
Traceback (most recent call last):
  File "/home/david/src/stac-populator/STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py", line 137, in <module>
    sys.exit(main())
  File "/home/david/src/stac-populator/STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py", line 133, in main
    return runner(ns)
  File "/home/david/src/stac-populator/STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py", line 123, in runner
    ns.stac_host, data_loader, update=ns.update, session=session, config_file=ns.config, log_debug=ns.debug
AttributeError: 'Namespace' object has no attribute 'debug'

If I try to run cli.py directly, I get

Traceback (most recent call last):
  File "/home/david/src/stac-populator/STACpopulator/cli.py", line 4, in <module>
    import logging
  File "/home/david/src/stac-populator/STACpopulator/logging.py", line 3, in <module>
    import logging.config
ModuleNotFoundError: No module named 'logging.config'; 'logging' is not a package
huard commented 1 day ago

stac_populator works well.

huard commented 1 day ago

Should we remove the possibility to execute the individual implementations directly ? ie remove the __main__ from the implementations and adjust the example in the README ?

Also, this example in the README fails with a validation error.

fmigneault commented 1 day ago

@mishaschwartz

ModuleNotFoundError: No module named 'logging.config'; 'logging' is not a package

Exactly the error I was expecting. Should add import logging explicitly and not rely on logging.config doing some magic.

AttributeError: 'Namespace' object has no attribute 'debug'

The implementations fail to correctly apply add_parser_args of the global CLI for the logging options. They only add their own add_parser_args. We should have a separate add_logging_options similar to the add_request_options.

Should we remove the possibility to execute the individual implementations directly ?

I'd rather not. There's no technical reason to limit this.

mishaschwartz commented 1 day ago

@huard

Running the command found in the README ...

This command is broken as has been since before my recent updates. The configuration file should be passed with the --config flag, not as a positional argument.

The README is wrong and should be updated.

If I try to run this without the collection path, I get:

Yes this is a bug, the debug flag only applies to the logger which is only initialized when run through the CLI.

I am already working on code cleanup that will fix this.

If I try to run cli.py directly, I get

This is because the current script is already called "logging.py" and that clobbers the builtin "logging" module in the namespace when imported directly.

This is fixed in code cleanup work that is in progress (see above).

@fmigneault

ModuleNotFoundError: No module named 'logging.config'; 'logging' is not a package

Exactly the error I was expecting.

Are you talking about the discussion here? https://github.com/crim-ca/stac-populator/pull/60#discussion_r1793956655 Because that is unrelated to the current problem.

AttributeError: 'Namespace' object has no attribute 'debug'

The implementations fail to correctly apply add_parser_args of the global CLI for the logging options. They only add their own add_parser_args. We should have a separate add_logging_options similar to the add_request_options.

That's one option... or we could stick with a single interface to run the populators:

Should we remove the possibility to execute the individual implementations directly ?

Yes! Everything should go through one interface:

There's no technical reason to limit this.

Technical reasons:

huard commented 1 day ago

I also fear trying to support two entry points would make a mess.

mishaschwartz commented 1 day ago

Should add import logging explicitly and not rely on logging.config doing some magic.

To explain this further, here is a minimal concrete example that demonstrates that the issue is due to conflicting module names:

All of the following commands produce the error ModuleNotFoundError: No module named 'logging.config'; 'logging' is not a package

$ echo 'import logging.config' > logging.py && python logging.py
$ echo 'import logging; import logging.config' > logging.py && python logging.py
$ echo 'import logging.config; import logging' > logging.py && python logging.py

So you can see that adding an import logging doesn't matter because the file logging.py is preventing the builtin logging module from being imported. We can demonstrate this further:

$ rm logging.py
$ python -m logging 
No module named logging.__main__; 'logging' is a package and cannot be directly executed 
# this means it is accessing the builtin logging module

$ echo "print('you are here')" > logging.py
$ python -m logging
you are here
# this means it is accessing the logging.py script
fmigneault commented 1 day ago

I agree about making stac-populator the official entrypoint and updating the README with it. Just saying that in this specific case, the fix is simple. Just need to apply the logging arguments to the sub-parser. This is already done for the request arguments (--no-verify, etc. are there when calling python add_CMIP6.py directly), so it's not like adding the logging arguments is causing extra maintenance burden.

As for the logging issue, I'm guessing add_CMIP6.py was called from within https://github.com/crim-ca/stac-populator/tree/master/STACpopulator, so it thinks import logging refers to https://github.com/crim-ca/stac-populator/blob/master/STACpopulator/logging.py. The file could be renamed to avoid side effects based on the directory where the CLI is called from.