dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Using `load_assets_from_package_module` errors when any submodule has the same directory name as a dependency #20422

Closed CSRessel closed 3 months ago

CSRessel commented 4 months ago

Dagster version

dagster, version 1.6.9

What's the issue?

When a submodule directory has the same name as one of the project's dependencies (here, `pandas`), `dagster.load_assets_from_package_module` errors out with something like:

ModuleNotFoundError: No module named 'dagster_quickstart.assets.pandas._config'

Full Stack Traces

```
(repro) ➜  repro_bare_find_modules git:(main) dagster dev
2024-03-11 23:34:33 +0000 - dagster - INFO - Launching Dagster services...
2024-03-11 23:34:35 +0000 - dagster.code_server - ERROR - Error while importing code
Traceback (most recent call last):
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_core/code_pointer.py", line 134, in load_python_module
    return importlib.import_module(module_name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/dagster_quickstart/__init__.py", line 5, in <module>
    all_assets = load_assets_from_package_module(assets)
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_core/definitions/load_assets_from_modules.py", line 282, in load_assets_from_package_module
    ) = assets_from_package_module(package_module)
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_core/definitions/load_assets_from_modules.py", line 230, in assets_from_package_module
    return assets_from_modules(
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_core/definitions/load_assets_from_modules.py", line 68, in assets_from_modules
    for module in modules:
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_core/definitions/load_assets_from_modules.py", line 346, in find_modules_in_package
    submodule = import_module(f"{package_module.__name__}.{modname}")
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'dagster_quickstart.assets.pandas._config'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_grpc/server.py", line 408, in __init__
    self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories(
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_grpc/server.py", line 242, in __init__
    loadable_targets = get_loadable_targets(
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_grpc/utils.py", line 50, in get_loadable_targets
    else loadable_targets_from_python_module(module_name, working_directory)
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module
    module = load_python_module(
  File "/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_core/code_pointer.py", line 139, in load_python_module
    raise DagsterImportError(
dagster._core.errors.DagsterImportError: Encountered ImportError: `No module named 'dagster_quickstart.assets.pandas._config'` while importing module dagster_quickstart. Local modules were resolved using the working directory `/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules`. If another working directory should be used, please explicitly specify the appropriate path using the `-d` or `--working-directory` for CLI based targets or the `working_directory` configuration option for workspace targets.
/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules/repro/lib/python3.10/site-packages/dagster/_core/workspace/context.py:622: UserWarning: Error loading repository location dagster_quickstart:dagster._core.errors.DagsterImportError: Encountered ImportError: `No module named 'dagster_quickstart.assets.pandas._config'` while importing module dagster_quickstart. Local modules were resolved using the working directory `/mnt/home/clifford.ressel/Documents/source/repro_bare_find_modules`. If another working directory should be used, please explicitly specify the appropriate path using the `-d` or `--working-directory` for CLI based targets or the `working_directory` configuration option for workspace targets.
... (repeated several times)
  warnings.warn(f"Error loading repository location {location_name}:{error.to_string()}")
2024-03-11 23:34:36 +0000 - dagster-webserver - INFO - Serving dagster-webserver on http://127.0.0.1:3000 in process 558615
```

What did you expect to happen?

The call would discover all the submodules of the given module correctly.

How to reproduce?

Minimal repro here, with steps for running it:

https://github.com/dagster-io/dagster-quickstart/compare/main...CSRessel:dagster-bug-repro:main

Deployment type

None

Deployment details

No response

Additional information

I have a fix for this, which I will post below. I'm happy to submit a PR and test it out if that helps speed up the resolution!

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

CSRessel commented 4 months ago

This resolved the issue for me, both in the repro and in my own work:

https://github.com/CSRessel/dagster/commit/06dbee46d11adecadb7ac4ff07f6dd6c28bc3ab0

The problem arises from calling `os.path.dirname` on a file path. A bare directory path doesn't disambiguate whether you're looking at that folder as a module within the project or as a global dependency of the same name. Thankfully `pkgutil.walk_packages` has a `prefix` keyword for this purpose, which gives us back the fully qualified module name.
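For anyone who wants to see the mechanism without dagster installed, here is a minimal standalone sketch. It is not the code from the commit above; the package name `demo_project_20422` and the file `my_assets.py` are made up for illustration, and the stdlib `json` package stands in for `pandas` so that nothing extra needs to be installed. It assumes a regular CPython install where `json` is an ordinary package.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: demo_project_20422/assets/json/my_assets.py
# The subpackage "json" deliberately shares its name with an importable
# top-level package, like "pandas" in the report.
root = Path(tempfile.mkdtemp())
assets_dir = root / "demo_project_20422" / "assets"
shadow_dir = assets_dir / "json"
shadow_dir.mkdir(parents=True)
for pkg_dir in (assets_dir.parent, assets_dir, shadow_dir):
    (pkg_dir / "__init__.py").write_text("")
(shadow_dir / "my_assets.py").write_text("VALUE = 1\n")

# Make the demo project importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Without a prefix, walk_packages sees a package called "json", imports the
# top-level "json", and then walks that package's __path__ instead of ours.
unprefixed = [info.name for info in pkgutil.walk_packages([str(assets_dir)])]

# With a prefix, every name is fully qualified, so the import that
# walk_packages performs resolves to the project's own subpackage.
prefixed = [
    info.name
    for info in pkgutil.walk_packages(
        [str(assets_dir)], prefix="demo_project_20422.assets."
    )
]

print("unprefixed:", unprefixed)
print("prefixed:  ", prefixed)
```

In the unprefixed list you should see submodules of the stdlib package (such as `json.decoder`) and no trace of `my_assets`, which matches the `pandas._config` name in the error. The prefixed list contains `demo_project_20422.assets.json.my_assets`.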

garethbrickman commented 4 months ago

@CSRessel Thanks so much for this! Please do make a PR and I'll get it to our engineers to review and merge 🙏

CSRessel commented 4 months ago

The PR is up and should be good to review

smackesey commented 3 months ago

Closed by #20425.