Closed akmukherjee closed 1 year ago
@taylor-curran : Just created the bug report.
Thanks for the issue @akmukherjee, and thanks for sharing a project to use for reproduction! If I set the working directory for a process work pool, I can see that the git submodule is being cloned along with the base repository. However, the error where Python is not able to load your submodule is a puzzling one. I will continue investigating that error and report back with what I find!
Thanks Alex! Very Respectfully, Amit
If you import from a submodule inside your flow, I found that you'll get the import error you saw (it seems related to https://github.com/PrefectHQ/prefect/issues/9542). When I moved the import of your git submodule to the top of the file, your example flow started working. You can see the change I made on my fork of your repository.
The git submodules functionality for the git_clone_project step appears to work as expected. I will close this issue, but if you encounter any other issue with git submodules in projects, feel free to open a new issue or reopen this one!
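The difference between a top-level import and an import inside the flow can be reproduced without Prefect. The sketch below is not Prefect's loader. It assumes, purely for illustration, that the flow file is imported while the cloned directory is on sys.path and that the directory is no longer on sys.path by the time the flow function runs. The module names demo_flow, demo_submodule, and demo_lazy_submodule are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo():
    """Compare an import done at load time with one deferred to call time."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical stand-ins for modules that live in the cloned repo.
        (root / "demo_submodule.py").write_text("VALUE = 42\n")
        (root / "demo_lazy_submodule.py").write_text("VALUE = 7\n")
        (root / "demo_flow.py").write_text(
            "import demo_submodule as eager\n"        # resolved at load time
            "def eager_flow():\n"
            "    return eager.VALUE\n"
            "def lazy_flow():\n"
            "    import demo_lazy_submodule\n"        # resolved at call time
            "    return demo_lazy_submodule.VALUE\n"
        )

        # Assumption: the directory is importable only while the file loads.
        sys.path.insert(0, str(root))
        try:
            importlib.invalidate_caches()
            flow_module = importlib.import_module("demo_flow")
        finally:
            sys.path.remove(str(root))

        results = {"eager": flow_module.eager_flow()}
        try:
            flow_module.lazy_flow()
            results["lazy"] = "ok"
        except ModuleNotFoundError as exc:
            results["lazy"] = f"ModuleNotFoundError: {exc.name}"

        # Remove the demo modules so repeated runs start clean.
        for name in ("demo_flow", "demo_submodule", "demo_lazy_submodule"):
            sys.modules.pop(name, None)
        return results


if __name__ == "__main__":
    print(demo())
```

Under that assumption, the module imported at the top of the file stays usable because it is already bound when the file loads, while the deferred import has to search sys.path again and fails.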
Thanks @desertaxle. That does indeed work. I tested it out with your fork of the repo. Thanks a bunch!
This will also impact any implementation of lazy loading within submodules that point to other submodules.
For example, I tried building a Prefect flow (deployment.py) that simply wraps all functions from main with a new function using the same name. Something like this:
# main.py
def foo(arg1, arg2):
    do_stuff()

# deployment.py
from prefect import task

@task
def foo(*args, **kwargs):
    # Lazy import: resolved only when the task runs
    from main import foo as wrapped_func
    return wrapped_func(*args, **kwargs)
This fails due to the issue described in this thread. I changed my approach to this instead:
# deployment.py
from prefect import task

# Top-level import: resolved when deployment.py is loaded
from main import foo as _foo

@task
def foo(*args, **kwargs):
    return _foo(*args, **kwargs)
but then it fails because main.py also implements lazy loading:
prefect_worker | File "/tmp/tmpck2dkmalprefect/git_repo-branch_name/deployment.py", line 41, in datasource_extract
prefect_worker | return _datasource_extract(*args, **kwargs)
prefect_worker | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
prefect_worker | File "/tmp/tmpck2dkmalprefect/git_repo-branch_name/main.py", line 69, in datasource_extract
prefect_worker | from datasource.src.extract import get_session
prefect_worker | ModuleNotFoundError: No module named 'datasource'
For reference, my project directory looks something like this:
./project/
-|-deployment.py
-|-main.py
-|-
-|-/datasource/
-|-|-/src/
-|-|-|-extract.py
-|-|-|-transform.py
-|-
-|-/storage/
-|-|-/src/
-|-|-|-load.py
I guess the next step would be to completely remove lazy loading of submodules from everywhere in the project.
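To find the lazy imports that would need to be moved, a small scan with Python's standard ast module may help. This is only a sketch, not part of Prefect: find_lazy_imports is a hypothetical helper name, and the sample source mirrors the layout described above. Imports inside nested functions are reported once per enclosing function.

```python
import ast


def find_lazy_imports(source):
    """Return (function_name, line_number, module) for imports inside functions."""
    tree = ast.parse(source)
    found = []
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Walk the function body and collect every import statement in it.
        for node in ast.walk(func):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    found.append((func.name, node.lineno, alias.name))
            elif isinstance(node, ast.ImportFrom):
                # node.module is None for "from . import x"
                found.append((func.name, node.lineno, node.module or "."))
    return found


SAMPLE = '''
from main import foo as _foo

def datasource_extract(*args, **kwargs):
    from datasource.src.extract import get_session
    return get_session(*args, **kwargs)
'''

if __name__ == "__main__":
    for name, lineno, module in find_lazy_imports(SAMPLE):
        print(f"{name} (line {lineno}): lazy import of {module}")
```

Running it over each file in the project (for example by passing Path(file).read_text()) lists the function-level imports; the top-level import in the sample is not reported.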
First check
Bug summary
I have a bug report for submodule initialization in Prefect Projects. Currently my prefect.yaml looks as shown below:
This setup does not pull the submodules associated with this repo. It is easy to reproduce: I have reproduced it in my code here, and it has been discussed here and merged here.
Reproduction
Error
Please see the screenshots here: https://github.com/PrefectHQ/prefect/issues/9462
Versions