kindofluke opened 2 years ago
I was thinking about this more, and there may be one unintended consequence. I think DRUM was set up so that users could have a kind of model mono-repo: lots of different code, with maybe a subfolder that holds a custom.py for DataRobot. So consider this structure:
really_cool_predictor/
├─ webapp/
│ ├─ flask.py
├─ model_development.ipynb
├─ artifact.pkl
├─ datarobot-drum/
│ ├─ custom.py
In this case, DRUM won't find the custom.py without using rglob. A couple of ideas: rglob is actually ordered, so a custom.py in the main code folder or its sub-folders would always appear before custom.py files buried deep in venv.
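One way to act on the ordering idea, sketched under assumptions: rather than relying on the order in which rglob yields matches, sort them explicitly by depth and skip anything under a virtual-environment folder. The helper find_custom_py and the excluded folder names (venv, .venv) are hypothetical, not part of DRUM.

```python
import tempfile
from pathlib import Path

# Hypothetical names of virtual-environment folders to skip (an assumption, not DRUM config).
EXCLUDED_DIRS = {"venv", ".venv"}


def find_custom_py(code_dir: Path) -> Path:
    """Return the shallowest custom.py under code_dir, ignoring virtual environments.

    Hypothetical helper for illustration; DRUM's real lookup may differ.
    """
    candidates = [
        p
        for p in code_dir.rglob("custom.py")
        # Drop any match whose path relative to code_dir passes through an excluded folder.
        if not EXCLUDED_DIRS.intersection(p.relative_to(code_dir).parts)
    ]
    if not candidates:
        raise FileNotFoundError(f"no custom.py found under {code_dir}")
    # Sort by depth, then by path string, so the choice does not depend on
    # the filesystem's directory-listing order.
    candidates.sort(key=lambda p: (len(p.relative_to(code_dir).parts), str(p)))
    return candidates[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Recreate the mono-repo layout above, plus a venv whose package ships a custom.py.
        root = Path(tmp) / "really_cool_predictor"
        (root / "datarobot-drum").mkdir(parents=True)
        (root / "datarobot-drum" / "custom.py").write_text("")
        pkg = root / "venv" / "lib" / "site-packages" / "openpyxl" / "packaging"
        pkg.mkdir(parents=True)
        (pkg / "custom.py").write_text("")
        print(find_custom_py(root).relative_to(root).as_posix())  # datarobot-drum/custom.py
```

This keeps the mono-repo case working (a custom.py in a subfolder is still found) while ignoring copies inside the environment folder.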
What: Within the model adapter, DRUM asserts that there can be only one custom.py file within the code folder. Here is the snippet at line 139:
The problem lies with the rglob function on a path. The fully recursive search for custom.py means the system will search all subdirectories in the code folder and find any custom.py file. Several major Python packages include a custom.py for other reasons, and those files are found within the virtual environment folder. When multiple custom.py files are found, the assertion on line 139 fails and throws an error.
Example
The openpyxl package has several files named custom.py, and those all get captured if your virtual environment is part of the code folder. Here is a look at custom_file_paths in a debugger. The assertion will then fail.
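The failure can be reproduced without DRUM by building a throwaway code folder that holds a model custom.py plus a fake virtual environment containing a package that ships its own custom.py. The one-file assertion below is a hypothetical stand-in for the check at line 139, whose exact code is not reproduced here, and the venv layout (python3.9/site-packages) is only illustrative.

```python
import tempfile
from pathlib import Path


def make_code_folder(root: Path) -> None:
    """Create a model custom.py plus a fake venv package that also ships a custom.py."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "custom.py").write_text("# model hooks\n")
    # Illustrative venv layout; real environments differ by OS and Python version.
    pkg = root / "venv" / "lib" / "python3.9" / "site-packages" / "openpyxl" / "packaging"
    pkg.mkdir(parents=True)
    (pkg / "custom.py").write_text("# unrelated module\n")


with tempfile.TemporaryDirectory() as tmp:
    code_dir = Path(tmp) / "model"
    make_code_folder(code_dir)

    # Recursive search, as with rglob: it also descends into venv/.
    custom_file_paths = list(code_dir.rglob("custom.py"))
    print(len(custom_file_paths))  # 2

    try:
        # Hypothetical stand-in for DRUM's "only one custom.py" assertion.
        assert len(custom_file_paths) == 1, "Found too many custom.py files"
    except AssertionError as err:
        print("assertion failed:", err)
```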
I believe I follow a fairly common practice: keeping my virtual environment in my code folder and adding the environment to .gitignore.
Solution
I think changing rglob to glob will fix this issue. I have tested this locally, but I am not able to install all the prerequisites for running the test suite, so I can't submit a branch.
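A minimal sketch of the proposed change, assuming the lookup is a pathlib glob call on the code folder: Path.glob("custom.py") only matches files directly inside that folder, so nothing under venv is picked up. The trade-off, raised earlier in the thread, is that a custom.py kept in a subfolder such as datarobot-drum/ is no longer found either. The helper find_custom_files is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def find_custom_files(code_dir: Path, recursive: bool) -> List[Path]:
    """Hypothetical helper: list custom.py files via rglob (recursive) or glob (top level only)."""
    matcher = code_dir.rglob if recursive else code_dir.glob
    return sorted(matcher("custom.py"))


with tempfile.TemporaryDirectory() as tmp:
    code_dir = Path(tmp)
    (code_dir / "custom.py").write_text("")
    nested = code_dir / "venv" / "lib" / "site-packages" / "openpyxl"
    nested.mkdir(parents=True)
    (nested / "custom.py").write_text("")

    recursive_hits = find_custom_files(code_dir, recursive=True)
    top_level_hits = find_custom_files(code_dir, recursive=False)

print(len(recursive_hits), len(top_level_hits))  # 2 1
```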