allenai / tango

Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.
https://ai2-tango.readthedocs.io/
Apache License 2.0
532 stars 46 forks source link

Workspace.from_url imports all .py files when creating LocalWorkspace #606

Open jmerullo opened 1 year ago

jmerullo commented 1 year ago

🐛 Describe the bug

When using Workspace.from_url() to create a local workspace, all other python files in the directory will be imported in the process. As a result, the code (outside of an if __main__ block) in those scripts will be run.

Here's an example:

Directory structure: ./tango_workspace/ ./exp.py ./other_file.py

exp.py:

from tango import step
from tango.workspace import Workspace

@step(cacheable=True, deterministic=True)
def test_step():
    return 19

if __name__=='__main__':
    ws = Workspace.from_url('local://tango_workspace')
    print('ws', ws)
    print('result', test_step().result())

other_file.py:

#some unrelated script that does stuff
print("hello world")

running python exp.py will try to create the workspace, but in the process will import other_file.py, causing it to run and output "hello world". This is obviously a big problem if there are other python scripts in the directory that shouldn't be run.

There are a few workarounds. I believe the issue stems from this line in the cls.by_name call. I haven't tried to fully understand this code, but I think since LocalWorkspace is not in the registry, it causes this behavior. Therefore some workarounds that solve this issue are to import LocalWorkspace somewhere before creating the workspace:

In exp.py:

from tango.workspace import Workspace
from tango.workspaces.local_workspace import LocalWorkspace
...

But a more permanent solution is to add: from .workspaces import LocalWorkspace to the bottom of tango/__init__.py but I don't know if this is the best way to address this problem.

I am using version 1.2.1 on python 3.8.17. I think this issue would persist on 1.3 though

Versions

Python 3.8.17 ai2-tango==1.2.1

apoorvkh commented 1 year ago

Big +1! I also noticed significant unexpected behavior (i.e. unwanted code being executed) when using Tango. This explains it. Thanks @jmerullo!

Imo, all built-in implementations (LocalWorkspace, etc) should be registered by default. And I really don't think Tango should be importing my local files/modules in search of un-registered classes, as this (i.e. running all project code outside of if __name__ == '__main__' blocks) is unexpected and could be dangerous. I think the user should be responsible for importing the (non-default) Registrables they need.

https://github.com/allenai/tango/blob/26a162ab133b07bf34aff21d48a80c484e5b04e3/tango/common/registrable.py#L216C8-L239

Please consider, thanks!