allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.61k stars 651 forks source link

AttributeError for 'fspath' with str and bytes in WrapperBase #1155

Open jokokojote opened 10 months ago

jokokojote commented 10 months ago

Describe the bug

When passing a parameter from one pipeline component to another, this parameter is wrapped (<LazyEvalWrapper>). When calling os.fspath on this parameter and the parameter is of type str or bytes, an error occurs:

File "/opt/jupyter-server/venv-jupyter-server/lib/python3.11/site-packages/clearml/utilities/proxy_object.py", line 334, in method
    mtd = getattr(obj, name)
          ^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute '__fspath__'

This is due to the fact that str and bytes do not implement __fspath__, but os._fspath checks for these types to handle them as valid path-like objects.

if isinstance(path, (str, bytes)):
        return path

That is, when fspath is called on the wrapped parameter (of type str / bytes) it is not recognised as one of these types (because it is still wrapped) and __fs__path is called on it. WrapperBase proxies this call, but then tries to call __fspath__ on the object like it would be some normal path-like object.

To reproduce

See this minimal example code:

from clearml import PipelineDecorator

@PipelineDecorator.component(cache=False,  return_values=['path'])
def dummy_path_provider():
    path = "/root/.clearml/cache/storage_manager/datasets/ds_966214ba3cbe4ae8ade5599a670a8d0b"  # provide some path e.g. after dataset download
    return path

@PipelineDecorator.component(cache=False) 
def dummy_path_consumer(path):
    import os

    search_path = os.path.join(path, '*.csv') # NOT WORKING, AttributeError: 'str' object has no attribute '__fspath__'
    # Do something with the path, e.g. search / load .csv files

@PipelineDecorator.pipeline(name="pipeline", 
                            project="test", 
                            add_pipeline_tags="True")
def pipeline():
    path = dummy_path_provider()
    dummy_path_consumer(path)

PipelineDecorator.debug_pipeline()
pipeline()

Expected behaviour

Pipeline component parameters of type str and bytes should be usable with the path protocol.

Environment

jokokojote commented 10 months ago

"I considered making a pull request to address this issue, but I'm not entirely sure about the best solution for this. One straightforward way could be of course to add specific handling for fspath in the make_method function of WrapperBase, like so:

def make_method(name):
    def method(self, *args, **kwargs):
        # Existing code...

        # Special handling for __fspath__ when the object is a string or bytes
        if name == '__fspath__' and isinstance(obj, (str, bytes)):
            return obj

        # Rest of the existing code...

However, you know your architecture the best. I am happy make a pull request if desired.

eugen-ajechiloae-clearml commented 10 months ago

Hi @jokokojote ! We will need something a little bit more robust to solve this issue. The root of the problem is that isinstance doesn't evaluate the proxy object properly, so we will actually need to find a solution for this.