ai2cm / fv3net

explore the FV3 data for parameterization
MIT License
16 stars 3 forks source link

The "to_url" function returning the proper result for absolute local paths. #2209

Closed mrudko closed 1 year ago

mrudko commented 1 year ago

This pull request introduces a slight modification to the return value of to_url function. This change is motivated by the fact that the existing configuration of the function cannot properly deal with the HPC cluster directory structure. Performing the fv3net-style segmented simulations on HPC cluster (Stellar) requires locating of the restart files within the output directory. The output directory is local absolute path rather then the path on the cloud.

Refactored public API:

Resolves #2202

spencerkclark commented 1 year ago

The issue building the fv3fit_torch image looks unrelated:

#18 [14/14] RUN pip3 install dgl-cu113 dglgo -f https://data.dgl.ai/wheels/repo.html
#18 sha256:e6625a465442fa9ddc44ede99f8e292d1c06e3f08c86e4bf94fb78e728ecdb5c
#18 1.225 Looking in links: https://data.dgl.ai/wheels/repo.html
#18 1.379 ERROR: Could not find a version that satisfies the requirement dgl-cu113 (from versions: none)
#18 1.380 ERROR: No matching distribution found for dgl-cu113
#18 ERROR: executor failed running [/bin/sh -c pip3 install dgl-cu113 dglgo -f https://data.dgl.ai/wheels/repo.html]: exit code: 1

This could be a transient issue, so we'll see if it resolves itself upon your next push, or maybe someone else on our team will need to look into installing that library from another source.

mrudko commented 1 year ago

Thanks @spencerclark. I have changed the title and the description of the pull request accordingly.

I experimented a little bit with the test_read_last_segment function. After introducing additional check for path existence into the test and retaining the return statement of the to_url function unmodified, the test indeed fails.

tmpdir = local('/tmp/pytest-of-mr7417/pytest-17/test_read_last_segment0')

    def test_read_last_segment(tmpdir):
        date1 = "20160101.000000"
        date2 = "20160102.000000"
        arts = tmpdir.mkdir("artifacts")
        arts.mkdir(date1)
        arts.mkdir(date2)
        ans = read_last_segment(str(tmpdir))
    #    assert f"file:/{str(tmpdir)}/artifacts/20160102.000000" == ans
        fs = get_fs(ans)
>       assert fs.exists(ans)
E       AssertionError: assert False
E        +  where False = <bound method AbstractFileSystem.exists of <fsspec.implementations.local.LocalFileSystem object at 0x145debb76670>>('file://tmp/pytest-of-mr7417/pytest-17/test_read_last_segment0/artifacts/20160102.000000')
E        +    where <bound method AbstractFileSystem.exists of <fsspec.implementations.local.LocalFileSystem object at 0x145debb76670>> = <fsspec.implementations.local.LocalFileSystem object at 0x145debb76670>.exists

test_append.py:17: AssertionError

I have introduced all the suggested changes to the existing state of the test. The test fails for the previous version of the to_url function (return protocol + "://" + path.lstrip("/")), and it passes for the modified version of the to_url function (return protocol + "://" + path).

oliverwm1 commented 1 year ago

LGTM, I agree that this function should not strip the leading / of the provided path since it's clearly important to indicate that it's an absolute path. Although I admit I'm not totally clear on the behavior of the file protocol of fsspec when using relative paths.

spencerkclark commented 1 year ago

Thanks @oliverwm1 -- in some experiments locally using a relative path with the file protocol seems to work as one might expect:

$ touch foo.txt
$ python
>>> import fsspec; import vcm
>>> fs = vcm.cloud.get_fs("foo.txt")
>>> fs.exists("file://foo.txt")
True
>>> fs.exists("file:///foo.txt")
False