aai-institute / lakefs-spec

An fsspec implementation for the lakeFS project
http://lakefs-spec.org/
Apache License 2.0

Improve `ls` dircache handling for recursive listings #200

Closed AdrianoKF closed 9 months ago

AdrianoKF commented 9 months ago

`ls` needs to update dircache entries more carefully for recursive listings, which may contain files from multiple parent directories.

The current implementation caches only the first parent directory encountered, rather than adding every subdirectory in the recursive listing to the cache individually.


Original discussion:

I think this goes in the direction of my forgotten debugging statement in the original PR, which was collecting all the directories that were part of the list_objects response.

We currently have this, which I believe is not enough:

# assumes that the returned info is name-sorted.
pp = self._parent(info[0]["name"])

Rather, we need to add every parent directory contained in the response to the cache (not just that of the first item), as you mentioned above. That's an easy enough fix, but it should be accompanied by a corresponding test.

_Originally posted by @AdrianoKF in https://github.com/aai-institute/lakefs-spec/pull/198#discussion_r1420246607_
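
A minimal sketch of the fix described above, not the actual lakefs-spec implementation: group the entries of a recursive listing by parent directory and write one cache entry per parent. The helper names `parent`, `group_by_parent`, and `update_dircache` are hypothetical, `parent` is only a stand-in for the filesystem's `_parent`, and a plain `dict` stands in for the dircache.

```python
from collections import defaultdict


def parent(path: str) -> str:
    """Hypothetical stand-in for `_parent`: parent of a slash-separated path."""
    path = path.rstrip("/")
    return path.rsplit("/", 1)[0] if "/" in path else ""


def group_by_parent(infos: list[dict]) -> dict[str, list[dict]]:
    """Group listing entries by the parent directory of their "name"."""
    grouped = defaultdict(list)
    for entry in infos:
        grouped[parent(entry["name"])].append(entry)
    return dict(grouped)


def update_dircache(dircache: dict, infos: list[dict]) -> None:
    """Add one cache entry per parent directory found in the listing."""
    for directory, entries in group_by_parent(infos).items():
        dircache[directory] = entries


# Example: a recursive listing spanning three parent directories.
listing = [
    {"name": "repo/main/a.txt", "type": "file"},
    {"name": "repo/main/data/b.txt", "type": "file"},
    {"name": "repo/main/data/c.txt", "type": "file"},
    {"name": "repo/main/data/sub/d.txt", "type": "file"},
]
cache: dict = {}
update_dircache(cache, listing)
```

Unlike the `info[0]` approach, this does not depend on the response being name-sorted. One open point the sketch does not address: a recursive listing may contain only files, so the cached entry for a parent such as `repo/main` would lack its subdirectory entries unless those are synthesized separately.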