Open timblakely opened 1 year ago
Thanks for reporting! What happens if you delete gs://blakely_dev/_staging/iteration/1/?
Note that in GCS there is no concept of directories. There are buckets and objects. / is just a symbol in the object name.
https://stackoverflow.com/questions/52789714/google-cloud-storage-how-to-delete-a-folder-recursively-in-python has some examples of how to fetch objects starting with a particular prefix. It might be easier once https://github.com/apache/beam/issues/25676 is fixed.
cc: @BjornPrime
.take-issue
Thanks for reporting! What happens if you delete gs://blakely_dev/_staging/iteration/1/? Note that in GCS there is no concept of directories. There are buckets and objects. / is just a symbol in the object name.
Yup, I'm aware :) That does remove all the objects, but the delete doesn't work "recursively".
FYI the match() function seems to function slightly differently than the GCS py client's bucket.list_blobs(), as that takes a prefix and delimiter that, if the prefix ends with the delimiter, will return both delimiter-separated "directories" and the files with that prefix. If no delimiter is passed, it matches all files with the prefix, which is what it would seem that match() is intending to do (at least from the docstring :).
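To make the prefix/delimiter difference concrete, here is a minimal in-memory sketch. It does not call the GCS client or Beam; list_with_prefix is a hypothetical helper that only mimics how a prefix/delimiter listing is generally described, using the object names from this issue.

```python
# Hypothetical helper: imitates a prefix/delimiter listing over plain strings.
# This is an illustration only, not the GCS client's or Beam's implementation.
def list_with_prefix(names, prefix, delimiter=None):
    """Return (objects, sub_prefixes) for names that start with prefix."""
    objects, sub_prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Deeper names collapse into one "directory"-like prefix.
            sub_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(name)
    return objects, sorted(sub_prefixes)


# Object names from the issue, without the gs://blakely_dev/ bucket part.
names = [
    "_staging/iteration/1/result",
    "_staging/iteration/1/output-00000-of-00002",
    "_staging/iteration/1/output-00001-of-00002",
]

# With a delimiter, only the next "directory" level is reported.
with_delimiter = list_with_prefix(names, "_staging/", delimiter="/")

# Without a delimiter, every object under the prefix is returned.
without_delimiter = list_with_prefix(names, "_staging/")
```

Under these assumptions, the delimiter-free listing is the one that behaves "recursively", which is what I'd expect delete() on gs://blakely_dev/_staging/ to rely on.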
@AnandInguva Is there any update on this issue?
cc @shunping
Can I pick it up?
@tsafacjo Please do that. Some guides are here: https://github.com/apache/beam/blob/master/contributor-docs/code-change-guide.md and https://beam.apache.org/contribute/#ways-you-can-contribute. Thanks!
thanks
@liferoad it looks like this PR https://github.com/apache/beam/pull/29477/files#diff-c12c6d027caa8ddf49ae5488f38ebbdf798e8ae85d7d0d716c0ebd8cce9477fe already solved the problem.
@AnandInguva what is the problem with your PR?
What happened?
In the Python SDK, GCSFileSystem.delete suggests directories will be deleted recursively, but that doesn't appear to be the case...?
e.g. I have bucket blakely_dev and the following paths:
gs://blakely_dev/_staging/iteration/1/result
gs://blakely_dev/_staging/iteration/1/output-00000-of-00002
gs://blakely_dev/_staging/iteration/1/output-00001-of-00002
If I pass gs://blakely_dev/_staging/ to .delete(), despite it being a directory and a wildcard being appended if it ends with a /, the following .match() call within .delete() matches neither subdirectories nor the result or output-0000.* files.
Issue Priority
Priority: 3 (minor)
Issue Components