GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0
865 stars, 331 forks

Cannot download file from the bucket with colon (:) in the name #1513

Closed goodwin64 closed 7 months ago

goodwin64 commented 2 years ago

The command I ran:

gsutil -m cp -r "gs://path/to/bucket" .

Problematic file name:

2022-04-08T17:13:08.136Z-735ec64eb2fb7a61c37a26112e920abe52cdf36f.json
_____________^

Notice the colon in the name that delimits hours:minutes:seconds. The Windows file system doesn't allow this character in file names.

The issue seems to have existed for a long time, so Windows users need to apply the patch described here.

Is there a chance this patch will be applied in the next release of gsutil?

NickGoog commented 2 years ago

Hi, I've added this to our backlog. We plan on replacing colons with a character like "�".

This is to avoid leaving people with two sets of correct-looking file names (one with colons and one with hyphens), which could lead to confusion and doesn't indicate lack of Windows support for colons.

Ideally, this is implemented somewhere like FileUrl to support the character conversion in all places on Windows.

We also plan on introducing this to gcloud storage :)

NickGoog commented 2 years ago

@thomasmaclean wisely suggested expanding the conversions to cover more of the characters banned in Windows file names: "*/:<>?\|. We might have to do something like " -> �1, * -> �2, ... to prevent naming conflicts. For example, we don't want a file name collision if we attempt to convert both foo<.txt and foo>.txt to foo�.txt.

@goodwin64 , since this is growing into a slightly larger project, would you care if we implemented this in only gcloud storage? It's the future storage CLI: http://cloud/sdk/gcloud/reference/alpha/storage

goodwin64 commented 2 years ago

No objections @NickGoog 👍

NickGoog commented 2 years ago

Substituting invalid Windows characters should be the default behavior in the next release of gcloud alpha storage (probably going public around next Wednesday).

P.S. We replaced "�" with "$" for terminals that don't support Unicode.
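The numbered-substitution idea from the comments above can be sketched in Python. This is only an illustration, not the gcloud storage implementation: the names `WINDOWS_INVALID_CHARS`, `REPLACEMENTS`, and `sanitize_component` are hypothetical, and the numbering simply follows the order of the character list quoted earlier, so the mapping that actually ships may differ.

```python
# Hypothetical sketch; not the gcloud storage implementation.
# Characters Windows rejects in a single file or directory name.
WINDOWS_INVALID_CHARS = '"*/:<>?\\|'

# Give each invalid character its own numbered token ("$1", "$2", ...)
# so that two different invalid characters never map to the same text.
REPLACEMENTS = {
    ch: f"${i}" for i, ch in enumerate(WINDOWS_INVALID_CHARS, start=1)
}


def sanitize_component(name: str) -> str:
    """Replace invalid characters in ONE path component (not a full path)."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in name)


# foo<.txt and foo>.txt no longer collapse to the same local name.
print(sanitize_component("foo<.txt"))
print(sanitize_component("foo>.txt"))
```

Note that this scheme only prevents collisions between different invalid characters. A source object whose name already contains a literal token such as `$5` could still collide with a converted name, so a real implementation would need to handle that case separately.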

sw-oC commented 1 year ago

Hello, I just tried to download a Takeout archive generated in Google Workspace, on Windows, using the gcloud alpha storage cp command (same result with gcloud storage cp).

Automatically renaming the folders that contain characters invalid on Windows, such as ":", before downloading works. The local folder is named "\Resource$1 abc". The file is downloaded as filename.zip_.gstmp, but the following error appears after the download:

ERROR: [WinError 3] The system cannot find the path specified: '.\folder1\Resource$1 abc\filename.zip_.gstmp' -> '.\folder1\Resource: abc\filename.zip'

I guess it tries to write to the initial folder name "\Resource: abc" (which does not exist).

What could be the issue? Has the feature from Jun 2022 above been tested? Thank you for your help.

NickGoog commented 1 year ago

Hi, can you provide reproduction steps?

sw-oC commented 1 year ago

C:>gcloud storage cp --recursive gs://bucket/folder1 .

Copying gs://bucket/folder1/Resource: abc/takeout-20221228T073541Z-001.zip to file://.\folder1\Resource: abc\takeout-20221228T073541Z-001.zip

WARNING: The following characters are invalid in Windows file and directory names: /:*?"<>|

Renaming .\folder1\Resource: abc\takeout-20221228T073541Z-001.zip.gstmp to .\folder1\Resource$1 abc\takeout-20221228T073541Z-001.zip.gstmp

\ERROR: [WinError 3] The system cannot find the path specified: '.\folder1\Resource$1 abc\takeout-20221228T073541Z-001.zip_.gstmp' -> '.\20221228T073540Z\Resource: abc\takeout-20221228T073541Z-001.zip'

NickGoog commented 1 year ago

Thanks for the info. Looks like we need to add a "create directories in path if they do not already exist" behavior, but I'm surprised it doesn't already exist for copies without special characters. Will forward this to our current OnDuty.

sw-oC commented 1 year ago

Hi NickGoog, the folders are created with ":" replaced by "$1", and the files seem to be downloaded as *_.gstmp. But after the download, WinError 3 occurs and the files do not seem to be renamed properly. Thanks for looking into it.
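The failure described here looks like the temporary file being written under the sanitized path while the final rename still targets the unsanitized one. A minimal sketch of how to avoid that mismatch is to sanitize once and derive both paths from the same result. The functions `sanitize_component` and `finalize_download` are hypothetical, the colon-to-`$1` rule and the `_.gstmp` suffix are taken from the log above, and the actual fix in gcloud storage is not shown in this thread.

```python
import os


def sanitize_component(name: str) -> str:
    # Illustrative only: handles just the colon, as seen in the log above.
    return name.replace(":", "$1")


def finalize_download(dest_root: str, object_parts: list[str], data: bytes) -> str:
    """Write data to a temp file, then rename it to its final local path."""
    # Sanitize once, then build BOTH the temp and final paths from the result,
    # so the rename never points back at the original (invalid) name.
    safe_parts = [sanitize_component(part) for part in object_parts]
    final_path = os.path.join(dest_root, *safe_parts)
    temp_path = final_path + "_.gstmp"

    # Create any missing parent directories before writing.
    os.makedirs(os.path.dirname(final_path), exist_ok=True)
    with open(temp_path, "wb") as handle:
        handle.write(data)

    # Atomic on the same volume; replaces an existing file if present.
    os.replace(temp_path, final_path)
    return final_path
```

For example, `finalize_download(".", ["folder1", "Resource: abc", "takeout.zip"], payload)` would create `folder1/Resource$1 abc/takeout.zip` under the current directory.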

NickGoog commented 1 year ago

@dilipped has a fix, but, as a result of gcloud release cycles, it probably won't be out until Tues 1/31

haripetrov commented 1 year ago

> @dilipped has a fix, but, as a result of gcloud release cycles, it probably won't be out until Tues 1/31

Is there any progress with this issue?

brolin-empey commented 7 months ago

I wish Google Takeout only generated file/directory names using characters that are normally allowed in file/directory names on Windows. I too have this problem: gsutil on Windows cannot create directories whose names contain a colon when trying to download the export of a Google Workspace domain from Google Cloud Storage.

I have read about workarounds, such as using gsutil to rename the directories in Google Cloud Storage to remove the colon before downloading the export, but I argue that characters such as the colon, which cannot normally be used in file/directory names on Windows, should not be used by Google Takeout in the first place. Assuming that I even need the parts of the export stored in directories whose names contain a colon, it seems like it would be easier/faster to install gsutil on GNU+Linux, download the complete export on GNU+Linux to a file system that allows a colon in file/directory names, rename all of the files/directories whose names contain a colon to eliminate the colons (possibly by using the “rename” program/command), then copy or move the fixed export back to the NTFS volume on the Windows computer. I should not have to do all of this work when I am paying money every month for Google Workspace and the simpler solution is to not use a colon in the file/directory names in the first place.

I do not know if I have access to a sufficiently recent version of GNU+Linux (in my case Debian or some derivative such as Ubuntu) to use a sufficiently recent version of gsutil. I use GNU+Windows NT with Cygwin as my desktop/portable computer client platform, so I have a current installation of Windows 10, but my GNU+Linux installations, other than the one on the virtual dedicated server that hosts my public Web sites, use antiquated versions of the distribution. They still work for my use case, so I do not upgrade them: I already spend most of my waking life on my computers, and I do not want to spend hours or days fixing breakage caused by unnecessarily upgrading the operating system on computers that I usually only use from the command-line interface over the network from my current installation of Windows 10 on my primary computer.

Does anyone have sufficient pull within Google to have Google Takeout changed to not use a colon or any other character that Windows does not allow in file/directory names? Even renaming a directory in Google Cloud Storage seems complicated: I could not find any way to do this seemingly simple and common operation from the Web GUI, so apparently I have to use gsutil to rename directories. But I should not need to rename the directories at all, because Google should not be using characters that are often problematic in file/directory names in the first place.

brolin-empey commented 7 months ago

I ended up using Ubuntu in the Windows Subsystem for Linux to download the exports using gsutil, use the “rename” program to eliminate the colons from the directory names, then move the exports back to the normal Windows part of my file system.
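The renaming step of this workaround could also be done with a short Python script instead of the “rename” program. The sketch below is an assumption about how one might do it, not what the commenter ran: the function name `strip_colons` and the default replacement character `_` are hypothetical. It walks the tree bottom-up so that children are renamed before their parent directories.

```python
import os


def strip_colons(root: str, replacement: str = "_") -> int:
    """Rename every file and directory under root whose name contains ':'.

    Returns the number of entries renamed.
    """
    renamed = 0
    # topdown=False visits the deepest directories first, so renaming an
    # entry never invalidates a path that is still waiting to be visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if ":" not in name:
                continue
            source = os.path.join(dirpath, name)
            target = os.path.join(dirpath, name.replace(":", replacement))
            # Refuse to overwrite an existing entry rather than lose data.
            if os.path.exists(target):
                raise FileExistsError(target)
            os.rename(source, target)
            renamed += 1
    return renamed
```

Run it on a file system that permits colons (for example inside WSL), then move the resulting tree to the NTFS volume.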

dilipped commented 7 months ago

As mentioned in https://github.com/GoogleCloudPlatform/gsutil/issues/1513#issuecomment-1154186966, this has been addressed in gcloud storage. The issue mentioned in https://github.com/GoogleCloudPlatform/gsutil/issues/1513#issuecomment-1383656774 has been addressed as well. I think we can close this issue, as we do not intend to fix it in gsutil and recommend that users try the new gcloud storage CLI. Feel free to comment/reopen if there are any other issues.

jsoref commented 2 months ago

It seems like you could have at least failed fast when you encountered : instead of letting gsutil waste huge gobs of API calls for millions of files (loki filenames all have many :s in them).

Actually, it isn't failing, it's just generating a message that looks like a fatal error but only relating to the gstmp file... which means the data was actually retrieved.

This is a lousy user experience. Failing fast and suggesting gcloud storage rsync would have been a much better experience.
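A fail-fast check of the kind suggested here could look roughly like the following. This is a hypothetical sketch, not gsutil code: `first_invalid_name` and `INVALID_IN_WINDOWS_NAMES` are made-up names, and the check assumes object names use "/" only as a separator, so "/" is left out of the invalid set.

```python
from typing import Iterable, Optional

# Characters Windows rejects in file names; "/" is omitted because it is
# treated here as the separator inside object names.
INVALID_IN_WINDOWS_NAMES = ':*?"<>|\\'


def first_invalid_name(object_names: Iterable[str]) -> Optional[str]:
    """Return the first object name that cannot be created on Windows."""
    for name in object_names:
        if any(ch in INVALID_IN_WINDOWS_NAMES for ch in name):
            return name
    return None


# Stop before any download starts if a problematic name is found.
bad = first_invalid_name([
    "logs/plain.json",
    "logs/2022-04-08T17:13:08.136Z.json",
])
if bad is not None:
    print(f"Refusing to start: {bad!r} is not a valid Windows file name")
```

Because the check stops at the first offending name, it only needs to list objects until it finds one, rather than attempting to download every file.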