Azure / azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10
MIT License

Azcopy sync on IOT edge blob #1031

Closed abhaynahar closed 8 months ago

abhaynahar commented 4 years ago

Which version of the AzCopy was used? - 10.4.3

Which platform are you using? Mac and Linux

What command did you run?

./azcopy sync "/Home/data" "http://127.0.0.1:11002/exptest/demo?[SAS]"

What problem was encountered?

When trying to sync files from the local system into IoT Edge blob storage using azcopy sync, I get the following error:

INFO: Cannot infer destination location of http://127.0.0.1:11002/exptest/demo?[SAS]. Please specify the --from-to switch. Valid values are two-word phases of the form BlobLocal, LocalBlob etc. Use the word 'Blob' for Blob Storage, 'Local' for the local file system, 'File' for Azure Files, and 'BlobFS' for ADLS Gen2. If you need a combination that is not supported yet, please log an issue on the AzCopy GitHub issues list.

error parsing the input given by the user. Failed with error Unable to infer the source '/Home/data' / destination 'http://127.0.0.1:11002/exptest/demo?[SAS]

I tried adding the --from-to switch, but then the error I get is:

Error: unknown flag: --from-to

Then I created a local hosts entry on the Linux VM (vi /etc/hosts) and added the following line:

127.0.0.1 exptest.blob.core.windows.net

The sync command I then used was:

./azcopy sync "/Home/data" "https://exptest.blob.core.windows.net:11002/demo?[SAS]"

The output was "3 Files Scanned at Source, 0 Files Scanned at Destination", and the program stayed stuck there indefinitely.

How can we reproduce the problem in the simplest way?

Follow the instructions at https://docs.microsoft.com/en-us/azure/iot-edge/how-to-deploy-blob to launch an IoT Edge blob storage module on a Linux VM. Then log into the VM and try to sync a file into the blob storage running on the VM:

./azcopy sync "/Home/data" "http://127.0.0.1:11002/exptest/demo?[SAS]"

Have you found a mitigation/solution?

NO

Any help is much appreciated

InteXX commented 4 years ago

@abhaynahar

Any luck on this? I'm running into the same problem. It may be because we're trying to sync to the emulator and not a real storage account.

I'll test for that tomorrow and let you know what I find out.

InteXX commented 4 years ago

@abhaynahar

It's definitely a bug when connecting to an emulator:

func inferArgumentLocation(arg string) common.Location {
  if arg == pipeLocation {
    return common.ELocation.Pipe()
  }
  if startsWith(arg, "http") {
    // Let's try to parse the argument as a URL
    u, err := url.Parse(arg)
    // NOTE: sometimes, a local path can also be parsed as a url. To avoid thinking it's a URL, check Scheme, Host, and Path
    if err == nil && u.Scheme != "" && u.Host != "" {
      // Is the argument a URL to blob storage?
      switch host := strings.ToLower(u.Host); true {
      // Azure Stack does not have the core.windows.net
      case strings.Contains(host, ".blob"):
        return common.ELocation.Blob()
      case strings.Contains(host, ".file"):
        return common.ELocation.File()
      case strings.Contains(host, ".dfs"):
        return common.ELocation.BlobFS()
      case strings.Contains(host, benchmarkSourceHost):
        return common.ELocation.Benchmark()
        // enable targeting an emulator/stack
      case IPv4Regex.MatchString(host):               // <-- BUG here
        return common.ELocation.Unknown()             // <-- This is what gets returned in case of storage emulator URL
      }

      if common.IsS3URL(*u) {
        return common.ELocation.S3()
      }
    }
  }

  return common.ELocation.Local()
}

Source
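To make the effect of that branch order easier to see, here is a minimal standalone sketch of the same checks using only the standard library. It is not AzCopy's code: `inferLocation` and `ipv4Pattern` are hypothetical stand-ins (the regular expression is an assumption about what `IPv4Regex` matches), locations are returned as plain strings, and the pipe, benchmark and S3 checks are left out.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// ipv4Pattern is an assumption standing in for AzCopy's IPv4Regex.
var ipv4Pattern = regexp.MustCompile(`\d+\.\d+\.\d+\.\d+`)

// inferLocation follows the branch order of the snippet above and returns
// the inferred location as a plain string.
func inferLocation(arg string) string {
	if strings.HasPrefix(arg, "http") {
		u, err := url.Parse(arg)
		if err == nil && u.Scheme != "" && u.Host != "" {
			host := strings.ToLower(u.Host)
			switch {
			case strings.Contains(host, ".blob"):
				return "Blob"
			case strings.Contains(host, ".file"):
				return "File"
			case strings.Contains(host, ".dfs"):
				return "BlobFS"
			case ipv4Pattern.MatchString(host):
				// An emulator URL addressed by IP ends up here.
				return "Unknown"
			}
		}
	}
	return "Local"
}

func main() {
	args := []string{
		"/Home/data",
		"http://127.0.0.1:11002/exptest/demo",
		"https://exptest.blob.core.windows.net:11002/demo",
	}
	for _, arg := range args {
		fmt.Printf("%s -> %s\n", arg, inferLocation(arg))
	}
}
```

With these checks, an emulator URL addressed by IP is classified as Unknown, while any host that contains ".blob" is classified as Blob regardless of port.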

For a moment I was hopeful that AzureCLI might work for this, but it turns out that AzureCLI uses AzCopy under the hood.

So we wait.

thisiscmt commented 4 years ago

FYI, the code in validators.go is also the cause of issue 713. Sure would be nice to get this fixed, it could potentially clear out several emulator-related issues.

promisepreston commented 3 years ago

I am currently experiencing this exact same issue. Really frustrating.

gapra-msft commented 11 months ago

Hi, Apologies for the delayed response here. Are you still unable to sync to emulator? I just attempted to run a similar command and did not hit the failure to parse the URL.

InteXX commented 11 months ago

@gapra-msft

I'm still getting an error when attempting to sync to Azurite on a remote server (on my LAN):

azcopy sync "D:\Dev\Data" "http://server5:10000/test/data"

INFO: The parameters you supplied were Source: 'd:\Dev\Data' of type Local, and Destination: 'http://server5:10000/test/data' of type Local INFO: Based on the parameters supplied, a valid source-destination combination could not automatically be found

I also tried:

azcopy sync "D:\Dev\Data" "http://server5:10000/devstoreaccount1/test/data"

...as that syntax is indicated in the auto-generated code by Storage Explorer. Same error.

Note that the usage guidance doesn't provide an example for syncing to an emulator, i.e. what the URL should contain:

azcopy sync

Sync an entire directory including its subdirectories (note that recursive is by default on):

In any case, the relevant source code hasn't changed since my original post on this thread of Jun 24, 2020. The line numbers have changed, yes, but not code surrounding the bug.

gapra-msft commented 10 months ago

@InteXX could you try manually specifying the --from-to CLI parameter? In this case it would be LocalBlob

InteXX commented 10 months ago

@gapra-msft

azcopy sync --from-to "D:\Dev\Data" LocalBlob

Results in:

unknown flag: --from-to

gapra-msft commented 10 months ago

@InteXX sorry I was a little unclear. Could you run this?

azcopy sync "D:\Dev\Data" "http://server5:10000/devstoreaccount1/test/data" --from-to=LocalBlob

InteXX commented 10 months ago

Same result:

unknown flag: --from-to

InteXX commented 10 months ago

It appears I'm running an old version: 10.4.3

I'm working out how to update...

InteXX commented 10 months ago

Cannot perform sync due to error: Login Credentials missing. No SAS token or OAuth token is present and the resource is not public

gapra-msft commented 10 months ago

@InteXX if you are running against Azurite, you need to provide SAS or OAuth token or make your resource in Azurite public

InteXX commented 10 months ago

Cannot perform sync due to error: cannot list files due to reason GET http://server5:10000/devstoreaccount1
--------------------------------------------------------------------------------
RESPONSE 400: 400 The value for one of the HTTP headers is not in the correct format.
ERROR CODE: InvalidHeaderValue
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
  <Code>InvalidHeaderValue</Code>
  <Message>The value for one of the HTTP headers is not in the correct format.
RequestId:efe0df55-5790-4985-8b9e-856b23cb109c
Time:2023-12-11T22:18:10.816Z</Message>
  <HeaderName>x-ms-version</HeaderName>
  <HeaderValue>2023-08-03</HeaderValue>
</Error>
--------------------------------------------------------------------------------

I seem to remember this from a while back, from somewhere else.

How to fix that header value?
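The error body above is plain XML, so the rejected header can be pulled out programmatically when triaging this kind of failure. The following is a rough sketch using Go's `encoding/xml`; the `storageError` type and `parseStorageError` helper are hypothetical names, and the field list covers only the elements visible in the response above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// storageError models only the elements seen in the error body above.
type storageError struct {
	XMLName     xml.Name `xml:"Error"`
	Code        string   `xml:"Code"`
	Message     string   `xml:"Message"`
	HeaderName  string   `xml:"HeaderName"`
	HeaderValue string   `xml:"HeaderValue"`
}

// parseStorageError decodes an XML error body into a storageError.
func parseStorageError(body string) (storageError, error) {
	var e storageError
	err := xml.Unmarshal([]byte(body), &e)
	return e, err
}

func main() {
	body := `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
  <Code>InvalidHeaderValue</Code>
  <Message>The value for one of the HTTP headers is not in the correct format.</Message>
  <HeaderName>x-ms-version</HeaderName>
  <HeaderValue>2023-08-03</HeaderValue>
</Error>`

	e, err := parseStorageError(body)
	if err != nil {
		fmt.Println("could not parse error body:", err)
		return
	}
	fmt.Printf("%s: header %s has rejected value %s\n", e.Code, e.HeaderName, e.HeaderValue)
}
```

Here the rejected header is `x-ms-version`, which is the service API version the client sends with each request.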

gapra-msft commented 10 months ago

Are you running the latest Azurite version? I believe it's v3.

InteXX commented 10 months ago

I'm running v3.28.0. Is that the latest?

I found that error message problem: https://github.com/Azure/azure-sdk-for-net/issues/14257

gapra-msft commented 10 months ago

Yes, this should be the latest version.

gapra-msft commented 10 months ago

You can manually downgrade the service version AzCopy uses by setting the environment variable AZCOPY_DEFAULT_SERVICE_API_VERSION to an appropriate Azure Storage service version, or you can start Azurite with the option to skip the version check.
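For anyone scripting this, a minimal sketch of the environment-variable route might look like the following. It only prepares the command and does not run it. The helper names `withAPIVersion` and `syncCommand` are hypothetical, and the version string `2021-08-06` is a placeholder assumption; pick a version that your Azurite build actually accepts.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// apiVersionVar is the environment variable named in the comment above.
const apiVersionVar = "AZCOPY_DEFAULT_SERVICE_API_VERSION"

// withAPIVersion returns a copy of base with the override appended.
// os/exec uses the last value when a key appears more than once.
func withAPIVersion(base []string, version string) []string {
	env := append([]string{}, base...)
	return append(env, apiVersionVar+"="+version)
}

// syncCommand prepares (but does not start) an azcopy sync invocation.
func syncCommand(src, dst, version string) *exec.Cmd {
	cmd := exec.Command("azcopy", "sync", src, dst, "--from-to=LocalBlob")
	cmd.Env = withAPIVersion(os.Environ(), version)
	return cmd
}

func main() {
	// The destination is a placeholder; append a real SAS before running.
	cmd := syncCommand(`D:\Dev\Data`, "http://server5:10000/devstoreaccount1/data/", "2021-08-06")
	fmt.Println("would run:", cmd.Args)
	// Call cmd.Run() here to actually execute AzCopy.
}
```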

InteXX commented 10 months ago

OK, I'll try that.

How do I stop/restart Azurite to make sure it gets run with the new switch? Is killing the Node process sufficient? I tried that but Azurite continued to respond.

gapra-msft commented 10 months ago

@InteXX I am not too familiar with all the details of Azurite. I would recommend you reach out to the Azurite team on their repo. They may also have some helpful information in the README: https://github.com/Azure/Azurite

InteXX commented 10 months ago

Cannot perform sync due to error: cannot list files due to reason GET http://server5:10000/devstoreaccount1
--------------------------------------------------------------------------------
RESPONSE 403: 403 Server failed to authenticate the request. Make sure the value of the Authorization header is formed correctly including the signature.
ERROR CODE: AuthorizationFailure
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Error>
  <Code>AuthorizationFailure</Code>
  <Message>Server failed to authenticate the request. Make sure the value of the Authorization header is formed correctly including the signature.
RequestId:fcb5a6b3-e1af-4255-b10d-9e0f0a82f0c7
Time:2023-12-11T23:11:10.036Z</Message>
</Error>
--------------------------------------------------------------------------------

gapra-msft commented 10 months ago

What version of AzCopy are you running and what is the command you are running? Are you specifying a SAS?

InteXX commented 10 months ago

azcopy -v

azcopy version 10.21.1

azcopy sync "D:\Dev\Data" "http://server5:10000/devstoreaccount1/data/?sv=2018-03-28&se=2024-01-10T23%3A08%3A41Z&sr=c&sp=rwl&sig=UfTCGuuRHJSkNkdp23bp%2BIKy%2BdJFAicce7N5o%2BHpl1Y%3D" --from-to=LocalBlob

gapra-msft commented 10 months ago

Could you also share the AzCopy logs?

InteXX commented 10 months ago

2023/12/11 23:11:10 AzcopyVersion  10.21.1
2023/12/11 23:11:10 OS-Environment  windows
2023/12/11 23:11:10 OS-Architecture  amd64
2023/12/11 23:11:10 Log times are in UTC. Local time is 11 Dec 2023 14:11:10
2023/12/11 23:11:10 ISO 8601 START TIME: to copy files that changed before or after this job started, use the parameter --include-before=2023-12-11T23:11:05Z or --include-after=2023-12-11T23:11:05Z
2023/12/11 23:11:10 Any empty folders will not be processed, because source and/or destination doesn't have full folder support

gapra-msft commented 10 months ago

There should also be a log called jobID-scanning.log, does that show any error logs?

InteXX commented 10 months ago

I'm looking in %AppData%\.azcopy, should I be looking elsewhere?

gapra-msft commented 10 months ago

yeah that's the right location

InteXX commented 10 months ago

Hm... GUID-based filenames only, no jobID-scanning.log file.

gapra-msft commented 10 months ago

Is there a file named with your job ID (the GUID that should be printed when you run AzCopy) followed by -scanning.log? It would end up looking like this: 84b36405-f652-1046-7b3d-a8f212fec184-scanning.log
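As a rough illustration of that naming scheme, the sketch below builds the expected file name from a job ID and lists any scanning logs in a directory. The helper names are hypothetical, and the log directory is taken from the location mentioned in this thread (`%AppData%\.azcopy`), which is an assumption about your setup.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// scanningLogName builds the file name described above: <jobID>-scanning.log.
func scanningLogName(jobID string) string {
	return jobID + "-scanning.log"
}

// findScanningLogs lists every *-scanning.log file directly inside logDir.
func findScanningLogs(logDir string) ([]string, error) {
	return filepath.Glob(filepath.Join(logDir, "*-scanning.log"))
}

func main() {
	// Assumption: logs live under %AppData%\.azcopy, as discussed in this thread.
	logDir := filepath.Join(os.Getenv("APPDATA"), ".azcopy")

	fmt.Println("expected name:", scanningLogName("84b36405-f652-1046-7b3d-a8f212fec184"))

	logs, err := findScanningLogs(logDir)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	for _, path := range logs {
		fmt.Println("found:", path)
	}
}
```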

InteXX commented 10 months ago

2023/12/11 23:28:22 AzcopyVersion  10.21.1
2023/12/11 23:28:22 OS-Environment  windows
2023/12/11 23:28:22 OS-Architecture  amd64
2023/12/11 23:28:22 Log times are in UTC. Local time is 11 Dec 2023 14:28:22
2023/12/11 23:28:22 ==> REQUEST/RESPONSE (Try=1/2.6279ms, OpTime=10.0742ms) -- RESPONSE STATUS CODE ERROR
   HEAD http://server5:10000/devstoreaccount1/data?se=2023-12-12T23%3A04%3A07Z&sig=-REDACTED-&sp=rwl&spr=https%2Chttp&sr=c&st=2023-12-11T23%3A04%3A07Z&sv=2018-03-28
   Accept: application/xml
   User-Agent: AzCopy/10.21.1 azsdk-go-azblob/v1.1.0 (go1.19.12; Windows_NT)
   X-Ms-Client-Request-Id: d40151bb-d60b-4bde-7ebc-a4b84239b248
   x-ms-version: 
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Connection: keep-alive
   Date: Mon, 11 Dec 2023 23:28:22 GMT
   Keep-Alive: timeout=5
   Server: Azurite-Blob/3.28.0
Response Details: 

2023/12/11 23:28:22 ==> REQUEST/RESPONSE (Try=1/2.5447ms, OpTime=2.5447ms) -- RESPONSE STATUS CODE ERROR
   GET http://server5:10000/devstoreaccount1?comp=list&delimiter=%2F&include=metadata&prefix=data%2F&restype=container&se=2023-12-12T23%3A04%3A07Z&sig=-REDACTED-&sp=rwl&spr=https%2Chttp&sr=c&st=2023-12-11T23%3A04%3A07Z&sv=2018-03-28
   Accept: application/xml
   User-Agent: AzCopy/10.21.1 azsdk-go-azblob/v1.1.0 (go1.19.12; Windows_NT)
   X-Ms-Client-Request-Id: 8a3e046a-e2a9-47ff-6902-6d215714dbff
   x-ms-version: 
   --------------------------------------------------------------------------------
   RESPONSE Status: 403 Server failed to authenticate the request. Make sure the value of the Authorization header is formed correctly including the signature.
   Connection: keep-alive
   Content-Type: application/xml
   Date: Mon, 11 Dec 2023 23:28:22 GMT
   Keep-Alive: timeout=5
   Server: Azurite-Blob/3.28.0
   X-Ms-Error-Code: AuthorizationFailure
   X-Ms-Request-Id: 5b35879f-21c4-4209-bb2f-3a8669956cbe
Response Details: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <Error>   <Code>AuthorizationFailure</Code>   <Message>Server failed to authenticate the request. Make sure the value of the Authorization header is formed correctly including the signature. </Message> 

2023/12/11 23:28:22 Closing Log

InteXX commented 10 months ago

azcopy sync "D:\Dev\Data" "http://server5:10000/devstoreaccount1/data/?sv=2018-03-28&spr=https%2Chttp&st=2023-12-11T23%3A04%3A07Z&se=2023-12-12T23%3A04%3A07Z&sr=c&sp=rwl&sig=xzgKNQBbIqXy5dvc7mwIUy475HYXpDD%2FmsNOc0Qy90k%3D" --from-to=LocalBlob

Short version:

azcopy sync "D:\Dev\Data" "http://server5:10000/devstoreaccount1/data/"

Have I got my folder syntaxes correct?

gapra-msft commented 10 months ago

Yes, the syntax is correct. Let me double-check, but this may be due to an SDK-related change we made in this version. There is also a known issue in the latest version of AzCopy (10.22). Would you be willing to downgrade to 10.20.x until this is resolved? If so, I can grab the download URL for you if you let me know which distribution you are using.

InteXX commented 10 months ago

Windows x64

gapra-msft commented 10 months ago

@InteXX sorry, I was digging deeper into this issue and it looks like this may be a problem with Azurite. I would recommend reporting this to them to see if this is expected behavior. I hit the permission mismatch issue when I use a container SAS, like you have. However, it seems like an account-level SAS works with Azurite.
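One quick way to tell which kind of SAS a URL carries is to look at its query parameters: an account SAS includes `ss` (services) and `srt` (resource types), while a service-level SAS such as a container SAS includes `sr` (for example `sr=c`). The sketch below is a hypothetical helper for that check, not part of AzCopy, and the signatures in the sample URLs are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// sasKind classifies the SAS on a URL by its query parameters:
// "none" (no signature), "account" (ss and srt present),
// "service" (sr present), or "unknown".
func sasKind(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	switch {
	case q.Get("sig") == "":
		return "none", nil
	case q.Get("ss") != "" && q.Get("srt") != "":
		return "account", nil
	case q.Get("sr") != "":
		return "service", nil
	default:
		return "unknown", nil
	}
}

func main() {
	urls := []string{
		"http://server5:10000/devstoreaccount1/data/?sv=2018-03-28&sr=c&sp=rwl&sig=PLACEHOLDER",
		"http://server5:10000/devstoreaccount1/data/?sv=2023-01-03&ss=b&srt=sco&sp=rwl&sig=PLACEHOLDER",
	}
	for _, raw := range urls {
		kind, err := sasKind(raw)
		if err != nil {
			fmt.Println("invalid URL:", err)
			continue
		}
		fmt.Println(kind)
	}
}
```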

InteXX commented 10 months ago

account level sas

Hm, got a different error with that one. Perhaps I constructed the target portion of the URL incorrectly; the Azure Storage Explorer dialog doesn't directly provide a generated URL to copy.

Command:

azcopy sync "D:\Dev\Data" "http://server5:10000/devstoreaccount1/data/?sv=2023-01-03&ss=b&srt=sco&spr=https%2Chttp&st=2023-12-12T00%3A01%3A36Z&se=2023-12-13T00%3A01%3A36Z&sp=rwl&sig=%2FnTSadz7ipbXnYJ8cFyyQJt8prUtA5s6q5IjxKuCVGg%3D" --from-to=LocalBlob

Error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0x10858f2]

goroutine 23 [running]:
github.com/Azure/azure-storage-azcopy/v10/cmd.(*blobTraverser).parallelList.func1({0x1141500?, 0xc000184ef0}, 0xc000484660, 0xc000088240)
        D:/a/1/s/cmd/zc_traverser_blob.go:332 +0x2b2
github.com/Azure/azure-storage-azcopy/v10/common/parallel.(*crawler).processOneDirectory(0xc00057e600, {0x151ebf0, 0xc0002da540}, 0x0)
        D:/a/1/s/common/parallel/TreeCrawler.go:177 +0x338
github.com/Azure/azure-storage-azcopy/v10/common/parallel.(*crawler).workerLoop(0xc00057e600, {0x151ebf0, 0xc0002da540}, 0x0?, 0x0?)
        D:/a/1/s/common/parallel/TreeCrawler.go:104 +0xb1
created by github.com/Azure/azure-storage-azcopy/v10/common/parallel.(*crawler).runWorkersToCompletion
        D:/a/1/s/common/parallel/TreeCrawler.go:93 +0x50

Logs:

2023/12/12 00:17:53 AzcopyVersion  10.21.1
2023/12/12 00:17:53 OS-Environment  windows
2023/12/12 00:17:53 OS-Architecture  amd64
2023/12/12 00:17:53 Log times are in UTC. Local time is 11 Dec 2023 15:17:53
2023/12/12 00:17:53 ==> REQUEST/RESPONSE (Try=1/7.8726ms, OpTime=13.3649ms) -- RESPONSE STATUS CODE ERROR
   HEAD http://server5:10000/devstoreaccount1/data?se=2023-12-13T00%3A01%3A36Z&sig=-REDACTED-&sp=rwl&spr=https%2Chttp&srt=sco&ss=b&st=2023-12-12T00%3A01%3A36Z&sv=2023-01-03
   Accept: application/xml
   User-Agent: AzCopy/10.21.1 azsdk-go-azblob/v1.1.0 (go1.19.12; Windows_NT)
   X-Ms-Client-Request-Id: 73ba5cfb-df41-4899-70f1-ddc95483ca56
   x-ms-version: 
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Connection: keep-alive
   Date: Tue, 12 Dec 2023 00:17:52 GMT
   Keep-Alive: timeout=5
   Server: Azurite-Blob/3.28.0
Response Details: 

2023/12/12 00:17:53 ==> REQUEST/RESPONSE (Try=1/16.6177ms, OpTime=16.6177ms) -- RESPONSE SUCCESSFULLY RECEIVED
   GET http://server5:10000/devstoreaccount1?comp=list&delimiter=%2F&include=metadata&prefix=data%2F&restype=container&se=2023-12-13T00%3A01%3A36Z&sig=-REDACTED-&sp=rwl&spr=https%2Chttp&srt=sco&ss=b&st=2023-12-12T00%3A01%3A36Z&sv=2023-01-03
   X-Ms-Request-Id: [c48a13da-79b0-4bf6-8fe2-bd7e63a2a34a]

gapra-msft commented 10 months ago

Can you try using this version, 10.20.1? https://azcopyvnext.azureedge.net/releases/release-10.20.1-20230809/azcopy_windows_amd64_10.20.1.zip

InteXX commented 10 months ago

With v10.20.1 I'm still getting an authorization error. That previous memory address error was because I was including Service and Object resource types in my SAS, when I've only deployed Blobs in Azurite.

But when I generated a new server-level SAS, I got the same authorization error.

What do you mean by account-level SAS? I'm not finding documentation on it, but maybe I'm looking in the wrong place.

gapra-msft commented 10 months ago

Here are the docs on account-level SAS.

Not sure why the code formatted poorly in Github

This is the Go code I used to generate my SAS and URL when I was trying to repro this issue:

package main

import (
    "fmt"
    "strings"
    "time"

    "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob"
    "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/sas"
    "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/service"
)

func main() {
    name := "devstoreaccount1"
    key := "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="

    sharedKey, _ := blob.NewSharedKeyCredential(name, key)

    serviceClient, _ := service.NewClientWithSharedKeyCredential("http://127.0.0.1:10000/devstoreaccount1/", sharedKey, nil)

    containerClient := serviceClient.NewContainerClient("test")

    resources := sas.AccountResourceTypes{Object: true, Container: true, Service: true}
    permissions := sas.AccountPermissions{Write: true, List: true, Read: true}
    expiry := time.Now().Add(1 * time.Hour)

    qps, _ := sas.AccountSignatureValues{
        Version:       sas.Version,
        Protocol:      sas.ProtocolHTTPSandHTTP,
        Permissions:   permissions.String(),
        ResourceTypes: resources.String(),
        StartTime:     time.Now().UTC(),
        ExpiryTime:    expiry.UTC(),
    }.SignWithSharedKey(sharedKey)

    endpoint := containerClient.URL()
    if !strings.HasSuffix(endpoint, "/") {
        // add a trailing slash to be consistent with the portal
        endpoint += "/"
    }
    endpoint += "?" + qps.Encode()

    fmt.Println(endpoint)
}

InteXX commented 10 months ago

Urg, looks like fun.

InteXX commented 10 months ago

Are you aware of an existing tool to run this code? If it were VB.NET or C# I could fire up a project real quick and run it, but I'm at a loss with Go.

gapra-msft commented 10 months ago

I run my Go code in GoLand, but VS Code has extensions to build and run Go code as well. Since Azurite uses the same static key, I can just generate the URL for you to test for now. This should last a day. Just replace 127.0.0.1 with server5:

http://127.0.0.1:10000/devstoreaccount1/data/?se=2023-12-13T00%3A41%3A49Z&sig=L3lnQmsfXMYk6gOB6gLSfyOYLSCWOOIuxtZHCU5axQQ%3D&sp=rwl&spr=https%2Chttp&srt=sco&ss=b&st=2023-12-12T00%3A41%3A49Z&sv=2020-02-10
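If the host swap needs to be done repeatedly, it can be scripted. The sketch below is a hypothetical helper (`swapHost` is not an AzCopy or SDK function) that replaces only the host name and leaves the port, path and SAS query string untouched. The sample URL uses a placeholder signature.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// swapHost replaces the host name in rawURL with newHost, keeping the
// original port, path and query string exactly as they were.
func swapHost(rawURL, newHost string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if port := u.Port(); port != "" {
		u.Host = net.JoinHostPort(newHost, port)
	} else {
		u.Host = newHost
	}
	return u.String(), nil
}

func main() {
	in := "http://127.0.0.1:10000/devstoreaccount1/data/?sv=2020-02-10&sig=PLACEHOLDER"
	out, err := swapHost(in, "server5")
	if err != nil {
		fmt.Println("invalid URL:", err)
		return
	}
	fmt.Println(out)
}
```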

InteXX commented 10 months ago

Thanks, that got me started. At least there's no error message now.

But the sync hangs on the last of the three files in that folder—and it's always a different file. I made sure the files aren't in use, and I even copied them to a different source folder. Same result.

Odd...

gapra-msft commented 10 months ago

I can try to take a look at the log to see if there's anything causing a hang.

InteXX commented 10 months ago

I'm getting the memory error again.

Could you regenerate that SAS and this time exclude Service and Object from the resources?

InteXX commented 10 months ago

Is this the extension?

https://marketplace.visualstudio.com/items?itemName=golang.go

gapra-msft commented 10 months ago

Technically, removing the o will not allow any object-level operations to be performed, so it shouldn't be removed, but you can try both.

Here is s removed:

http://127.0.0.1:10000/devstoreaccount1/data/?se=2023-12-13T00%3A55%3A40Z&sig=c51z3%2BpBa7OqdC%2Fhj5RwKmb5%2Fw9444xYhe6ERk%2BPQJs%3D&sp=rwl&spr=https%2Chttp&srt=co&ss=b&st=2023-12-12T00%3A55%3A40Z&sv=2020-02-10

Here is s and o removed:

http://127.0.0.1:10000/devstoreaccount1/data/?se=2023-12-13T00%3A57%3A36Z&sig=LCYG8uXB3yFQAdHrnzwMvPQwZD%2FaBsxf9%2B45vhFiU5Y%3D&sp=rwl&spr=https%2Chttp&srt=c&ss=b&st=2023-12-12T00%3A57%3A36Z&sv=2020-02-10
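The `srt` parameter in these URLs lists the resource types the account SAS covers, one letter each: `s` for Service, `c` for Container and `o` for Object. As a small illustration, the hypothetical helper below expands an `srt` value into readable names so the three variants in this thread (`sco`, `co`, `c`) are easier to compare.

```go
package main

import "fmt"

// describeResourceTypes expands an account SAS srt value (for example "sco")
// into the resource type names it grants, in the order given.
func describeResourceTypes(srt string) []string {
	names := map[rune]string{
		's': "Service",
		'c': "Container",
		'o': "Object",
	}
	var out []string
	for _, r := range srt {
		if name, ok := names[r]; ok {
			out = append(out, name)
		} else {
			out = append(out, "unknown("+string(r)+")")
		}
	}
	return out
}

func main() {
	for _, srt := range []string{"sco", "co", "c"} {
		fmt.Println(srt, "->", describeResourceTypes(srt))
	}
}
```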

gapra-msft commented 10 months ago

Is this the extension?

https://marketplace.visualstudio.com/items?itemName=golang.go

Yes that should be right. You need the code I pasted above saved as a file named main.go, then also a file called go.mod with the following contents to grab dependencies.

module awesomeProject

go 1.20

require github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.1.0

require (
    github.com/Azure/azure-sdk-for-go/sdk/azcore v1.6.0 // indirect
    github.com/Azure/azure-sdk-for-go/sdk/internal v1.3.0 // indirect
    golang.org/x/net v0.10.0 // indirect
    golang.org/x/text v0.9.0 // indirect
)

gapra-msft commented 10 months ago

This should also have more info on how to set up VSCode for Go https://code.visualstudio.com/docs/languages/go

InteXX commented 10 months ago

could not import github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob (no required module provides package "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob"