Azure-Samples / azure-batch-samples

Azure Batch and HPC Code Samples

The request body is too large and exceeds the maximum permissible limit #106

Closed jmlero closed 8 years ago

jmlero commented 8 years ago

Using the Python blobxfer script with many large files (all smaller than 190 GB), I frequently receive the error "The request body is too large and exceeds the maximum permissible limit". Some large files upload successfully, but others fail every time, no matter how many retries or workers I use.

I also hit this problem when the local resource to upload is a folder (even a folder containing a single file); if I upload the same file by specifying the file directly, the error does not appear.

I am using the following python packages:

pip freeze
azure==1.0.2
azure-common==1.0.0
azure-mgmt==0.20.1
azure-mgmt-common==0.20.0
azure-mgmt-compute==0.20.0
azure-mgmt-network==0.20.1
azure-mgmt-nspkg==1.0.0
azure-mgmt-resource==0.20.1
azure-mgmt-storage==0.20.0
azure-nspkg==1.0.0
azure-servicebus==0.20.1
azure-servicemanagement-legacy==0.20.1
azure-storage==0.20.2
elasticsearch==2.2.0
futures==3.0.3
python-dateutil==2.4.2
requests==2.9.1
six==1.10.0
urllib3==1.14
wheel==0.26.0

And blobxfer.py v0.9.9.5

As an example:

azure blobxfer parameters [v0.9.9.5]
subscription id: None
management cert: None
transfer direction: local->Azure
local resource: archive/
remote resource: None
max num of workers: 4
timeout: None
storage account: ---
use SAS: False
upload as page blob: False
auto vhd->page blob: False
container: ---
blob container URI: https://---.blob.core.windows.net/---
compute file MD5: True
skip on MD5 match: True
chunk size (bytes): 4194304
create container: True
keep mismatched MD5: False
recursive if dir: True
keep root dir on up: False
collate to: disabled

script start time: 2016-01-29 10:00:09

g--.tar.gz md5: lhD55kDeLW9uh4PXtJ7LhQ==
detected 0 empty files to upload
performing 25600 put blocks/blobs and 1 put block lists
xfer progress: [ ] 0.00% 0.00 blocks/min
The request body is too large and exceeds the maximum permissible limit.
<?xml version="1.0" encoding="utf-8"?>RequestBodyTooLargeThe request body is too large and exceeds the maximum permissible limit. RequestId:a6af7e74-0001-00f6-0474-5ae0d5000000 Time:2016-01-29T09:06:25.4964043Z100000

ls -lah
-rw-r--r-- 1 root root 100G Jan 26 15:46 g--.tar.gz

alfpark commented 8 years ago

Can you upgrade to 0.9.9.10 and reproduce the issue?

jmlero commented 8 years ago

I tried with version 0.9.9.9 (the latest available at this moment).

The structure of the folder I would like to upload is the following:

/data/groups/folder/archive/file1
/data/groups/folder/archive/file2
...
/data/groups/folder/archive/fileN

Each file is 100 GB.

And the layout I would like on the storage in this case is the following:
file1
file2
...
fileN
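For reference, here is a small illustration of the path mapping I am after (a sketch, not blobxfer's actual implementation; the helper name is made up):

    # Illustrative only: how stripping N leading path components maps a local
    # path to the remote blob name.
    def strip_components(path, n):
        parts = path.lstrip("/").split("/")
        return "/".join(parts[n:])

    print(strip_components("/data/groups/folder/archive/file1", 3))  # archive/file1
    print(strip_components("/data/groups/folder/archive/file1", 4))  # file1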

Using --strip-components=3

azure blobxfer parameters [v0.9.9.9]

platform: Linux-2.6.32-431.20.3.el6.x86_64-x86_64-with-redhat-6.5-Carbon
python interpreter: CPython 2.7.11
package versions: az.common=1.0.0 az.sml=0.20.1 az.stor=0.20.2 req=2.9.1
subscription id: None
management cert: None
transfer direction: local->Azure
local resource: /data/groups/folder/archive/
include pattern: None
remote resource: None
max num of workers: 48
timeout: None
storage account: XXX
use SAS: False
upload as page blob: False
auto vhd->page blob: False
container: folder
blob container URI: https://XXX.blob.core.windows.net/folder
compute file MD5: True
skip on MD5 match: True
chunk size (bytes): 4194304
create container: True
keep mismatched MD5: False
recursive if dir: True
component strip on up: 3
remote delete: False
collate to: disabled
local overwrite: True
encryption mode: disabled
RSA key file: disabled
RSA key type: disabled

script start time: 2016-02-01 10:03:40
computing file md5 on: /data/groups/folder/archive/file.tar.gz.split

md5: lhD55kDeLW9uh4PQ==
detected 0 empty files to upload
performing 25600 put blocks/blobs and 1 put block lists
spawning 48 worker threads

xfer progress: [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 100.00% 793.60 blocks/min

102400.0 MiB transfered, elapsed 1935.4741931 sec. Throughput = 423.255449709 Mbit/sec

This works fine, at least for one file, but the resulting structure on the storage is: archive/file1

Using --strip-components=4, I obtain the following:

azure blobxfer parameters [v0.9.9.9]

platform: Linux-2.6.32-431.20.3.el6.x86_64-x86_64-with-redhat-6.5-Carbon
python interpreter: CPython 2.7.11
package versions: az.common=1.0.0 az.sml=0.20.1 az.stor=0.20.2 req=2.9.1
subscription id: None
management cert: None
transfer direction: local->Azure
local resource: /data/groups/folder/archive/
include pattern: None
remote resource: None
max num of workers: 48
timeout: None
storage account: XXX
use SAS: False
upload as page blob: False
auto vhd->page blob: False
container: folder
blob container URI: https://XXX.blob.core.windows.net/folder
compute file MD5: True
skip on MD5 match: True
chunk size (bytes): 4194304
create container: True
keep mismatched MD5: False
recursive if dir: True
component strip on up: 4
remote delete: False
collate to: disabled
local overwrite: True
encryption mode: disabled
RSA key file: disabled
RSA key type: disabled

script start time: 2016-02-01 09:58:24
computing file md5 on: /data/groups/folder/archive/file.tar.gz.split
md5: lhD55kDeLW9uh4PXtJ7LhQ==
detected 0 empty files to upload
performing 25600 put blocks/blobs and 1 put block lists
spawning 48 worker threads
xfer progress: [ ] 0.00% 0.00 blocks/min
Traceback (most recent call last):
  File "blobxfer_0.9.9.9.py", line 880, in run
    offset, bytestoxfer, encparam, flock, filedesc)
  File "blobxfer_0.9.9.9.py", line 1002, in putblobdata
    content_md5=contentmd5, timeout=self.timeout)
  File "blobxfer_0.9.9.9.py", line 1309, in azure_request
    return req(*args, **kwargs)
  File "/cm/local/apps/python/2.7.11_azure/lib/python2.7/site-packages/azure/storage/blob/blobservice.py", line 2366, in put_block
    self._perform_request(request)
  File "/cm/local/apps/python/2.7.11_azure/lib/python2.7/site-packages/azure/storage/storageclient.py", line 178, in _perform_request
    _storage_error_handler(ex)
  File "/cm/local/apps/python/2.7.11_azure/lib/python2.7/site-packages/azure/storage/_serialization.py", line 25, in _storage_error_handler
    return _general_error_handler(http_error)
  File "/cm/local/apps/python/2.7.11_azure/lib/python2.7/site-packages/azure/storage/_common_error.py", line 82, in _general_error_handler
    raise AzureHttpError(message, http_error.status)
AzureHttpError: The request body is too large and exceeds the maximum permissible limit.
<?xml version="1.0" encoding="utf-8"?>RequestBodyTooLargeThe request body is too large and exceeds the maximum permissible limit. RequestId:c7662108-0001-0111-2ccf-5cb68d000000 Time:2016-02-01T09:02:45.8204723Z100000

Thanks and regards

alfpark commented 8 years ago

Unfortunately, I cannot reproduce this error.

Can you place this statement: print(offset, bytestoxfer, len(data)) prior to the azure_request function call on line 999? Please re-run your scenario and paste your output.
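For context, here is an illustrative, self-contained sketch (not blobxfer's actual code) of what that debug line should confirm: the chunk offsets walk the file in steps of at most 4 MiB, and no single block handed to the storage SDK exceeds that limit.

    # Illustrative only: simulate the first few chunk offsets of a ~100 GiB file
    # and check each block against the 4 MiB put-block limit of azure-storage 0.20.x.
    MAX_BLOCK_BYTES = 4 * 1024 * 1024        # 4,194,304 bytes
    FILE_BYTES = 100 * 1024 ** 3             # the ~100 GiB file from this report

    for offset in range(0, 10 * MAX_BLOCK_BYTES, MAX_BLOCK_BYTES):
        bytestoxfer = min(MAX_BLOCK_BYTES, FILE_BYTES - offset)
        # analogous to the requested print(offset, bytestoxfer, len(data))
        print(offset, bytestoxfer)
        assert bytestoxfer <= MAX_BLOCK_BYTES, "block exceeds the 4 MiB limit"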

jmlero commented 8 years ago

The output:

detected 0 empty files to upload
performing 25600 put blocks/blobs and 1 put block lists
spawning 48 worker threads
xfer progress: [ ] 0.00% 0.00 blocks/min
0 4194304 4194304
4194304 4194304 4194304
8388608 4194304 4194304
12582912 4194304 4194304
16777216 4194304 4194304
20971520 4194304 4194304
25165824 4194304 4194304
29360128 4194304 4194304
33554432 4194304 4194304
37748736 4194304 4194304
41943040 4194304 4194304
46137344 4194304 4194304
50331648 4194304 4194304
54525952 4194304 4194304
58720256 4194304 4194304
62914560 4194304 4194304
67108864 4194304 4194304
71303168 4194304 4194304
75497472 4194304 4194304
79691776 4194304 4194304
83886080 4194304 4194304
88080384 4194304 4194304
92274688 4194304 4194304
96468992 4194304 4194304
100663296 4194304 4194304
104857600 4194304 4194304
109051904 4194304 4194304
113246208 4194304 4194304
117440512 4194304 4194304
121634816 4194304 4194304
125829120 4194304 4194304
130023424 4194304 4194304
134217728 4194304 4194304
138412032 4194304 4194304
142606336 4194304 4194304
146800640 4194304 4194304
150994944 4194304 4194304
155189248 4194304 4194304
159383552 4194304 4194304
163577856 4194304 4194304
167772160 4194304 4194304
171966464 4194304 4194304
176160768 4194304 4194304
180355072 4194304 4194304
184549376 4194304 4194304
188743680 4194304 4194304
192937984 4194304 4194304
197132288 4194304 4194304
Traceback (most recent call last):
  File "/data/groups/adm_informatics/prod/archive2azure/blobxfer_0.9.9.9.py", line 880, in run
    offset, bytestoxfer, encparam, flock, filedesc)
  File "/data/groups/adm_informatics/prod/archive2azure/blobxfer_0.9.9.9.py", line 1003, in putblobdata
    content_md5=contentmd5, timeout=self.timeout)
  File "/data/groups/adm_informatics/prod/archive2azure/blobxfer_0.9.9.9.py", line 1310, in azure_request
    return req(*args, **kwargs)
  File "/cm/local/apps/python/2.7.11_azure/lib/python2.7/site-packages/azure/storage/blob/blobservice.py", line 2366, in put_block
    self._perform_request(request)
  File "/cm/local/apps/python/2.7.11_azure/lib/python2.7/site-packages/azure/storage/storageclient.py", line 178, in _perform_request
    _storage_error_handler(ex)
  File "/cm/local/apps/python/2.7.11_azure/lib/python2.7/site-packages/azure/storage/_serialization.py", line 25, in _storage_error_handler
    return _general_error_handler(http_error)
  File "/cm/local/apps/python/2.7.11_azure/lib/python2.7/site-packages/azure/storage/_common_error.py", line 82, in _general_error_handler
    raise AzureHttpError(message, http_error.status)
AzureHttpError: The request body is too large and exceeds the maximum permissible limit.
<?xml version="1.0" encoding="utf-8"?>RequestBodyTooLargeThe request body is too large and exceeds the maximum permissible limit. RequestId:3309da30-0001-0000-78be-5dc7c3000000 Time:2016-02-02T13:33:05.8171503Z100000

alfpark commented 8 years ago

Thanks for running the script with the modification. According to the new debug lines, the data being sent to the Azure Python Storage SDK is consistent with the maximum allowable block size of 4MB.
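To make that concrete, here is a quick back-of-the-envelope check of the debug output (a sketch; the constants are taken directly from the log above):

    # Every debug line reports a chunk of exactly 4,194,304 bytes, i.e. exactly
    # the 4 MB maximum block size, and 25600 such blocks cover the 100 GiB file.
    chunk_bytes = 4194304                          # len(data) on each debug line
    num_blocks = 25600                             # "performing 25600 put blocks/blobs"
    print(chunk_bytes == 4 * 1024 * 1024)          # True: exactly at the limit
    print(num_blocks * chunk_bytes / 1024.0 ** 3)  # 100.0 -> the 100 GiB file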

I have two suggestions:

  1. Retry the upload using a SAS key so the request is issued as a direct REST call via the requests library.
  2. Pass --chunksizebytes 4194296 as a parameter.

jmlero commented 8 years ago

Using the second option and passing --chunksizebytes 4194296 as a parameter works fine. I will also try using a SAS key, but I think the best solution for me is to change the value of _MAX_BLOB_CHUNK_SIZE_BYTES to 4194296, as you suggested.
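For reference, the local change I have in mind is a one-line edit in blobxfer.py (a sketch; the exact line differs between versions, and 4194304 is assumed to be the shipped default, matching the "chunk size (bytes)" shown in the logs above):

    # _MAX_BLOB_CHUNK_SIZE_BYTES = 4194304   # assumed default: exactly 4 MiB
    _MAX_BLOB_CHUNK_SIZE_BYTES = 4194296     # 8 bytes under the limit, per the workaround above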

If this error is related to the SDK, I will also try the next release of the SDK as soon as it is available.

Thanks and regards

alfpark commented 8 years ago

Thanks for working through the issue. If you want, you can raise this issue directly on the Azure Python Storage SDK GitHub repo. I will close this issue.