Azure / azure-storage-python

Microsoft Azure Storage Library for Python
https://azure-storage.readthedocs.io
MIT License

create_file_from_path of fileservice overwrites SMBproperties with defaults #684

Open vladm opened 3 years ago

vladm commented 3 years ago

Which service(blob, file, queue) does this issue concern?

azure.storage.file

Which version of the SDK was used? Please provide the output of pip freeze.

2.1.0

What problem was encountered?

I'm using create_file_from_path from azure/storage/file/fileservice.py to create a small test file. I'm passing last_write_time and creation_time via SMBProperties because I need to preserve those values.

However, when examined in Azure Storage Explorer, the file shows the current date/time stamp instead of the values I passed.
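For reference, the timestamp strings in the log below use ISO 8601 in UTC with seven fractional digits (for example 2020-05-08T09:49:58.0000000Z). A minimal sketch of how such a string could be built from a local file's modification time follows; the helper names to_smb_timestamp and mtime_as_smb_timestamp are hypothetical and not part of the SDK.

```python
import os
from datetime import datetime, timezone


def to_smb_timestamp(dt):
    """Format an aware datetime as UTC ISO 8601 with seven fractional digits.

    Hypothetical helper, not part of azure-storage-file.
    """
    utc = dt.astimezone(timezone.utc)
    # %f yields six digits (microseconds); append one zero for the seventh digit.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%f") + "0Z"


def mtime_as_smb_timestamp(path):
    """Return the modification time of a local file in the same format."""
    mtime = os.path.getmtime(path)
    return to_smb_timestamp(datetime.fromtimestamp(mtime, tz=timezone.utc))


if __name__ == "__main__":
    sample = datetime(2020, 5, 8, 9, 49, 58, tzinfo=timezone.utc)
    print(to_smb_timestamp(sample))
```

The resulting string matches the shape of the x-ms-file-creation-time and x-ms-file-last-write-time values seen in the first request.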

Looking at the log, I can see that for some reason, there are two PUT requests made:

1st PUT that passes the correct creation_time and last_write_time via the x-ms-file-creation-time and x-ms-file-last-write-time header fields. Based on the call stack, I can see it comes from the create_file call within create_file_from_stream.

2nd PUT that strips off all the SMBProperties and does not include the corresponding header fields in the HTTP PUT request. Based on the call stack, it comes from update_range, called from within process_chunk.
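The difference between the two requests can be sketched with a small header comparison. The header dictionaries are copied from the log in this report (trimmed to the relevant entries); the SMB_HEADERS tuple and the missing_smb_headers function are illustrative names, not SDK code.

```python
# SMB-related request headers sent by the first PUT in the log.
SMB_HEADERS = (
    "x-ms-file-permission",
    "x-ms-file-attributes",
    "x-ms-file-creation-time",
    "x-ms-file-last-write-time",
)

# First PUT (create_file), trimmed from the log.
create_file_headers = {
    "x-ms-content-length": "8737",
    "x-ms-type": "file",
    "x-ms-file-permission": "Inherit",
    "x-ms-file-attributes": "Archive",
    "x-ms-file-creation-time": "2020-05-08T09:49:58.0000000Z",
    "x-ms-file-last-write-time": "2020-05-08T09:49:58.0000000Z",
    "x-ms-version": "2019-02-02",
}

# Second PUT (update_range with comp=range), trimmed from the log.
update_range_headers = {
    "x-ms-write": "update",
    "x-ms-range": "bytes=0-8736",
    "Content-Length": "8737",
    "x-ms-version": "2019-02-02",
}


def missing_smb_headers(headers):
    """Return the SMB-related header names that are absent from a request."""
    return [name for name in SMB_HEADERS if name not in headers]


if __name__ == "__main__":
    print("create_file missing:", missing_smb_headers(create_file_headers))
    print("update_range missing:", missing_smb_headers(update_range_headers))
```

Run against the logged headers, the first request carries all four SMB headers and the second carries none of them.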

Here is the full log for your reference:

[2020-12-08 16:06:05,864] {bbm_sftp2afs.py:154} INFO - Uploading /tmp/tmpl13zcmod to afs://airflowtest/postlogs as sample.csv and modification time 2020-05-08T09:49:58.0000000Z
[2020-12-08 16:06:05,889] {storageclient.py:331} INFO - Client-Request-ID=91851bce-39a1-11eb-9ac4-50e549edaf57 Outgoing request: Method=PUT, Path=/airflowtest/postlogs/sample.csv, Query={'timeout': None}, Headers={'x-ms-content-length': '8737', 'x-ms-type': 'file', 'x-ms-file-permission': 'Inherit', 'x-ms-file-attributes': 'Archive', 'x-ms-file-creation-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-last-write-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-permission-key': None, 'x-ms-version': '2019-02-02', 'User-Agent': 'Azure-Storage/2.1.0-2.1.0 (Python CPython 3.6.9; Linux 4.4.0-19041-Microsoft)', 'x-ms-client-request-id': '91851bce-39a1-11eb-9ac4-50e549edaf57', 'x-ms-date': 'Tue, 08 Dec 2020 22:06:05 GMT', 'Authorization': 'REDACTED'}.
[2020-12-08 16:06:05,890] {storageclient.py:332} INFO - Outgoing request STACK:
[2020-12-08 16:06:05,898] {storageclient.py:334} INFO - File "bin/airflow", line 37, in <module> args.func(args)
[2020-12-08 16:06:05,898] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/cli.py", line 80, in wrapper return f(*args, **kwargs)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 580, in run _run(args, dag, ti)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 476, in _run run_job.run()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/jobs/base_job.py", line 218, in run self._execute()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/jobs/local_task_job.py", line 94, in _execute self.task_runner.start()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/task/task_runner/standard_task_runner.py", line 43, in start self.process = self._start_by_fork()
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/task/task_runner/standard_task_runner.py", line 86, in _start_by_fork args.func(args, dag=self.dag)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/cli.py", line 80, in wrapper return f(*args, **kwargs)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 580, in run _run(args, dag, ti)
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/bin/cli.py", line 481, in _run pool=args.pool,
[2020-12-08 16:06:05,899] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper return func(*args, **kwargs)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task result = task_copy.execute(context=context)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "/home/vlad/airflow/plugins/operators/bbm_sftp2afs.py", line 157, in execute creation_time=self.afs_load_options['creation_time'])
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/airflow/contrib/hooks/azure_fileshare_hook.py", line 172, in load_file file_name, file_path, **kwargs)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 1943, in create_file_from_path max_connections, file_permission=file_permission, smb_properties=smb_properties, timeout=timeout)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 2134, in create_file_from_stream timeout=timeout
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 1888, in create_file self._perform_request(request)
[2020-12-08 16:06:05,900] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 333, in _perform_request for line in traceback.format_stack():
[2020-12-08 16:06:06,142] {storageclient.py:357} INFO - Client-Request-ID=91851bce-39a1-11eb-9ac4-50e549edaf57 Receiving Response: Server-Timestamp=Tue, 08 Dec 2020 22:06:11 GMT, Server-Request-ID=62886ed3-701a-0065-50ae-cd92db000000, HTTP Status Code=201, Message=Created, Headers={'content-length': '0', 'last-modified': 'Fri, 08 May 2020 09:49:58 GMT', 'etag': '"0x8D7F3352B362700"', 'server': 'Windows-Azure-File/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': '62886ed3-701a-0065-50ae-cd92db000000', 'x-ms-client-request-id': '91851bce-39a1-11eb-9ac4-50e549edaf57', 'x-ms-version': '2019-02-02', 'x-ms-file-change-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-last-write-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-creation-time': '2020-05-08T09:49:58.0000000Z', 'x-ms-file-permission-key': '106997116661656136764385920356675382498', 'x-ms-file-attributes': 'Archive', 'x-ms-file-id': '13835064652351930368', 'x-ms-file-parent-id': '13835128424026341376', 'x-ms-request-server-encrypted': 'true', 'date': 'Tue, 08 Dec 2020 22:06:11 GMT'}.

[2020-12-08 16:06:06,143] {storageclient.py:331} INFO - Client-Request-ID=91ac43ac-39a1-11eb-af20-50e549edaf57 Outgoing request: Method=PUT, Path=/airflowtest/postlogs/sample.csv, Query={'comp': 'range', 'timeout': None}, Headers={'x-ms-write': 'update', 'x-ms-range': 'bytes=0-8736', 'Content-Length': '8737', 'x-ms-version': '2019-02-02', 'User-Agent': 'Azure-Storage/2.1.0-2.1.0 (Python CPython 3.6.9; Linux 4.4.0-19041-Microsoft)', 'x-ms-client-request-id': '91ac43ac-39a1-11eb-af20-50e549edaf57', 'x-ms-date': 'Tue, 08 Dec 2020 22:06:06 GMT', 'Authorization': 'REDACTED'}.
[2020-12-08 16:06:06,143] {storageclient.py:332} INFO - Outgoing request STACK:
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap self._bootstrap_inner()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/concurrent/futures/thread.py", line 69, in _worker work_item.run()
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run result = self.fn(*self.args, **self.kwargs)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/_upload_chunking.py", line 82, in process_chunk return self._upload_chunk_with_progress(chunk_offset, chunk_data)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/_upload_chunking.py", line 129, in _upload_chunk_with_progress timeout=self.timeout
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/file/fileservice.py", line 2691, in update_range self._perform_request(request)
[2020-12-08 16:06:06,145] {storageclient.py:334} INFO - File "lib/python3.6/site-packages/azure/storage/common/storageclient.py", line 333, in _perform_request for line in traceback.format_stack():
[2020-12-08 16:06:06,194] {storageclient.py:357} INFO - Client-Request-ID=91ac43ac-39a1-11eb-af20-50e549edaf57 Receiving Response: Server-Timestamp=Tue, 08 Dec 2020 22:06:11 GMT, Server-Request-ID=62886ed7-701a-0065-51ae-cd92db000000, HTTP Status Code=201, Message=Created, Headers={'content-length': '0', 'content-md5': 'e8Xp61MKK0lkStIaa2iwuw==', 'last-modified': 'Tue, 08 Dec 2020 22:06:11 GMT', 'etag': '"0x8D89BC578FC2B35"', 'server': 'Windows-Azure-File/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': '62886ed7-701a-0065-51ae-cd92db000000', 'x-ms-client-request-id': '91ac43ac-39a1-11eb-af20-50e549edaf57', 'x-ms-version': '2019-02-02', 'x-ms-request-server-encrypted': 'true', 'date': 'Tue, 08 Dec 2020 22:06:11 GMT'}.
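The two responses above also show the effect on the server side: last-modified matches the requested time after the first PUT and matches the server date after the second. A short sketch that parses the logged values makes the jump explicit. The values are copied from the log, the variable names are illustrative, and this only compares what was logged; it does not establish which component resets the time.

```python
from email.utils import parsedate_to_datetime

# 'last-modified' from the response to the first PUT (create_file).
last_modified_after_create = "Fri, 08 May 2020 09:49:58 GMT"
# 'last-modified' and 'date' from the response to the second PUT (update_range).
last_modified_after_range = "Tue, 08 Dec 2020 22:06:11 GMT"
server_date_after_range = "Tue, 08 Dec 2020 22:06:11 GMT"

after_create = parsedate_to_datetime(last_modified_after_create)
after_range = parsedate_to_datetime(last_modified_after_range)
server_date = parsedate_to_datetime(server_date_after_range)

# How far last-modified moved between the two responses.
drift = after_range - after_create

if __name__ == "__main__":
    print("last-modified moved forward by:", drift)
    print("equals server date of second response:", after_range == server_date)
```

In other words, the value set by the first request was already replaced by the time the range upload returned.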