Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License
2.32k stars 1.97k forks source link

[BUG] Concurrent file uploads fail with exception -- similar to issue 30735 #31400

Closed masokan closed 1 year ago

masokan commented 1 year ago

Describe the bug Concurrent file upload fails. Very similar to issue #30735

Exception or Stack Trace Please see https://github.com/masokan/azure-upload-bug/blob/main/run.log

To Reproduce See the github repository: https://github.com/masokan/azure-upload-bug

Code Snippet See https://github.com/masokan/azure-upload-bug

Expected behavior No exception during uploads

Screenshots None

Setup (please complete the following information):

If you suspect a dependency version mismatch (e.g. you see NoClassDefFoundError, NoSuchMethodError or similar), please check out Troubleshoot dependency version conflict article first. If it doesn't provide solution for the problem, please provide:

Additional context Add any other context about the problem here.

Information Checklist Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

masokan commented 1 year ago

Updated checklist

joshfree commented 1 year ago

@jaschrep-msft could you follow up with @masokan on this github issue

masokan commented 1 year ago

Thanks @joshfree for prioritizing this issue. @jaschrep-msft if you need any further assistance, please let me know

jaschrep-msft commented 1 year ago

@masokan can you please provide logs of the requests going out on the wire, either via a proxy like Fiddler or Azure SDK's built-in logging? This error points to a flush request not matching the current state of the resource in the cloud, but you are using a method that manages flushes for you and does not have a history of getting it wrong.

I imagine there is a mismanagement of concurrency and/or resources somewhere in this reproduction, but it is difficult to see from looking. Given it only reproduces when thread count reaches a threshold, I imagine it is concurrency mismanagement.

Tangentially, spinning up threads to block on operations via our async clients eliminates the benefits of using the asynchronous clients. You're better off using sync clients in code like this or learning how to manage concurrency via reactor asynchronous APIs.

masokan commented 1 year ago

@jaschrep-msft Thanks for looking into the problem.

I enabled logging and captured more useful information. Please check https://github.com/masokan/azure-upload-bug/blob/main/run.log now.

I agree with you. I do not need async APIs to upload several files since I already have multiple threads for uploading. However, my real use case is to upload data using streaming API like:

upload(Flux data, ParallelTransferOptions parallelTransferOptions) in DataLakeFileAsyncClient

since I do not know the sizes of individual streams upfront. The synchronous counterpart in DataLakeFileClient namely

upload(InputStream data, long length)

and its siblings require the size of the stream upfront. So I could not use them.

The asynchronous streaming API calls also fail in the same way. I wanted to provide you with a simpler program and that is the reason why I opted for async file APIs in my program.

Let me know if there is anything else you need.

jaschrep-msft commented 1 year ago

@masokan my apologies. It seeps Azure SDK for Java currently requires certain environment variables to be set for HTTP requests/responses to be logged. Setting AZURE_HTTP_LOG_DETAIL_LEVEL environment variable to headers should ensure what I'm looking for is logged.

I've reached out to our docs writers to ensure better public documentation around this.

Could you please rerun your reproduction with these logs enabled? I also notice that the most recent logs you've posted show a timeout exception, instead of the original issue where the flush was invalid. I need a reproduction containing the original issue to diagnose it; a timeout will hide the actual problem.

masokan commented 1 year ago

@jaschrep-msft no worries! I needed to spend some time to get the log without the timeout error. I enabled the log level by setting the env. variable you suggested. I have uploaded the new version of the log: https://github.com/masokan/azure-upload-bug/blob/main/run.log

Some other observation that you noticed as well: The problem occurs only when there is good concurrency in the system. In other words, if you have only 4 cores in the system uploading 10 files concurrently succeeds. However, if you upload only 4 files concurrently the problem happens. I found this out when I ran my program on different systems. Anyway, for the final run I cut down the number of parallel uploads to 4 which presumably cuts down the log size as well.

I know debugging multi-threaded program is not easy. Hope the new log helps you somewhat.

jaschrep-msft commented 1 year ago

@masokan thanks! I will begin investigating these logs to see if I can piece together an order of events.

One thing, have you done anything to sterilize these logs, either in setting up logging or post-processing? I'm noticing that every URL query value is redacted, e.g. https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=REDACTED&position=REDACTED&retainUncommittedData=REDACTED&close=REDACTED. These are arguments to the operation, and it will be difficult to parse what's happening without them. Specifically, action is more or less required to understand what's happening. I can begin to build a picture, and I can poke around on my end to see if there is something in our system that can potentially trigger this sort of logging behavior.

Apologies this is taking such back and forth.

masokan commented 1 year ago

@jaschrep-msft thanks for your response. I will check whether I can give you the log without any redactions. But you should be able to run the test yourself (my github repository has the full source code and the maven build.) Let me know if there is any problem.

jaschrep-msft commented 1 year ago

@masokan I cannot reproduce the issue locally with your repository, which is why I must ask for your reproduction. It is also why I believe this to be a concurrency issue, though the source has yet to be determined.

There is currently a misconfiguration in our logging we need to address. In the meantime, @alzimmermsft can you post some quick instructions here on how the customer can override these settings?

alzimmermsft commented 1 year ago

HTTP requests and responses redact headers and query parameters based on an allow list configured when creating the client, in this case DataLakeFileSystemClient. These lists by default are very restrictive but can be configured by passing a configured HttpLogOptions to the builder, the following is an example:

serviceClientBuilder = new DataLakeServiceClientBuilder()
    .endpoint(endpoint)
    .credential(credential)
    .httpLogOptions(new HttpLogOptions()
        .setLogLevel(HttpLogDetailLevel.HEADERS) // Same as setting the environment configuration AZURE_HTTP_LOG_DETAIL_LEVEL to headers as @jaschrep-msft recommended
        .setAllowedHeaderNames(/* set of case-insensitive header names to log */)
        .setAllowedQueryParamNames(/* set of case-insensitive query parameter names to log */)
    );
masokan commented 1 year ago

@alzimmermsft can you please spell out all the header names and query parameter names that I should specify. I am not familiar with the ones you are looking for. Thanks.

masokan commented 1 year ago

@alzimmermsft never mind. I uploaded a new log with all header names and query parameters enabled. @jaschrep-msft you can check the new log at: https://github.com/masokan/azure-upload-bug/blob/main/run.log

jaschrep-msft commented 1 year ago

can you please spell out all the header names and query parameter names that I should specify. I am not familiar with the ones you are looking for. Thanks.

@masokan the following query parameters are getting in the way of understanding your logs

Again apologies for this confusion. We are investigating the incorrect default on our end.

masokan commented 1 year ago

@jaschrep-msft uploaded the new log with all the query parameters you asked for. Please check the log at: https://github.com/masokan/azure-upload-bug/blob/main/run.log. Hope it helps with debugging.

jaschrep-msft commented 1 year ago

@masokan You are mismanaging either your naming generation or your clients. This exception is occurring because you are trying to write to the same file 4 times in parallel. I imagine it doesn't happen on lower concurrencies because the race condition simply has no time to go off. Also explains why it struggles to reproduce on another machine. The specific error occurs on the flush step, where it tries to commit data to the file based on an offset that has been written up to, but another parallel operation has actually changed the state of the file, and now the offset is incorrect, which the service identifies and so fails.

Looking at the logs, you are not actually uploading to different blobs or directories. Below are the logs you provided, scoped down to info entries and your custom logged text. I skipped several entries at the start as they are irrelevant setup. Every request goes to the resource myname%2Fdir_3%2Ff_3 (unsure why url encoded slashes, but the service should decode those as desired) in the filesystem mycompany, despite your logging stating it's targeting /mycompnay/myname/dir_0/f_0, /mycompnay/myname/dir_3/f_3, /mycompnay/myname/dir_2/f_2, and /mycompnay/myname/dir_1/f_1.

Creating local files
Creating upload requests
Waiting for upload requests to finish
 INFO [parallel-10] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PUT","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?resource=file","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"0ca02a04-4148-4f56-828f-78adc42fcd0b","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [parallel-11] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PUT","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?resource=file","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"7a6b64b5-9451-4e80-90a5-12eea385b864","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [parallel-9] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PUT","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?resource=file","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"742991e5-5ee2-4c4a-bb93-c71231ee2880","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [parallel-12] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PUT","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?resource=file","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"b8698439-4de2-4205-820d-9c72f857fa44","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [reactor-http-epoll-1] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":201,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?resource=file","durationMs":30,"Last-Modified":"Mon, 31 Oct 2022 20:32:44 GMT","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"268f76f3-301f-002a-3367-ed28ae000000","x-ms-version":"REDACTED","x-ms-client-request-id":"0ca02a04-4148-4f56-828f-78adc42fcd0b","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0","eTag":"0x8DABB7F109D6C00"}
 INFO [reactor-http-epoll-2] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":201,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?resource=file","durationMs":34,"Last-Modified":"Mon, 31 Oct 2022 20:32:44 GMT","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"dd4b12e2-901f-0023-7567-ed6d7d000000","x-ms-version":"REDACTED","x-ms-client-request-id":"7a6b64b5-9451-4e80-90a5-12eea385b864","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0","eTag":"0x8DABB7F109DE119"}
 INFO [parallel-5] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=append&position=0","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"10240","x-ms-version":"REDACTED","Content-Type":"application/octet-stream","x-ms-client-request-id":"317cf30e-351e-45f4-b537-ebe8fa8de4b9","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":10240}
 INFO [parallel-7] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=append&position=0","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"10240","x-ms-version":"REDACTED","Content-Type":"application/octet-stream","x-ms-client-request-id":"fbd1ab61-abd6-4f5a-a8a1-d6491b794cc2","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":10240}
 INFO [reactor-http-epoll-2] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":202,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=append&position=0","durationMs":30,"Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"dd4b12e3-901f-0023-7667-ed6d7d000000","x-ms-version":"REDACTED","x-ms-client-request-id":"fbd1ab61-abd6-4f5a-a8a1-d6491b794cc2","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0"}
 INFO [reactor-http-epoll-1] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":202,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=append&position=0","durationMs":34,"Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"268f76f4-301f-002a-3467-ed28ae000000","x-ms-version":"REDACTED","x-ms-client-request-id":"317cf30e-351e-45f4-b537-ebe8fa8de4b9","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0"}
 INFO [parallel-9] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"151a778c-a9ff-4f9e-b59f-5b0555c59975","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [parallel-10] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"668f62c0-8c77-4485-b583-fddaffb44f34","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [reactor-http-epoll-4] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":201,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?resource=file","durationMs":86,"Last-Modified":"Mon, 31 Oct 2022 20:32:44 GMT","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"06b06a5a-401f-006d-6b67-ed43f5000000","x-ms-version":"REDACTED","x-ms-client-request-id":"b8698439-4de2-4205-820d-9c72f857fa44","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0","eTag":"0x8DABB7F10A61D47"}
 INFO [parallel-1] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=append&position=0","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"10240","x-ms-version":"REDACTED","Content-Type":"application/octet-stream","x-ms-client-request-id":"bd3a3b79-de3a-4ad5-9f15-232980227d7b","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":10240}
 INFO [reactor-http-epoll-3] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":201,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?resource=file","durationMs":92,"Last-Modified":"Mon, 31 Oct 2022 20:32:44 GMT","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"74b80168-201f-007b-1667-edb522000000","x-ms-version":"REDACTED","x-ms-client-request-id":"742991e5-5ee2-4c4a-bb93-c71231ee2880","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0","eTag":"0x8DABB7F10A6E076"}
 INFO [parallel-3] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=append&position=0","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"10240","x-ms-version":"REDACTED","Content-Type":"application/octet-stream","x-ms-client-request-id":"cb7d26f0-e60e-4dfc-a84f-6a36c561f5aa","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":10240}
 INFO [reactor-http-epoll-1] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"284","statusCode":400,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","durationMs":17,"Content-Type":"application/json;charset=utf-8","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-error-code":"REDACTED","x-ms-request-id":"268f76f5-301f-002a-3567-ed28ae000000","x-ms-version":"REDACTED","x-ms-client-request-id":"668f62c0-8c77-4485-b583-fddaffb44f34","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"284"}
 INFO [reactor-http-epoll-2] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":202,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=append&position=0","durationMs":20,"Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"dd4b12e4-901f-0023-7767-ed6d7d000000","x-ms-version":"REDACTED","x-ms-client-request-id":"bd3a3b79-de3a-4ad5-9f15-232980227d7b","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0"}
 INFO [parallel-5] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"ab99c06b-30f6-4eaa-af57-fbc5b7bf2a7b","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [reactor-http-epoll-4] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":202,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=append&position=0","durationMs":17,"Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"06b06a5b-401f-006d-6c67-ed43f5000000","x-ms-version":"REDACTED","x-ms-client-request-id":"cb7d26f0-e60e-4dfc-a84f-6a36c561f5aa","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0"}
 INFO [parallel-7] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:44 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"23f7725a-1a0e-45a1-bf9b-ddf9e4c5b6e8","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [reactor-http-epoll-1] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"193","statusCode":500,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","durationMs":22,"Content-Type":"application/json;charset=utf-8","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-error-code":"REDACTED","x-ms-request-id":"268f76f6-301f-002a-3667-ed28ae000000","x-ms-version":"REDACTED","x-ms-client-request-id":"23f7725a-1a0e-45a1-bf9b-ddf9e4c5b6e8","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"193"}
 INFO [reactor-http-epoll-3] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":200,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","durationMs":29,"Last-Modified":"Mon, 31 Oct 2022 20:32:44 GMT","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"74b80169-201f-007b-1767-edb522000000","x-ms-version":"REDACTED","x-ms-client-request-id":"ab99c06b-30f6-4eaa-af57-fbc5b7bf2a7b","Date":"Mon, 31 Oct 2022 20:32:43 GMT","content-length":"0","eTag":"0x8DABB7F10AD482C"}
Finished uploading /home/myname/expt/Cloud-Archived/azure-bug-31400/f_0 to /mycompnay/myname/dir_0/f_0
Deleting local file /home/myname/expt/Cloud-Archived/azure-bug-31400/f_0
 INFO [reactor-http-epoll-5] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":200,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","durationMs":122,"Last-Modified":"Mon, 31 Oct 2022 20:32:44 GMT","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"9af349dd-d01f-001d-5c67-edfa02000000","x-ms-version":"REDACTED","x-ms-client-request-id":"151a778c-a9ff-4f9e-b59f-5b0555c59975","Date":"Mon, 31 Oct 2022 20:32:44 GMT","content-length":"0","eTag":"0x8DABB7F10B0CA19"}
Finished uploading /home/myname/expt/Cloud-Archived/azure-bug-31400/f_3 to /mycompnay/myname/dir_3/f_3
Deleting local file /home/myname/expt/Cloud-Archived/azure-bug-31400/f_3
 INFO [parallel-9] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"PATCH","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","tryCount":"2","Date":"Mon, 31 Oct 2022 20:32:48 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"23f7725a-1a0e-45a1-bf9b-ddf9e4c5b6e8","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [reactor-http-epoll-2] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":200,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname%2Fdir_3%2Ff_3?action=flush&position=10240&retainUncommittedData=false&close=false","durationMs":25,"Last-Modified":"Mon, 31 Oct 2022 20:32:48 GMT","Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-server-encrypted":"REDACTED","x-ms-request-id":"dd4b12e9-901f-0023-7c67-ed6d7d000000","x-ms-version":"REDACTED","x-ms-client-request-id":"23f7725a-1a0e-45a1-bf9b-ddf9e4c5b6e8","Date":"Mon, 31 Oct 2022 20:32:47 GMT","content-length":"0","eTag":"0x8DABB7F1313DBBA"}
Finished uploading /home/myname/expt/Cloud-Archived/azure-bug-31400/f_2 to /mycompnay/myname/dir_2/f_2
Deleting local file /home/myname/expt/Cloud-Archived/azure-bug-31400/f_2
Exception occurred for: Local file:/home/myname/expt/Cloud-Archived/azure-bug-31400/f_1 Remote file:/mycompnay/myname/dir_1/f_1
 INFO [parallel-11] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP request","method":"DELETE","url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname?recursive=true","tryCount":"1","Date":"Mon, 31 Oct 2022 20:32:48 GMT","Authorization":"REDACTED","Content-Length":"0","x-ms-version":"REDACTED","x-ms-client-request-id":"13c6b310-fd58-408b-a121-b72cc2391a5f","Accept":"application/json","User-Agent":"azsdk-java-azure-storage-file-datalake/12.12.1 (11.0.16; Linux; 5.15.0-52-generic)","contentLength":0}
 INFO [reactor-http-epoll-4] (LoggingEventBuilder.java:358) - {"az.sdk.message":"HTTP response","contentLength":"0","statusCode":200,"url":"https://mycompnay.dfs.core.windows.net/mycompnay/myname?recursive=true","durationMs":31,"Server":"Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0","x-ms-request-id":"06b06a74-401f-006d-0567-ed43f5000000","x-ms-version":"REDACTED","x-ms-client-request-id":"13c6b310-fd58-408b-a121-b72cc2391a5f","Date":"Mon, 31 Oct 2022 20:32:47 GMT","content-length":"0"}
Exception in thread "main" DEBUG [reactor-http-epoll-4] (Loggers.java:245) - [4652cb01-3, L:/10.2.193.133:57472 - R:mycompnay.dfs.core.windows.net/20.60.62.2:443] Received last HTTP packet
java.lang.Exception: com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 400, "{"error":{"code":"InvalidFlushPosition","message":"The uploaded data is not contiguous or the position query parameter value is not equal to the length of the file after appending the uploaded data.\nRequestId:268f76f5-301f-002a-3567-ed28ae000000\nTime:2022-10-31T20:32:44.1034804Z"}}"
masokan commented 1 year ago

@jaschrep-msft thanks for your findings. I will do some investigation and get back to you.

masokan commented 1 year ago

@jaschrep-msft Upon further investigation, I found that for each file to be uploaded, you need a new instance of DataLakePathClientBuilder which is not very intuitive.

I have used a single instance of DataLakePathClientBuilder and called buildFileAsyncClient() method multiple times to get an instance of DataLakeFileAsyncClient for each file to be uploaded. Note that while building each instance of DataLakeFileAsyncClient, I passed different path names assuming that DataLakeFileAsyncClient instances do not share anything and are thread-safe.

What is happening is that the path names get overwritten and only the last name survives. This is the reason why uploaded data through each DataLakeFileAsyncClient goes to the same file simultaneously in different threads resulting in exceptions thrown.

I think this is either an API design issue or a bug in the SDK.

In any case, I am getting around my problem for now. Thanks for all the debugging.

ibrahimrabab commented 1 year ago

Closing issue since customer has found a workaround.

@masokan please feel free to reach out with additional questions :)