conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License
8.3k stars 986 forks source link

[bug] conan upload fails with timeout #17242

Open KozhinovAlexander opened 4 weeks ago

KozhinovAlexander commented 4 weeks ago

Describe the bug

Environment details: ubuntu linux 20.04, conan v1.65

The size of the created ´my_package´ is about 8 GiBi.

Dear community - trying to upload my newly creted conan package to JFrog artifactory I get http request timeout:

[2024-10-29T09:15:22.878Z] Execute: conan upload my_package/1.2.3@foo/bar --all --remote=my_dev_remote --force
[2024-10-29T09:15:23.140Z] Uploading to remote 'my_dev_remote':
Uploading my_package/1.2.3@foo/bar to remote 'my_dev_remote'
[2024-10-29T09:15:23.140Z] Compressing recipe...
[2024-10-29T09:15:24.183Z] Uploading conan_export.tgz -> my_package/1.2.3@foo/bar
[2024-10-29T09:15:24.183Z] Uploading conanfile.py -> my_package/1.2.3@foo/bar
[2024-10-29T09:15:24.183Z] Uploading conanmanifest.txt -> my_package/1.2.3@foo/bar
Uploaded conan recipe 'my_package/1.2.3@foo/bar' to 'my_dev_remote': https://my.repo.server.net/artifactory/api/conan/my_dev_remote
Uploading package 1/1: ef059320de0a5d5ea7a1532cacb74bb051c7bc6a to 'my_dev_remote'
[2024-10-29T09:15:24.184Z] Compressing package...
[2024-10-29T10:22:24.274Z] Uploading conan_package.tgz -> my_package/1.2.3@foo/bar:ef05
[2024-10-29T10:27:47.664Z] conan_upload - ERROR - ERROR: HTTPSConnectionPool(host='my.repo.server.net', port=xxx): Max retries exceeded with url: /artifactory/api/conan/my_dev_remote/v1/files/foo/my_package/1.2.3/bar/0/package/ef059320de0a5d5ea7a1532cacb74bb051c7bc6a/0/conan_package.tgz (Caused by ReadTimeoutError("HTTPSConnectionPool(host='my.repo.server.net', port=xxx): Read timed out. (read timeout=60.0)"))
[2024-10-29T10:27:47.664Z] Waiting 5 seconds to retry...
[2024-10-29T10:32:02.882Z] Error uploading file: conan_package.tgz, 'HTTPSConnectionPool(host='my.repo.server.net', port=xxx): Max retries exceeded with url: /artifactory/api/conan/my_dev_remote/v1/files/foo/my_package/1.2.3/bar/0/package/ef059320de0a5d5ea7a1532cacb74bb051c7bc6a/0/conan_package.tgz (Caused by ReadTimeoutError("HTTPSConnectionPool(host='my.repo.server.net', port=xxx): Read timed out. (read timeout=60.0)"))'
[2024-10-29T10:32:02.882Z] Uploading conaninfo.txt -> my_package/1.2.3@foo/bar:ef05
[2024-10-29T10:32:02.882Z] Uploading conanmanifest.txt -> my_package/1.2.3@foo/bar:ef05
[2024-10-29T10:32:02.883Z] conan_upload - ERROR - ERROR: my_package/1.2.3@foo/bar:ef059320de0a5d5ea7a1532cacb74bb051c7bc6a: Upload package to 'my_dev_remote' failed: Execute upload again to retry upload the failed files: conan_package.tgz. [Remote: my_dev_remote]
[2024-10-29T10:32:02.883Z] conan_upload - ERROR -
[2024-10-29T10:32:02.883Z] ERROR - Something went wrong while conan upload with ('Errors uploading some packages',)
[2024-10-29T10:32:02.883Z] Traceback (most recent call last):
[2024-10-29T10:32:02.883Z]   File "/usr/local/bin/myciuser", line 8, in <module>
[2024-10-29T10:32:02.883Z]     sys.exit(myciuser())
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/typer/main.py", line 338, in __call__
[2024-10-29T10:32:02.883Z]     raise e
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/typer/main.py", line 321, in __call__
[2024-10-29T10:32:02.883Z]     return get_command(self)(*args, **kwargs)
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
[2024-10-29T10:32:02.883Z]     return self.main(*args, **kwargs)
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/typer/core.py", line 728, in main
[2024-10-29T10:32:02.883Z]     return _main(
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/typer/core.py", line 197, in _main
[2024-10-29T10:32:02.883Z]     rv = self.invoke(ctx)
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
[2024-10-29T10:32:02.883Z]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
[2024-10-29T10:32:02.883Z]     return ctx.invoke(self.callback, **ctx.params)
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
[2024-10-29T10:32:02.883Z]     return __callback(*args, **kwargs)
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/typer/main.py", line 703, in wrapper
[2024-10-29T10:32:02.883Z]     return callback(**use_params)
[2024-10-29T10:32:02.883Z]   File "/usr/local/lib/python3.10/site-packages/myciuser/commands/upload.py", line 70, in upload
[2024-10-29T10:32:02.883Z]     ConanCMD.conan_upload(config=config, username=username, password=password)
[2024-10-29T10:32:02.884Z]   File "/usr/local/lib/python3.10/site-packages/myciuser/helpers/conan/conan_cmd.py", line 553, in conan_upload
[2024-10-29T10:32:02.884Z]     ConanCMD._upload_build_artifacts(
[2024-10-29T10:32:02.884Z]   File "/usr/local/lib/python3.10/site-packages/myciuser/helpers/conan/conan_cmd.py", line 402, in _upload_build_artifacts
[2024-10-29T10:32:02.884Z]     raise ex
[2024-10-29T10:32:02.884Z]   File "/usr/local/lib/python3.10/site-packages/myciuser/helpers/conan/conan_cmd.py", line 393, in _upload_build_artifacts
[2024-10-29T10:32:02.884Z]     ConanAPIV1(output=conan_logger).upload(  # type: ignore
[2024-10-29T10:32:02.884Z]   File "/usr/local/lib/python3.10/site-packages/conans/client/conan_api.py", line 93, in wrapper
[2024-10-29T10:32:02.884Z]     return f(api, *args, **kwargs)
[2024-10-29T10:32:02.884Z]   File "/usr/local/lib/python3.10/site-packages/conans/client/conan_api.py", line 1012, in upload
[2024-10-29T10:32:02.884Z]     uploader.upload(pattern, remotes, upload_recorder, package, all_packages, confirm,
[2024-10-29T10:32:02.884Z]   File "/usr/local/lib/python3.10/site-packages/conans/client/cmd/uploader.py", line 496, in upload
[2024-10-29T10:32:02.884Z]     raise ConanException("Errors uploading some packages")
[2024-10-29T10:32:02.884Z] conans.errors.ConanException: Errors uploading some packages
script returned exit code 1

Interestingly all other packages are working fine

How to reproduce it

The simpliest way to reproduce is to take a big file and package it with conan, and then upload to your artifactory.

memsharded commented 4 weeks ago

Hi @KozhinovAlexander

Thanks for your report. What kind of package is this? It seems too large for any C or C++ package, even some of the largest packages out there are not that large.

There is a conf in Conan 1:

[general]
request_timeout = 60    

That you could increase to try to pass that issue (it is in seconds) However, I am not sure this would be enough or the root cause. Because that timeout is the time without a response from the server, not the total upload time. So even for large uploads, the server shouldn't take like 60 seconds to communicate with the client.

This could be caused by other limits in Artifactory, I think it has configuration for maximum file size, for example for Cloud the max is 25Gb. If you are using some pro version, I'd suggest to reach out to their support and check these things.

KozhinovAlexander commented 3 weeks ago

Hi @KozhinovAlexander

Thanks for your report. What kind of package is this? It seems too large for any C or C++ package, even some of the largest packages out there are not that large.

Well, this is a monolithic bigger package, which bundles multiple packages to one single delivery blob. True - there is some missue. But unfortunately it was done before me and it can not be reverted as fast as I wish 😃

There is a conf in Conan 1:

[general]
request_timeout = 60    

That you could increase to try to pass that issue (it is in seconds) However, I am not sure this would be enough or the root cause. Because that timeout is the time without a response from the server, not the total upload time. So even for large uploads, the server shouldn't take like 60 seconds to communicate with the client.

Interestingly in my case as I vary request_timeout parameter from 5 sec to 150 sec I can see the T value (see screenshot below) variing accordingly, while transfer windows (red pulses) stay same of around 30sec.: conan_upload_timeout_correlation_0

For me it looks like the corresponding request function checks for request at the end of the timeout. In my opinion it is wrong understanding of timeout. The request checks shall happen faster (maybe ~200ms) and the function shall timeout after request_timeout value approached.

On the other hand I do not know, if artifactory server or network will kick conan client out, if it polls status every ~200ms.

What are your thoughts here?

This could be caused by other limits in Artifactory, I think it has configuration for maximum file size, for example for Cloud the max is 25Gb. If you are using some pro version, I'd suggest to reach out to their support and check these things.

This one I need to check.

memsharded commented 3 weeks ago

For me it looks like the corresponding request function checks for request at the end of the timeout. In my opinion it is wrong understanding of timeout. The request checks shall happen faster (maybe ~200ms) and the function shall timeout after request_timeout value approached.

This is not a definition of timeout that we invented, it is the timeout as defined by the python-requests popular library, see https://requests.readthedocs.io/en/latest/user/advanced/#timeouts. Conan does nothing with this value, just pass it to python-requests.

On the other hand I do not know, if artifactory server or network will kick conan client out, if it polls status every ~200ms.

What are your thoughts here?

I tend to think this is some kind of resource exhaustion, on server side, network layer (sometimes, specially in corporate networks with firewalls, vpns, proxies, etc, we have seen intermediate layers imposing limits).

KozhinovAlexander commented 3 weeks ago

For me it looks like the corresponding request function checks for request at the end of the timeout. In my opinion it is wrong understanding of timeout. The request checks shall happen faster (maybe ~200ms) and the function shall timeout after request_timeout value approached.

This is not a definition of timeout that we invented, it is the timeout as defined by the python-requests popular library, see https://requests.readthedocs.io/en/latest/user/advanced/#timeouts. Conan does nothing with this value, just pass it to python-requests.

Ok. But it definitely impacts transfer performance. For example setting request_timeout to 10 sec. will produce 30 sec. upload and 10 sec. dead (timeout) time. Increasing request_timeout value will increase dead time accordingly (see T value in picture below). We can see now, the timeout effectively reduce a throughout. Of course it can be seen for larger packages (around 300MiBi) in total. But the behavior is there. My proposal would be: passing a value of about 200ms as a timeout to python-requests library and doing as many 200ms requests as matches into request_timeout value. In that case the upload of larger packages shall be much more performant. Also the backward compatibility shall be perfect. @memsharded Could you imagine such change?

On the other hand I do not know, if artifactory server or network will kick conan client out, if it polls status every ~200ms.

What are your thoughts here?

I tend to think this is some kind of resource exhaustion, on server side, network layer (sometimes, specially in corporate networks with firewalls, vpns, proxies, etc, we have seen intermediate layers imposing limits).

Agree. It is highly likely the problem. I investigate it with server folks.

memsharded commented 3 weeks ago

@memsharded Could you imagine such change?

I am not sure about this. The timeouts only happen under some circumstances, where the server doesn't really respond. But this is very rarely the case. Conan is not waiting 60 seconds (the default right now) after each upload. This timeout is not something we control, so there is no way to set up a 200ms timeout then have python-requests do 200ms requests, this is internal to python-requests.

Also, we try to follow as much as possible the "experts" defaults. In this case the python-requests default is None which means wait for ever. We reduced that to 60 seconds, which seems a reasonable limit for any server. But trying to set it to something less than a few seconds might fail very often even for quite performant servers, note these servers load might be huge.

Agree. It is highly likely the problem. I investigate it with server folks.

Yes, in any case, the performance of transfers should be a separate discussion if anything, lets keep this ticket focused on the timeout.

KozhinovAlexander commented 3 weeks ago

@memsharded Do you have a jfrog artifactory server to reproduce the behavior? In that case I could provide you simple conan package, which reproduces mentioned timeout behavior, s.t. you see timeout impacts transfer performance.

memsharded commented 3 weeks ago

For testing I usually use a server running locally in my computer, but that might not be the best to test real network issues.