conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

[bug] Enabling parallel download/upload causes the majority of our builds to fail. #6683

Open fulara opened 4 years ago

fulara commented 4 years ago

Is this an issue on the Artifactory side rather than in conan? I don't know. I have enabled parallel download and upload, and we are getting random failures on both.

To provide more information: some of the errors refer to certificates, so note that we provide a custom certificate, which usually seems to work.

Environment Details (include every applicable attribute)

How do we provide conan

We bundle conan using pyinstaller on an rh6 machine, roughly following the procedure here: https://github.com/conan-io/conan/issues/6325 We then distribute that pyinstaller package and use it as is.

Do we use download cache?

No, we download everything from scratch every time.

conan.conf:

[log]
run_to_output = True        # environment CONAN_LOG_RUN_TO_OUTPUT  
run_to_file = False         # environment CONAN_LOG_RUN_TO_FILE  
level = critical            # environment CONAN_LOGGING_LEVEL  
print_run_commands = True
[general]
default_profile = default
compression_level = 9                 # environment CONAN_COMPRESSION_LEVEL
sysrequires_sudo = True               # environment CONAN_SYSREQUIRES_SUDO
request_timeout = 60                  # environment CONAN_REQUEST_TIMEOUT (seconds)
default_package_id_mode = full_package_mode
revisions_enabled = 1
full_transitive_package_id = 1
parallel_download = 64

[storage]
path = ./data

[proxies]
no_proxy_match = *...*
http = ...
https = ....

[hooks]
attribute_checker

Steps to reproduce (Include if Applicable)

Take a package with quite a few dependencies and run conan create . and upload * to force lots of uploads and downloads.
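Because the failures are intermittent, it helps to repeat the same command many times and keep the output of the runs that fail. The following is a minimal sketch of such a harness; `count_failures` is a hypothetical helper name, not part of conan, and it assumes the command signals failure through a non-zero exit code.

```python
import subprocess


def count_failures(command, runs):
    """Run `command` `runs` times; return (attempt, output) for each failed run."""
    failures = []
    for attempt in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stdout and stderr in one stream
            universal_newlines=True,
        )
        if result.returncode != 0:
            failures.append((attempt, result.stdout))
    return failures
```

For the scenario above, `command` could be `["conan", "create", "."]`, with the storage path emptied between runs so that every dependency is downloaded again.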

Exact procedure we are doing.

Before each build we download our bundled conan package and initialize it from scratch, and then just build the particular package. We have provided a custom cacert.pem.

I am getting a varied set of errors:

Logs (Executed commands with output) (Include/Attach if Applicable)

Error no1 ConanException - MAIL

error   13-Mar-2020 17:08:20    bzip2/1.0.8: ERROR: Exception while getting package: f610127d9b93ffcf379adc4fddaeeacc2308a891
error   13-Mar-2020 17:08:20    bzip2/1.0.8: ERROR: Exception: <class 'conans.errors.ConanException'> 'MAIL'. [Remote: ig]
error   13-Mar-2020 17:08:20    bzip2/1.0.8: WARN: Trying to remove package folder: /var/build/bamboo/agent-home/xml-data/build-dir/CIR-CORRBASTATS15-JOB1/.conan-standalone/bamboo/.conan/data/bzip2/1.0.8/_/_/package/f610127d9b93ffcf379adc4fddaeeacc2308a891
error   13-Mar-2020 17:08:20    ERROR: 'MAIL'. [Remote: ig]

Error no2 spurious certificate failure

error   13-Mar-2020 17:07:53    Unable to connect to ig=MY_URL
error   13-Mar-2020 17:07:53    1. Make sure the remote is reachable or,
error   13-Mar-2020 17:07:53    2. Disable it by using conan remote disable,
error   13-Mar-2020 17:07:53    Then try again.
error   13-Mar-2020 17:07:53    ace+tao/2.2a.p16: WARN: Trying to remove package folder: /var/build/bamboo/agent-home/xml-data/build-dir/CIR-CORRBASTATS15-CONRH7/.conan-standalone/bamboo/.conan/data/ace+tao/2.2a.p16/_/_/package/ba316f0705e1724202d82fe124ba3cec855505c8
error   13-Mar-2020 17:07:53    ERROR: HTTPSConnectionPool(host='artifacts.iggroup.local', port=443): Max retries exceeded with url: /artifactory/api/conan/ig-conan/v2/conans/ace+tao/2.2a.p16/_/_/revisions/13d05791c5df8ba20f662f9072b393a4/packages/ba316f0705e1724202d82fe124ba3cec855505c8/revisions/d1348aed34e4e8ae7e9e2e898c3b9bb5/files (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))

Error no3 random bamboo variable?

error   13-Mar-2020 17:05:20    corba-stats/3.1: ERROR: Exception while getting package: bdef1eb1822b9e16f495a601d771a1a55cbe2c5e
error   13-Mar-2020 17:05:20    corba-stats/3.1: ERROR: Exception: <class 'conans.errors.ConanException'> 'bamboo_capability_os_arch'. [Remote: ig]
error   13-Mar-2020 17:05:20    corba-stats/3.1: WARN: Trying to remove package folder: /var/build/bamboo/agent-home/xml-data/build-dir/CIR-CORRBASTATS15-JOB1/.conan-standalone/bamboo/.conan/data/corba-stats/3.1/_/_/package/bdef1eb1822b9e16f495a601d771a1a55cbe2c5e
error   13-Mar-2020 17:05:20    ERROR: Error downloading file https://artifacts.iggroup.local/artifactory/api/conan/ig-conan/v2/conans/igcounters/1.3/_/_/revisions/67de2629f0a32c45ddd1ce5a9deb945f/packages/844baf09e70b1967aa243d91a3094bb6cdf5ce54/revisions/a00941a89c386d5c766881c425f73d0d/files/conanmanifest.txt: ''bamboo_capability_os_arch''
error   13-Mar-2020 17:05:20    ERROR: 'bamboo_capability_os_arch'. [Remote: ig]

Error no4 some npm error?

ERROR: stats/3.1:f1c30f8929c64350fcc4683162cfbebc5bdf34c0: Upload package to 'ig' failed: 'NVM_NODEJS_ORG_MIRROR'. [Remote: ig]
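A note on errors no1, no3 and no4: the quoted strings ('MAIL', 'bamboo_capability_os_arch', 'NVM_NODEJS_ORG_MIRROR') look like environment variable names, and the quoting matches how Python renders a KeyError as text. The sketch below only demonstrates that formatting. It is an assumption that the real failures are KeyErrors raised while looking up environment variables; nothing in this report confirms it. `lookup_or_report` is a hypothetical helper, and a plain dict stands in for the process environment.

```python
def lookup_or_report(environment, variable):
    """Return the variable's value, or an error line shaped like the logs above."""
    try:
        return environment[variable]
    except KeyError as exc:
        # str() of a KeyError is the repr of the missing key, quotes included.
        return "ERROR: {}. [Remote: ig]".format(exc)


# An empty dict models an environment from which the variable has disappeared.
message = lookup_or_report({}, "MAIL")
```

If that reading is right, the variable names are incidental: any variable that is briefly missing from the environment while another thread reads it would produce a message of this shape.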

I have run around 40 builds without the parallel settings, and those work without issue.

references: https://github.com/conan-io/conan/pull/6632 https://github.com/conan-io/conan/pull/5856

memsharded commented 4 years ago

Hi @fulara

The parallel download and the parallel upload are two very different features, implemented in different places with different code. Please use a separate issue for each.

Let's use this issue for the download one, if you see errors in the upload, please open a new one. We need to start narrowing the causes. First thing is that build --missing should be totally irrelevant, because the downloads are done before the builds. Is this issue reproducible with just some "hello world" simple packages? I think it would be good to try to isolate from the environment. Can you reproduce somehow also without running in the bamboo CI?

fulara commented 4 years ago

Hello @memsharded okay. Yea - I am seeing issues with both dl and upload: see my no4 issue: ERROR: stats/3.1:f1c30f8929c64350fcc4683162cfbebc5bdf34c0: Upload package to 'ig' failed: 'NVM_NODEJS_ORG_MIRROR'. [Remote: ig]

After the weekend I'll be able to toy around with this. Yes, of course build --missing is irrelevant here. I'll get back to this as soon as I can.

I'll keep this for download only.

fulara commented 4 years ago

So I've done some tests. I was initially lucky in reproducing it, but then had a long run without luck. Anyway, this is the error I was getting today, invoking the build manually and not from bamboo. I got it three times out of 40 builds; originally, when I created this issue, I was hitting it on almost every build.

ERROR: Download failed, check server, possibly try again
attempt to release recursive lock not owned by thread

I got that again on my complex package with code and deps etc.

Then I went ahead and generated a tree of dummy conans: roughly 100 levels of conanfiles, each depending on the previous one. I didn't get the 'attempt to release recursive lock' error even once during ~100 downloads, though I was sometimes getting the CERTIFICATE error from above.
I was testing with packages weighing 10 MB and 100 kB.
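A chain of dummy recipes like this can be generated with a short script. This is a sketch under stated assumptions, not the script actually used: `write_chain`, the `dummy` prefix and the fixed version `1.0` are hypothetical, the recipes use the Conan 1.x import path (`from conans import ConanFile`), and they carry no payload, so reaching sizes such as 10 MB or 100 kB would need extra packaged files.

```python
import os


def write_chain(root, levels, prefix="dummy"):
    """Write `levels` conanfile.py recipes where recipe N requires recipe N-1."""
    folders = []
    for level in range(levels):
        lines = [
            "from conans import ConanFile",
            "",
            "",
            "class DummyConan(ConanFile):",
            '    name = "{}{}"'.format(prefix, level),
            '    version = "1.0"',
        ]
        if level > 0:
            # Each recipe depends only on the one generated just before it.
            lines.append('    requires = "{}{}/1.0"'.format(prefix, level - 1))
        folder = os.path.join(root, "{}{}".format(prefix, level))
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, "conanfile.py"), "w") as handle:
            handle.write("\n".join(lines) + "\n")
        folders.append(folder)
    return folders
```

Creating the packages in the returned order (lowest level first) means each dependency already exists when the next recipe is created; installing the last one from an empty cache then pulls the whole chain.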

Is there any logging you'd want enabled for the certificate error or the 'attempt to release' lock error? When I get some more time I'll try to change this to work on some kind of buildable projects instead, but I'm not sure that would make any difference.

memsharded commented 4 years ago

It seems that this assertion comes from the tqdm library, and from our progress bars not being properly managed: https://github.com/iterative/dvc/issues/2589#issuecomment-543823689 It could also be related to how conan is installed, whether via pyinstaller or not.
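The assertion text itself can be reproduced with the standard library alone: CPython's `multiprocessing.RLock` raises an AssertionError with this message when a thread releases a lock it does not own. The sketch below shows only that mechanism. It does not use tqdm, and whether Conan's progress bars reach exactly this path is not established in this thread; `release_from_other_thread` is a hypothetical name.

```python
import multiprocessing
import threading


def release_from_other_thread():
    """Acquire a multiprocessing.RLock here, then try to release it in a worker."""
    lock = multiprocessing.RLock()
    errors = []

    def worker():
        try:
            lock.release()  # this thread never acquired the lock
        except AssertionError as exc:
            errors.append(str(exc))

    lock.acquire()
    try:
        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
    finally:
        lock.release()  # the owning thread releases normally
    return errors
```

Calling `release_from_other_thread()` returns a one-element list holding the assertion message. The point of the sketch is that the error indicates an acquire and a release happening on different threads, which fits a shared lock being used from several download threads.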

It would be useful to know:

fulara commented 4 years ago

Updated original comment but same details here as well: also added conan.conf we are using.

OS: built on rh6, ran on rh6 and rh7. Installed via: pyinstaller.

I am worried that the certificate error is a separate problem, though.