conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

[bug] Conan remove --locks doesn't work #7300

Closed: Cazadorro closed this issue 7 months ago

Cazadorro commented 4 years ago

Environment Details (include every applicable attribute)

Steps to reproduce (Include if Applicable)

Tried to install ZMQ (https://conan.io/center/zmq/4.3.2/), entered the password wrong (also, plain-text visibility on the password...?? what...?). Conan apparently couldn't handle it, and after an hour of nothing, Ctrl+C and quitting did nothing. I manually closed the Git Bash window, tried again, and was met with:

$ conan install .. -s compiler=gcc
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=gcc
os=Windows
os_build=Windows
[options]
[build_requires]
[env]

zmq/4.3.2@bincrafters/stable is locked by another concurrent conan process, wait...
If not the case, quit, and do 'conan remove --locks'
You pressed Ctrl+C!
ERROR: Exiting with code: 3

so I run

$ conan remove --locks
Cache locks removed

and try again. This time I get

$ conan install .. -s compiler=gcc
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=gcc
os=Windows
os_build=Windows
[options]
[build_requires]
[env]

zmq/4.3.2@bincrafters/stable is locked by another concurrent conan process, wait...
If not the case, quit, and do 'conan remove --locks'
You pressed Ctrl+C!
ERROR: Exiting with code: 3

the same exact thing.

I check my .conan folder and, lo and behold, I see stable.count.lock and stable.count both still there.

stable.count contains a -1; I don't know if that means anything. stable.count.lock is empty.

So apparently removing the locks didn't remove the lock.

So after manually removing the lock, I was able to restart the installation. Unfortunately the install still didn't work, even after entering the correct credentials. Conan just sits there without giving any feedback or indication that anything is happening. I suspect this has something to do with proxy permissions or something similar, but without any feedback it is impossible to tell.

memsharded commented 4 years ago

Hi @Cazadorro

Please let me first suggest a few things:

Please try the following:

$ conan user --clean  # Not really necessary, but just in case
$ rm -rf <userhome>/.conan/data  # This will wipe your local cache packages
# if your shell doesn't have rm -rf, just make sure that folder is removed
$ conan install zeromq/4.3.2@

I have tried this and it seems to work. Could you please try again and let me know? Thanks!

Cazadorro commented 4 years ago

@memsharded When I was on my company's VPN, it asked me for a password and forced me to create an account. When I disconnected from the VPN, it didn't even prompt me. I don't know what bearing proxies and VPNs have on Conan doing that, but I didn't get any indication of why this difference would exist.

Downloading zmq apparently works as well:

$ conan install zeromq/4.3.2@
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=Visual Studio
compiler.runtime=MD
compiler.version=15
os=Windows
os_build=Windows
[options]
[build_requires]
[env]

zeromq/4.3.2: Not found in local cache, looking in remotes...
zeromq/4.3.2: Trying with 'conan-center'...
Downloading conanmanifest.txt
Downloading conanfile.py
Downloading conan_export.tgz
zeromq/4.3.2: Downloaded recipe revision 0
libsodium/1.0.18: Not found in local cache, looking in remotes...
libsodium/1.0.18: Trying with 'conan-center'...
Downloading conanmanifest.txt
Downloading conanfile.py
Downloading conan_export.tgz
libsodium/1.0.18: Downloaded recipe revision 0
Installing package: zeromq/4.3.2
Requirements
    libsodium/1.0.18 from 'conan-center' - Downloaded
    zeromq/4.3.2 from 'conan-center' - Downloaded
Packages
    libsodium/1.0.18:4dbd0a60aaf2d6d9a1a630239ee8efd3c98614b7 - Download
    zeromq/4.3.2:92105e8fc16d129eb40755d7d2152c4855f68c46 - Download

Installing (downloading, building) binaries...
libsodium/1.0.18: Retrieving package 4dbd0a60aaf2d6d9a1a630239ee8efd3c98614b7 from remote 'conan-center'
Downloading conanmanifest.txt
Downloading conaninfo.txt
Downloading conan_package.tgz
libsodium/1.0.18: Package installed 4dbd0a60aaf2d6d9a1a630239ee8efd3c98614b7
libsodium/1.0.18: Downloaded package revision 0
zeromq/4.3.2: Retrieving package 92105e8fc16d129eb40755d7d2152c4855f68c46 from remote 'conan-center'
Downloading conanmanifest.txt
Downloading conaninfo.txt
Downloading conan_package.tgz
zeromq/4.3.2: Package installed 92105e8fc16d129eb40755d7d2152c4855f68c46
zeromq/4.3.2: Downloaded package revision 0

Not sure how to properly search for libraries in Conan, though; the one I found was the first Google result, which keeps bringing me to that JFrog Bintray place.

Also realized I needed Boost. Downloading was all fine and good until I realized the regular Boost targets weren't available... I'm using cmake_find_package. I guess Conan creates its own FindPackage, but CMake has some hardcoded stuff for certain libraries like Boost, which is taking a long time to switch over to CMake. Boost::Boost works for me, but for other people this probably isn't acceptable.

Here is what happens when I try to install while connected to my work's VPN. I tried catch2 to make sure I didn't already have it.


$ conan install catch2/2.12.3@
Configuration:
[settings]
arch=x86_64
arch_build=x86_64
build_type=Release
compiler=Visual Studio
compiler.runtime=MD
compiler.version=15
os=Windows
os_build=Windows
[options]
[build_requires]
[env]

catch2/2.12.3: Not found in local cache, looking in remotes...
catch2/2.12.3: Trying with 'conan-center'...
Downloading conanmanifest.txt
Please log in to "conan-center" to perform this action. Execute "conan user" command.
If you don't have an account sign up here: https://bintray.com/signup/oss
Remote 'conan-center' username: myuser
Please enter a password for "myuser" account: examplepass

memsharded commented 4 years ago

Ok, let me explain a couple of those issues:

So to search packages, it is recommended:

That said, the really surprising thing worth investigating is why you are asked for a password under your VPN, which should never be necessary. We would need to know more about your VPN configuration; can you ask your administrators for traces of those requests? They would contain "bintray" or "conan.bintray.com". My guess is that the VPN is somehow converting a response into a 401 or 403, but I cannot know the reason.

memsharded commented 4 years ago

Hi @Cazadorro

I have checked with the Bintray team; they are not aware of other similar cases, and they have checked the logs but couldn't find anything weird there. Did you get a bit more info from your administrators about your VPN?

Cazadorro commented 4 years ago

@memsharded Sorry, my IT department is swamped and likely won't be able to respond to this kind of request, no matter how simple, until corona is gone.

timblechmann commented 4 years ago

I've seen something similar on a Windows laptop, in a project where Conan is called as a subprocess from CMake.

If not the case, quit, and do 'conan remove --locks'

After killing Conan (and CMake) the error persisted, which makes me wonder whether the implementation uses a file lock that is automatically released when the process terminates. I've not seen this on macOS or Linux, so maybe it's something Win32-specific?

There's one other practical issue: when running Conan as a subprocess, I cannot easily quit it. From a high level, I wonder if we could customise the locking behaviour to either wait or fail (or maybe wait for a customisable time).

memsharded commented 4 years ago

Hi @timblechmann

We have tried hard to have this released automatically, but OS-portable locking is quite challenging. It cannot be done with the current tooling (fasteners), and implementing it at a low level carries a relatively high risk of multi-platform portability issues, instability, etc., so even though there have been some attempts, we haven't moved forward so far. Yes, Windows (as usual) is the "different" one and causes more issues here.

Regarding timing out, one of the issues we found is that many C++ libraries take a long time to build. You cannot lock for only 5 minutes, for example, because some builds take far longer than that.

So we have plans to investigate this more deeply for Conan 2.0, but honestly this is very challenging, and I don't know how far we will be able to get. At its core it means implementing and maintaining a multi-platform semaphore that works robustly and efficiently across systems including FreeBSD, Solaris, different Linuxes and Windows (some Conan users are still on very old Windows versions)... and that is not a small task.

timblechmann commented 4 years ago

Hmm, I had a quick look:

fasteners uses msvcrt.locking, which seems to wrap _locking, as in: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/locking?view=vs-2019

It is documented as:

Regions should be locked only briefly and should be unlocked before closing a file or exiting the program.

however there's LockFile: https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-lockfile

If a process terminates with a portion of a file locked or closes a file that has outstanding locks, the locks are unlocked by the operating system. However, the time it takes for the operating system to unlock these locks depends upon available system resources. Therefore, it is recommended that your process explicitly unlock all files it has locked when it terminates. If this is not done, access to these files may be denied if the operating system has not yet unlocked them.

🤔
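
For illustration only, here is a minimal ctypes sketch of taking a byte-range lock through the Win32 LockFile API rather than the CRT _locking wrapper. The helper names are hypothetical and this is not what fasteners or Conan actually do:

import ctypes
import msvcrt  # Windows-only: maps a Python file descriptor to an OS handle

kernel32 = ctypes.windll.kernel32

def try_lock_first_byte(open_file):
    """Take an exclusive Win32 lock on the first byte of an open file.

    Per the LockFile docs quoted above, these locks are (eventually)
    released by the OS if the process dies without unlocking, unlike
    CRT _locking regions, which should be unlocked before exit.
    """
    handle = msvcrt.get_osfhandle(open_file.fileno())
    # BOOL LockFile(hFile, offsetLow, offsetHigh, nBytesToLockLow, nBytesToLockHigh)
    if not kernel32.LockFile(handle, 0, 0, 1, 0):
        raise OSError("byte range is already locked by another process")

def unlock_first_byte(open_file):
    handle = msvcrt.get_osfhandle(open_file.fileno())
    kernel32.UnlockFile(handle, 0, 0, 1, 0)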


Regarding timeouts: they are only required for parallel Conan invocations, right? I totally see your point, but it would be perfectly fine for my use case to time out, so that I can inform the user to try again later.

timblechmann commented 4 years ago

Hmm, I've just heard from a coworker that conan remove --locks did not work for him across reboots.

blackliner commented 3 years ago

OK, I'm getting the infamous message (of course there is no other process accessing the package):

ceres-solver/1.13.0 is locked by another concurrent conan process, wait...
If not the case, quit, and do 'conan remove --locks'

Now, running the command conan remove --locks unfortunately does not help; the two files in the ceres-solver dir just stay there:

$ tree .conan/data/ceres-solver/
.conan/data/ceres-solver/
└── 1.13.0
    └── _
        ├── _.count << content is: -1
        └── _.count.lock << no content
2 directories, 2 files

Seems like this scenario could be formulated as a unit test?

ttencate commented 3 years ago

I had a similar problem and looked into this a bit more. It seems locks are only cleaned if there is a sibling directory with the same name prefix. In @blackliner's case, there was no directory .conan/data/ceres-solver/1.13.0/_/_, so list_folder_subdirs doesn't return it, and its sibling lock files aren't cleaned up:

https://github.com/conan-io/conan/blob/687eb9fd97aace7e3159de46071401d8209c68e2/conans/client/cache/cache.py#L265-L269

There is a way in which this situation can arise: issuing conan install some/version@my/channel and being asked for a password. If you abort with Ctrl+C at that point, some/version/my/channel.count{,.lock} are created, but some/version/my/channel/ is not.

@memsharded Is it safe to change the lock cleanup code to go only 3 levels deep, and indiscriminately remove *.count and *.count.lock in all of those directories? If so, I can send a pull request.
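
For what it's worth, a minimal sketch of that proposal (not the actual Conan code; remove_stale_locks is a hypothetical helper, and the three-level layout is taken from the cache paths shown above):

import glob
import os

def remove_stale_locks(storage_path):
    # Go exactly three directory levels below the storage root
    # (<name>/<version>/<user>) and remove every *.count and *.count.lock
    # file found there, whether or not a matching package folder exists.
    for pattern in ("*.count", "*.count.lock"):
        for path in glob.glob(os.path.join(storage_path, "*", "*", "*", pattern)):
            try:
                os.remove(path)
            except OSError:
                pass  # e.g. still held open by a live process; leave it alone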

xahon commented 3 years ago

I have this problem too

ilyamt-tandemg commented 3 years ago

I discovered that if a process holds a lock on a file or directory belonging to a package, Conan does not check that the deletion failed and tells you that everything is fine. This can make it impossible to use remove or remove --locks to fix problems. To reproduce, lock a package directory in the Conan cache and try to remove it using the Conan CLI. I reproduced the problem on both Windows and Linux.

edit: new information
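
A minimal sketch of the kind of check being suggested here (a hypothetical helper, not Conan's actual removal code):

import os
import shutil

def remove_package_dir(path):
    # Delete the directory and then verify it is actually gone,
    # instead of silently reporting success.
    shutil.rmtree(path, ignore_errors=True)
    if os.path.exists(path):
        raise RuntimeError(f"Could not remove '{path}': "
                           f"it is probably locked by another process")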

Vhab commented 2 years ago

@memsharded This issue still affects us regularly.

We have regular reports across our organization of this exact bug triggering, to the point where we have a standard copy-paste response with instructions on how to manually remove locks.

Would it be possible to raise the priority on investigating and fixing this issue?

earonesty commented 2 years ago

PIDs should be written into lock files and consulted to see whether the process is still running. If a process is killed (e.g. the user clicks the stop button in CI/CD), lock files might be left around. Since PIDs are not guaranteed to be unique, this is only a mitigation, but in practice it usually works very well.
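
A minimal sketch of that mitigation (not Conan's implementation; psutil is assumed for the liveness check, and the helper names are hypothetical):

import os
import psutil  # third-party, used only to check whether a pid is still alive

def write_pid_lock(lock_path):
    # Record our own pid when taking the lock.
    with open(lock_path, "w") as f:
        f.write(str(os.getpid()))

def lock_is_stale(lock_path):
    """Return True if the lock file points at a process that no longer exists."""
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip() or "0")
    except (OSError, ValueError):
        return True  # unreadable or empty lock file: treat as stale
    # PIDs can be reused, so as noted above this is only a mitigation.
    return pid <= 0 or not psutil.pid_exists(pid)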

memsharded commented 2 years ago

PIDs should be written into lock files and consulted to see whether the process is still running. If a process is killed (e.g. the user clicks the stop button in CI/CD), lock files might be left around. Since PIDs are not guaranteed to be unique, this is only a mitigation, but in practice it usually works very well.

@earonesty Thanks for the suggestion. We already tried that some time ago; it turns out that Windows reuses PIDs, and even though it is unusual, it already bit us a few times, so it was dropped. In the Conan 2.0 cache we have removed the locks completely for the moment, and we need to work on concurrency from scratch; we will reconsider the alternatives and see what can be done. It seems fasteners has also improved its support for the readers-writers problem, which could avoid some of the 1.X limitations, but this needs to be investigated.
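
For reference, a minimal sketch of what the readers-writer support in a recent fasteners release looks like in use (an illustration only, not Conan code; the lock file path and function names are made up):

import fasteners

# One lock file per package folder in the cache (path is illustrative).
pkg_lock = fasteners.InterProcessReaderWriterLock("/tmp/conan-locks/zeromq-4.3.2.lock")

def use_package():
    # Many concurrent readers: consumers building against this package.
    with pkg_lock.read_lock():
        pass  # read the package folder

def rebuild_package_from_source():
    # Single writer: blocks readers while the binary is being (re)built.
    with pkg_lock.write_lock():
        pass  # modify the package folder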

earonesty commented 2 years ago

sqlite3 has great low-level concurrency support and comes with Python; I often use it for that. You can do cross-process queues, mutexes, etc. once you have a nice db. We wound up writing our own lock class anyway because it's more correct on Windows. I can post it if you want.

earonesty commented 2 years ago

This is the easy sqlite one; fasteners looks like a good one too, I agree.

Does Conan really need more than an "exclusive lock"? (sqlite supports RW locks in WAL mode.) Reads are so fast (seconds) compared to writes, which can take minutes, that the complexity of RW locks doesn't seem worth it.

import sqlite3
from threading import RLock
from contextlib import contextmanager

lock_timeout = 999999  # PRAGMA busy_timeout is in milliseconds
g_lock = RLock()  # serializes threads within this process

@contextmanager
def lock(path):
    with g_lock:
        # The different processes must point to the same database file.
        db = sqlite3.connect(path, isolation_level="IMMEDIATE")
        try:
            # Keep waiting if blocked.
            # See: https://sqlite.org/c3ref/busy_timeout.html
            db.execute(f"PRAGMA busy_timeout = {lock_timeout}")
            with db:
                db.execute("CREATE TABLE IF NOT EXISTS lock(a INT PRIMARY KEY)")
                db.execute("DELETE FROM lock")
                db.execute("INSERT INTO lock VALUES (1)")
                # Yield from inside the transaction to hold a lock on the table.
                yield
        finally:
            db.close()

memsharded commented 2 years ago

This is good feedback. We started to try to use sqlite for multi-process sync, but didn't go deeper. We might want to investigate and try this idea further.

Does Conan really need more than an "exclusive lock"? (sqlite supports RW locks in WAL mode.) Reads are so fast (seconds) compared to writes, which can take minutes, that the complexity of RW locks doesn't seem worth it.

Yes, reads can take as long as the consumer of a package takes to build. When you are building one package from source, you must lock, as a reader, all of that package's transitive dependencies. This should allow other packages building in parallel to read the same transitive dependencies (multiple readers). And when one writer takes control of a package (while building it from source), it should completely block all its consumers from reading it. The problem is that a build-from-source operation can take many minutes; a build of 10-20 minutes is not unusual. A simple mutex would easily reduce cache concurrency to purely sequential access in practice. So we believe a good readers-writer implementation is necessary for reasonable concurrency. If it can be implemented robustly with the Python sqlite module, that would be fantastic.
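
A minimal sketch of readers-writer semantics built on the sqlite3 module alone, assuming the lock database stays in the default rollback-journal mode (not WAL); read_lock and write_lock are hypothetical helpers, not Conan code:

import sqlite3
from contextlib import contextmanager

BUSY_TIMEOUT_MS = 30 * 60 * 1000  # builds can take many minutes; value is arbitrary

def _connect(path):
    # isolation_level=None keeps the connection in autocommit mode so that
    # transactions are controlled by the explicit BEGIN statements below.
    db = sqlite3.connect(path, isolation_level=None, timeout=BUSY_TIMEOUT_MS / 1000)
    db.execute(f"PRAGMA busy_timeout = {BUSY_TIMEOUT_MS}")
    return db

@contextmanager
def read_lock(path):
    # Shared lock: any number of readers at once; blocks, and is blocked by,
    # the exclusive writer below.
    db = _connect(path)
    try:
        db.execute("BEGIN")  # deferred transaction
        db.execute("SELECT count(*) FROM sqlite_master").fetchone()  # takes the SHARED lock
        yield
    finally:
        db.rollback()
        db.close()

@contextmanager
def write_lock(path):
    # Exclusive lock: BEGIN EXCLUSIVE waits (up to busy_timeout) until no
    # other connection holds a lock, then keeps everyone else out.
    db = _connect(path)
    try:
        db.execute("BEGIN EXCLUSIVE")
        yield
    finally:
        db.rollback()
        db.close()

Under these assumptions a reader holds its SHARED lock for the whole build of the consumer, other readers can join freely, and a writer's BEGIN EXCLUSIVE waits until every reader has finished before taking over.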

earonesty commented 2 years ago

Hey, I think I found a bug too. If the lock count ever reaches -1 (which can happen due to race conditions while incorrectly calling conan remove --locks), then remove --locks ceases to work and every Conan invocation stays locked forever. Manually removing the .count file seems to fix it.

xahon commented 2 years ago

@earonesty What is the .count file? Where is it located?

UPD: found it in <storage>/<package>/<version>/_/_/_.count; it contains -1, and removing it doesn't help. I also tried removing all package caches and updating Conan to 1.50.0.

pBogey commented 1 year ago

So... any chance of this issue being fixed for Conan 1.x?

memsharded commented 1 year ago

So... any chance of this issue being fixed for Conan 1.x?

I am afraid this is unlikely. This is very challenging, and we didn't manage to address it, even before Conan 2.0 was out.

At the moment the Conan 2.0 cache has no concurrency at all; it has to be strictly sequential, or you have to use different caches for parallel commands. Introducing locks for concurrency is planned and would be the priority there, but I am afraid that trying to fix it for Conan 1.X seems quite impossible.

memsharded commented 7 months ago

Closing in favor of https://github.com/conan-io/conan/issues/15840 where the Conan 2 cache concurrency future progress will be reported.

smaudet commented 7 months ago

Still an issue for Conan 1.x... Maybe it's time for a helper script to be written.

For me the (non-scripted) workaround was to delete the folders from .conan (I have the password bug).

smaudet commented 7 months ago

@memsharded @ttencate

I think the fix would be (or should have been?) to prompt for edge cases during lock removal, possibly behind another option. I'm manually running this while trying to build something (issues with a custom server instance I have zero control over).

Abandoning this for 2.x doesn't seem like a wise idea while 1.x instances are still in active use out in the wild...

memsharded commented 7 months ago

Hi @smaudet

Thanks for your feedback.

Conan 2 was released more than a year ago now, and it is the mainstream version nowadays. While we acknowledge this can sometimes be an issue, it is not that frequent and it has workarounds, like manually removing the folders, so it is just a matter of priorities: it is impossible to do everything, and there are many other, much higher priorities. That is why this is closed and not planned to be fixed. Sorry for the inconvenience!