conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

[bug] Conan create fails in test package step when run against a lockfile #16534

Open jasal82 opened 6 days ago

jasal82 commented 6 days ago

Describe the bug

When running conan create against a previously created lockfile, the test package step sometimes fails with the following error message:

ERROR: Something failed while testing 'X' test_package after it was built using the lockfile. Please report this error: 'Y' package-id '...' doesn't match the locked one '...'

Environment is Conan 1.64.1.

How to reproduce it

Context

So far, we have only observed this problem in consumer projects with one specific transitive dependency, and the error always identifies a change in the package ID of that dependency as the cause. It is therefore reasonable to assume that some peculiarity in the dependency subgraph of that component triggers the problem.

Analysis

The lockfile itself looks unremarkable. The component is listed correctly and has the correct package ID, which is stable across the usual build paths (conan info, conan install, conan create consumer build). Only in the test package build path does a different ID appear to be calculated (and only for this specific component).

Disabling the test step in conan create by adding the parameter -tf None, and then running the test manually via conan test with the lockfile, usually fixes the problem. We have also seen some builds where this fails with another error that might indicate an inconsistency in the lockfile, but that looks like an unrelated and thus separate issue.

Since the Conan output, the log files and the metadata did not provide any more information, the next step was to debug the package ID calculation in Conan itself. The first stop was the method _compute_package_id() in conans/client/graph/graph_binaries.py, which assigns the ID to each node in the graph. However, the calculation itself is done in conans/model/info.py, specifically in the method ConanInfo.package_id(). There we can see that the final ID is created by combining several SHA hashes for things like settings, options, python_requires, etc. Some prints in this method showed that only the options SHA differs between the normal build path and the test package build path.
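The kind of comparison described above can be sketched roughly as follows. This is a simplified illustration, not Conan's actual code: the helpers component_sha, package_id and differing_components, as well as all example values, are hypothetical. Only the idea of a final ID built from several per-component hashes is taken from the description.

```python
import hashlib


def component_sha(text):
    """Hash one component (settings, options, ...) of the package ID."""
    return hashlib.sha1(text.encode()).hexdigest()


def package_id(components):
    """Combine the per-component hashes into one final ID (simplified)."""
    joined = "\n".join(components[name] for name in sorted(components))
    return hashlib.sha1(joined.encode()).hexdigest()


def differing_components(a, b):
    """Return the names of the components whose hashes differ."""
    names = a.keys() | b.keys()
    return sorted(name for name in names if a.get(name) != b.get(name))


# Hypothetical component hashes as seen in the two build paths.
normal_build = {
    "settings": component_sha("os=Linux\narch=x86_64"),
    "options": component_sha("shared=False"),
    "python_requires": component_sha(""),
}
# Assumption for illustration: only the options input differs.
test_package_build = dict(normal_build, options=component_sha("shared=False\n"))

print(differing_components(normal_build, test_package_build))
print(package_id(normal_build) == package_id(test_package_build))
```

With these made-up inputs, a single differing component hash is enough to produce a different final ID, which is why narrowing the mismatch down to one component is a useful first step.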

Digging deeper into the options SHA generation in conans/model/options.py tells us that the SHA is again created by combining the SHA of the node's own options and the SHAs of the options of the requirements. These SHAs never change between the build paths (which is good). However, it seems that in the test package build the component adds the options hash of a transitive component to its own options hash, although that component does not define any options and is not a direct dependency. Instead, the package is indirectly required via an intermediate direct dependency.
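To illustrate why an additional entry matters even when it carries no options, here is a minimal sketch under stated assumptions. The functions values_sha and options_sha and the package names direct_dep and transitive_dep are hypothetical, and the real implementation in conans/model/options.py may combine the hashes differently.

```python
import hashlib


def sha1(text):
    return hashlib.sha1(text.encode()).hexdigest()


def values_sha(values):
    """Hash a flat {option: value} mapping in a stable order."""
    return sha1("\n".join(f"{k}={v}" for k, v in sorted(values.items())))


def options_sha(own, reqs):
    """Combine the node's own options hash with one hash per requirement entry."""
    parts = [values_sha(own)]
    for name in sorted(reqs):
        parts.append(values_sha(reqs[name]))
    return sha1("\n".join(parts))


own = {"shared": "False"}
reqs = {"direct_dep": {"fPIC": "True"}}

# Same data plus an empty entry for a transitive package without options.
reqs_with_extra = dict(reqs, transitive_dep={})

print(options_sha(own, reqs) == options_sha(own, reqs_with_extra))
```

In this sketch the empty entry still contributes a hash of its own to the combined input, so the resulting options hash changes although no option value did.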

The question is, why is this component visible as a transitive requirement in the test package build, but neither in the lockfile generation nor the actual consumer build path?

Debugging

After some more debugging in the graph manager I finally found the real root cause, which is quite intricate. The problem starts in https://github.com/conan-io/conan/blob/release/1.64/conans/client/graph/graph_binaries.py#L441 where the locked options (from the lockfile) are compared to the computed options (from the test package build). This call to the != operator invokes the __ne__ implementation in https://github.com/conan-io/conan/blob/53923eff46906664efb0347098ecabf33bba0a90/conans/model/options.py#L256, which uses the __eq__ implementation in https://github.com/conan-io/conan/blob/53923eff46906664efb0347098ecabf33bba0a90/conans/model/options.py#L258. There it iterates over the computed options and fetches the matching entry from the locked options. However, because of the implementation of __getitem__ in https://github.com/conan-io/conan/blob/53923eff46906664efb0347098ecabf33bba0a90/conans/model/options.py#L240, this lookup actually inserts a new default item with that key into the locked options object if no such entry exists. The seemingly harmless comparison between the two option objects thus has a side effect and modifies one of them. Because the calculated options are passed downstream through the graph, the error propagates.
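The mechanism can be reproduced in isolation with a small sketch. The classes PackageValues and Options below are hypothetical stand-ins written for this illustration, not the actual classes from conans/model/options.py; they only mimic the pattern described above, namely an equality check that reads through a __getitem__ which inserts defaults.

```python
class PackageValues:
    """Hypothetical stand-in for the options of a single package."""

    def __init__(self, values=None):
        self.values = dict(values or {})

    def __eq__(self, other):
        return self.values == other.values


class Options:
    """Hypothetical stand-in for an options object with per-requirement entries."""

    def __init__(self, reqs=None):
        self.reqs = dict(reqs or {})

    def __getitem__(self, name):
        # Inserts a default entry when the key is missing (the side effect).
        return self.reqs.setdefault(name, PackageValues())

    def __eq__(self, other):
        for name, values in self.reqs.items():
            if values != other[name]:  # this lookup may mutate `other`
                return False
        return True

    def __ne__(self, other):
        return not self.__eq__(other)


locked = Options({"direct_dep": PackageValues({"fPIC": "True"})})
computed = Options({
    "direct_dep": PackageValues({"fPIC": "True"}),
    "transitive_dep": PackageValues(),  # package without any options
})

before = sorted(locked.reqs)
changed = computed != locked
after = sorted(locked.reqs)
print(before, changed, after)
```

In this sketch the comparison reports no difference, yet the locked object has gained a transitive_dep entry afterwards. A side-effect-free variant could read the other side with a plain dictionary lookup such as other.reqs.get(name) instead of going through __getitem__; whether that is the right fix for Conan itself is left open here.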