apache / buildstream

BuildStream, the software integration tool
https://buildstream.build/
Apache License 2.0

Artifact related bugs reported as user facing errors #1016

Open BuildStream-Migration-Bot opened 3 years ago

BuildStream-Migration-Bot commented 3 years ago

See original issue on GitLab

In GitLab by [Gitlab user @tristanvb] on May 6, 2019, 06:03

While building, I am getting errors reported like this:

[00:00:00][????????][build:sdk/glib.bst                  ] FAILURE Attempt to access unavailable artifact: [Errno 2] No such file or directory: '/home/tristan/.cache/buildstream/artifacts/cas/refs/heads/gnome/sdk-glib/d371839c675f4be8c70f9f212093531699e77baf6500e8e56d8dbd7270607068'

A missing artifact is actually a bug, not an error: the user cannot fix it, and it is not a system error. If this error is ever seen, it is clearly our fault.

Currently we have a contorted story around these errors, and we mitigate this by calling Element.__assert_cached() in some places, but not all places.

I think this is backwards and unsafe. We should reverse this, such that the underlying CAS errors do not inherit from BstError, and have ArtifactCache handle only the recoverable errors, turning them into an ArtifactError (which derives from BstError).
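To make the proposal concrete, here is a minimal, self-contained sketch of the exception layering. It is not BuildStream's actual code: BstError and ArtifactError are stand-ins for the real classes, while CASError, CASRefMissing, CASNoSpace, FakeCAS and this toy ArtifactCache are hypothetical names made up for illustration. Treating "cache full" as the recoverable case is also only an assumption.

```python
class BstError(Exception):
    """Stand-in for BuildStream's base class for user-facing errors."""


class ArtifactError(BstError):
    """Stand-in for the user-facing, recoverable artifact cache error."""


class CASError(Exception):
    """Hypothetical low-level CAS error; deliberately NOT a BstError."""


class CASRefMissing(CASError):
    """Hypothetical: a ref we expected to be cached is missing (a bug)."""


class CASNoSpace(CASError):
    """Hypothetical: an example of a condition assumed to be recoverable."""


class FakeCAS:
    """Toy in-memory CAS, only here so the sketch runs on its own."""

    def __init__(self, refs, full=False):
        self._refs = dict(refs)
        self._full = full

    def resolve_ref(self, ref):
        try:
            return self._refs[ref]
        except KeyError as e:
            raise CASRefMissing("No such ref: {}".format(ref)) from e

    def commit(self, ref, digest):
        if self._full:
            raise CASNoSpace("Cache is full")
        self._refs[ref] = digest


class ArtifactCache:
    """Toy cache layer: converts only recoverable CAS errors."""

    def __init__(self, cas):
        self._cas = cas

    def extract(self, ref):
        # No except clause here: a missing ref propagates as a plain
        # exception, so it can never be mistaken for a user-facing error.
        return self._cas.resolve_ref(ref)

    def commit(self, ref, digest):
        try:
            self._cas.commit(ref, digest)
        except CASNoSpace as e:
            # Recoverable: report it to the user as an ArtifactError.
            raise ArtifactError("Failed to commit artifact: {}".format(e)) from e
```

With this layering, callers would not need to remember to assert that an artifact is cached before accessing it; forgetting to do so still surfaces as a non-BstError exception.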

In bst-1.2

When staging a built workspace with a missing artifact (after replacing a raise ArtifactError with a raise AssertionError), we get the following codepath which leads to a bug misreported as an error:

      An unhandled exception occured:

        Traceback (most recent call last):
          File "/codethink/GNOME/buildstream/buildstream/_artifactcache/cascache.py", line 462, in resolve_ref
            with open(refpath, 'rb') as f:
        FileNotFoundError: [Errno 2] No such file or directory: '/home/tristan/.cache/buildstream/artifacts/cas/refs/heads/gnome/core-gdm/72d9b4e1bcfe9742ea52c7d5f76b62886627a90534a6ffc908b609badc650f3c'

        The above exception was the direct cause of the following exception:

        Traceback (most recent call last):
          File "/codethink/GNOME/buildstream/buildstream/_artifactcache/cascache.py", line 106, in extract
            tree = self.resolve_ref(ref, update_mtime=True)
          File "/codethink/GNOME/buildstream/buildstream/_artifactcache/cascache.py", line 471, in resolve_ref
            raise ArtifactError("Attempt to access unavailable artifact: {}".format(e)) from e
        buildstream._exceptions.ArtifactError: Attempt to access unavailable artifact: [Errno 2] No such file or directory: '/home/tristan/.cache/buildstream/artifacts/cas/refs/heads/gnome/core-gdm/72d9b4e1bcfe9742ea52c7d5f76b62886627a90534a6ffc908b609badc650f3c'

        The above exception was the direct cause of the following exception:

        Traceback (most recent call last):
          File "/codethink/GNOME/buildstream/buildstream/_scheduler/jobs/job.py", line 413, in _child_action
            result = self.child_process()
          File "/codethink/GNOME/buildstream/buildstream/_scheduler/jobs/elementjob.py", line 94, in child_process
            return self._action_cb(self._element)
          File "/codethink/GNOME/buildstream/buildstream/_scheduler/queues/buildqueue.py", line 35, in process
            return element._assemble()
          File "/codethink/GNOME/buildstream/buildstream/element.py", line 1523, in _assemble
            self.stage(sandbox)
          File "/codethink/GNOME/buildstream/buildstream/buildelement.py", line 165, in stage
            self.stage_dependency_artifacts(sandbox, Scope.BUILD)
          File "/codethink/GNOME/buildstream/buildstream/element.py", line 704, in stage_dependency_artifacts
            old_dep_keys = self.__get_artifact_metadata_dependencies(workspace.last_successful)
          File "/codethink/GNOME/buildstream/buildstream/element.py", line 2442, in __get_artifact_metadata_dependencies
            artifact_base, key = self.__extract(key)
          File "/codethink/GNOME/buildstream/buildstream/element.py", line 2394, in __extract
            return (self.__artifacts.extract(self, key), key)
          File "/codethink/GNOME/buildstream/buildstream/_artifactcache/cascache.py", line 108, in extract
            raise AssertionError(str(e)) from e
        AssertionError: Attempt to access unavailable artifact: [Errno 2] No such file or directory: '/home/tristan/.cache/buildstream/artifacts/cas/refs/heads/gnome/core-gdm/72d9b4e1bcfe9742ea52c7d5f76b62886627a90534a6ffc908b609badc650f3c'

In master

Here I have not checked the stack trace yet, but I have verified that the code still runs in the same fashion: we raise a CASCacheError which derives from BstError, and we sprinkle self.__assert_cached() statements around element.py in the hope of covering any case where we're about to access an artifact which doesn't exist. Instead, the underlying CASCacheError should be treated as a plain exception, and only the recoverable errors should be reported as ArtifactErrors.

In this codepath, we would hit the following:

Instead, we should have a CASCacheError which is not a BstError, and we should simply not catch CASCacheError in that case, ensuring that the missing artifact is reported as a BUG.
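A rough sketch of why leaving CASCacheError uncaught gives the desired result, assuming the job runner reports any BstError as a user-facing FAILURE and anything else as a BUG. The run_job function and the status strings are hypothetical and only illustrate the idea; the two exception classes are stand-ins, with CASCacheError defined the way this issue proposes (not deriving from BstError).

```python
import traceback


class BstError(Exception):
    """Stand-in for BuildStream's base class for user-facing errors."""


class CASCacheError(Exception):
    """Stand-in, defined as proposed here: NOT a BstError."""


def run_job(action):
    """Hypothetical job runner: classify the outcome of a callable."""
    try:
        return ("SUCCESS", action())
    except BstError as e:
        # Something the user can act on: report it as a normal failure.
        return ("FAILURE", str(e))
    except Exception:
        # Anything else is our fault: report it as a bug, with traceback.
        return ("BUG", traceback.format_exc())


def stage_missing_artifact():
    # Simulates accessing an artifact which is not in the cache.
    raise CASCacheError("Attempt to access unavailable artifact")


def user_mistake():
    raise BstError("Invalid configuration")


status, detail = run_job(stage_missing_artifact)
print(status)
```

Because nothing between the CAS and the job runner catches CASCacheError, the missing artifact is classified by the last except clause rather than being reported as something the user could fix.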

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Nov 5, 2020, 07:35

This is worth a quick review of the artifact cache code. It should be fairly easy either to check that none of this is happening anymore or to simply fix these cases; I suspect they are mostly fixed by now.