NVIDIA / yum-packaging-precompiled-kmod

NVIDIA precompiled kernel module packaging for RHEL
Apache License 2.0

Unable to sync repository with pulp (RH satellite) #19

Closed: kmittman closed this issue 3 years ago

kmittman commented 3 years ago

Source: https://forums.developer.nvidia.com/t/error-syncing-rhel8-cuda-repo/176276

Multiple users report being unable to mirror/sync the https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/ repository with pulp, starting about a month ago.

"traceback"=>
     "Traceback (most recent call last):\n" +
     "  File \"/usr/lib/python2.7/site-packages/celery/app/trace.py\", line 367, in trace_task\n" +
     "    R = retval = fun(*args, **kwargs)\n" +
     "  File \"/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py\", line 688, in __call__\n" +
     "    return super(Task, self).__call__(*args, **kwargs)\n" +
     "  File \"/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py\", line 110, in __call__\n" +
     "    return super(PulpTask, self).__call__(*args, **kwargs)\n" +
     "  File \"/usr/lib/python2.7/site-packages/celery/app/trace.py\", line 622, in __protected_call__\n" +
     "    return self.run(*args, **kwargs)\n" +
     "  File \"/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py\", line 860, in sync\n" +
     "    raise pulp_exceptions.PulpExecutionException(_('Importer indicated a failed response'))\n" +
     "PulpExecutionException: Importer indicated a failed response\n",

The modules.yaml document is valid UTF-8, but the error suggests that parsing fails at the stream key-value pairs.

        "distribution"=>
         {"items_total"=>0,
          "state"=>"FINISHED",
          "error_details"=>[],
          "items_left"=>0},
        "modules"=>
         {"state"=>"FAILED",
          "error"=>
           "strings in documents must be valid UTF-8: '\\x8c\\x01\\x00\\x00\\x04465-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04418-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04450\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04440\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04460\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04455\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04465\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04440-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04455-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04460-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04latest-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04450-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04418\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04latest\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x00'"},
        "errata"=>{"state"=>"NOT_STARTED"},
        "metadata"=>{"state"=>"FINISHED"}}},

Looking at another modularity-enabled repository, strings are enclosed in double-quotes ...
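For anyone trying to read the escaped bytes in the "modules" error above: they look like a BSON-style document that maps each stream name to an array holding the single string "default", and not like YAML text. The sketch below is only one reading of those bytes, not a confirmed diagnosis from the pulp developers. It rebuilds the same layout by hand (the helper names `encode_profiles`, `bson_string_array`, and `is_valid_utf8` are made up for illustration) and shows that with these 14 stream names the 4-byte length prefix starts with `\x8c`, a byte that cannot begin a UTF-8 sequence, while a shorter list of streams happens to produce bytes that pass a UTF-8 check.

```python
import struct

# Stream names copied from the error message above, in the order shown.
STREAMS = ["465-dkms", "418-dkms", "450", "440", "460", "455", "465",
           "440-dkms", "455-dkms", "460-dkms", "latest-dkms", "450-dkms",
           "418", "latest"]


def bson_cstring(text):
    """Encode text as a NUL-terminated byte string."""
    return text.encode("utf-8") + b"\x00"


def bson_string_array(values):
    """Encode a list of strings as a BSON array (keys are "0", "1", ...)."""
    body = b""
    for index, value in enumerate(values):
        data = bson_cstring(value)
        # 0x02 = string element: key, int32 byte length, then the data.
        body += b"\x02" + bson_cstring(str(index)) + struct.pack("<i", len(data)) + data
    # int32 total length (prefix + body + terminator), body, terminating NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def encode_profiles(streams, profile="default"):
    """Encode {stream: [profile]} in the layout seen in the error message."""
    body = b""
    for name in streams:
        # 0x04 = array element: key, then the embedded array document.
        body += b"\x04" + bson_cstring(name) + bson_string_array([profile])
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def is_valid_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


full = encode_profiles(STREAMS)
few = encode_profiles(STREAMS[:3])
print(len(full), full[:4], is_valid_utf8(full))
print(len(few), few[:4], is_valid_utf8(few))
```

If this reading is right, the failure would depend on how many streams the repository publishes (and how long their names are), not on the contents of modules.yaml itself, which would match the repository only breaking at a particular point in time.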

dralley commented 3 years ago

edit: disregard. I know what the problem is: it's not a problem with the metadata per se, it's an edge case triggered by a specific arrangement of it.

dralley commented 3 years ago

I can say confidently that this isn't an Nvidia repo issue.

zvonkok commented 3 years ago

I think this can be closed.

kmittman commented 3 years ago

Pinging @dralley to comment.

dralley commented 3 years ago

I agree - there is a fix deployed upstream that I believe is still trickling down into an actual release.

kmittman commented 3 years ago

Closing based on recommendation from Red Hat engineers.

stuartcampbell commented 3 years ago

I know this is closed, but just to comment that this is still an issue for the RHEL8/EL8 repos (the RHEL7/EL7 repos work fine). All other yum repos we sync also work fine; it's only this NVIDIA one that broke sometime in May.

I will open a support request with Red Hat to see what they say, and will comment here on any ETA for the fix.

Thank you for looking into it anyway.

mulderij commented 3 years ago

I'll have to agree with @stuartcampbell. We experience the same issue on the RHEL8 repo. No issues with the RHEL7 repo or any other repo in our Satellite. BTW: we tried with Satellite 6.9.2 and 6.9.4.

stdweird commented 3 years ago

@stuartcampbell @mulderij any luck so far getting a fix for this? @dralley what fix are you talking about? Is it some Satellite hotfix that customers can get their hands on?

stdweird commented 3 years ago

OK, found the patch in https://forums.developer.nvidia.com/t/error-syncing-rhel8-cuda-repo/176276/16 and I can confirm that it works.

mulderij commented 3 years ago

@stdweird

> @stuartcampbell @mulderij any luck so far getting a fix for this?

We did apply the workaround and that worked. Today we tried again on Satellite 6.9.5 without the workaround and that also appears to work.

stdweird commented 3 years ago

@mulderij thanks. I read the release notes yesterday; nothing jumped out as relevant, so I tried the patch instead. Good to see it confirmed!

kmittman commented 3 years ago

Thank you for confirming that RH Satellite now has the fix.

dralley commented 3 years ago

Looks like the patch is targeted for Satellite 6.9.7.

Apologies that it took so long. Apparently, because there was a needinfo flag, it didn't get included in the triage discussions for the previous release.