Closed asnaylor closed 1 year ago
Could this be caused by the ongoing Lustre issues?
Hi, I don't think it's caused by a specific login node. Today I pulled hrzhao076/custom_backend:2.1 after pushing it on login09, and it failed again.
Switching to another login node and pulling again works. Maybe it's simply that one cannot push and pull on the same node?
Copying blob d1c5a39be588 done
Copying blob b2eb8f42dffa done
Copying blob 1a2a288b4b59 skipped: already exists
Copying config bddaf9cb14 done
Writing manifest to image destination
Storing signatures
bddaf9cb1464e14dc3b35f22e6e2e75ad4eec59ed98f16aef599ef7d4c0f41e6
INFO: Migrating image to /pscratch/sd/h/hrzhao/storage
Traceback (most recent call last):
File "/usr/bin/podman-hpc", line 11, in <module>
load_entry_point('podman-hpc==1.0.2', 'console_scripts', 'podman-hpc')()
File "/usr/lib/python3.6/site-packages/podman_hpc/podman_hpc.py", line 388, in main
podhpc(prog_name="podman-hpc")
File "/usr/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/decorators.py", line 64, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/lib/python3.6/site-packages/podman_hpc/podman_hpc.py", line 177, in pull
mu.migrate_image(image)
File "/usr/lib/python3.6/site-packages/podman_hpc/migrate2scratch.py", line 435, in migrate_image
rld = self._get_img_layers(self.src, img_id)
File "/usr/lib/python3.6/site-packages/podman_hpc/migrate2scratch.py", line 321, in _get_img_layers
ld = by_digest[layer["digest"]]
KeyError: 'sha256:1a2a288b4b593904fe90ec4335d78ae7b1026a979ff43a02aa88374a63dae5dc'
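For reference, the digest in the KeyError matches the blob that the pull log reports as "skipped: already exists" (1a2a288b4b59). The sketch below is not the podman-hpc code; it only reproduces the shape of the failing lookup, `by_digest[layer["digest"]]`, taken from the traceback. The sample data, the shortened digests, and the `find_missing_layers` helper are hypothetical, and whether the skipped blob is actually the cause is an assumption.

```python
def find_missing_layers(manifest_layers, by_digest):
    """Return digests the manifest references but the index does not contain."""
    return [
        layer["digest"]
        for layer in manifest_layers
        if layer["digest"] not in by_digest
    ]


# Hypothetical data: the index lacks one layer that the manifest references.
by_digest = {"sha256:aaa": {"id": "layer-a"}}
manifest_layers = [{"digest": "sha256:aaa"}, {"digest": "sha256:bbb"}]

try:
    for layer in manifest_layers:
        by_digest[layer["digest"]]  # same lookup shape as in the traceback
except KeyError as err:
    print(f"missing layer: {err}")

# Checking up front reports every missing digest instead of failing on the first.
missing = find_missing_layers(manifest_layers, by_digest)
```

A check like this would turn the bare KeyError into a message that names the missing layers, which might make it easier to tell whether the source store is incomplete.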
Is this still happening @asnaylor and @hrzhao76?
Hmm I have a different error now:
asnaylor@perlmutter:login02 | ~ $ podman-hpc pull hrzhao076/custom_backend:2.1
....
Copying blob 4713e6baa1cc done
Copying blob ecac3aeb0b12 done
Copying blob d1c5a39be588 done
Copying blob b2eb8f42dffa done
Copying blob 1a2a288b4b59 [==================================>---] 8.7GiB / 9.4GiB
Error: writing blob: storing blob to file "/tmp/storage2742201414/47": happened during read: (heuristic tuning data: last retry 9290086454, current offset 9290086454; 361049.516 ms total, 63015.394 ms since progress): unexpected EOF
Pull failed.
It was fine when I ran it again.
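Since the interrupted pull succeeded on a second attempt, a simple retry wrapper may be enough as a workaround for transient errors like the unexpected EOF. This is a minimal sketch: `pull_with_retry` and its parameters are hypothetical names, and it only assumes that the command exits non-zero on failure.

```python
import subprocess
import time


def pull_with_retry(cmd, attempts=3, delay_s=5.0):
    """Run cmd until it exits with status 0 or attempts run out.

    Returns True on success, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(delay_s)  # brief pause before the next attempt
    return False


if __name__ == "__main__":
    # The image name is the one used in this thread.
    ok = pull_with_retry(["podman-hpc", "pull", "hrzhao076/custom_backend:2.1"])
    print("pull succeeded" if ok else "pull failed after retries")
```

This would not help with the KeyError during migration if that failure is deterministic on a given node; it only covers failures that go away on a rerun.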
Thanks for checking. I'll go ahead and close, but please re-open if you see it again.
Pulling hrzhao076/custom_backend:2.0 on perlmutter:login06 failed with a KeyError, but I was able to pull it successfully on login07.