MSO4SC / cloudify-hpc-plugin

Plugin to allow Cloudify to deploy and orchestrate HPC resources
Apache License 2.0

Download big images crash #65

Closed emepetres closed 6 years ago

emepetres commented 6 years ago

@gdolle commented on Thu Jun 07 2018

@emepetres @victorsndvg I am reporting here the problem with sregistry downloads. Currently, the bootstrap step randomly stops the download for big images (> 2 GB).

Sometimes it works, but most of the time the download is incomplete. I "think" it's not related to an ssh timeout.

So I guess it's related to the orchestrator: something is happening during the bootstrap that kills the command.

Any ideas?

I put new apps (very simple) on the marketplace so you can test.


@gdolle commented on Fri Jun 08 2018

Most of the time it hangs at around 33% to 40% of the download (~2 GB).


@gdolle commented on Fri Jun 08 2018

@emepetres I think it might still be related to a timeout somewhere in the orchestrator during the bootstrap step, or to a signal that kills the command process. If it's caused by ssh, maybe it comes from the orchestrator user's ssh config? In my case, if I rerun the bootstrap via ssh, I have no problems.

emepetres commented 6 years ago

Possibly due to https://github.com/singularityhub/sregistry-cli/issues/124

gdolle commented 6 years ago

@emepetres I don't think so. Look at the issue, @ncde is not using singularityhub, but the cesga sregistry.

Trophime commented 6 years ago

Regarding the trouble with singularity, I've experienced this issue on my "canary" registry. I had to change some settings in nginx.conf:

    server {
        listen *:80;
        server_name localhost;

        client_max_body_size 8000M;
        client_body_buffer_size 8000M;
        client_body_timeout 120;

        ...

As far as I remember, the trouble was "removed" with http but not with https. But I think @victorsndvg can comment on this.

victorsndvg commented 6 years ago

Yes @Trophime, these settings and some more were added to the cesga deployment of sregistry. Let us see if it works for the course before replying with the final configuration.

emepetres commented 6 years ago

As a workaround for ECMI, I propose manually downloading the images in advance to a common path on the infrastructures you plan to use (e.g. FTII or ATLAS).

Then the bootstrap script would check for the image on the HPC and only download it if it is not present. For example, in the singularity examples, I copy the image from the FTII local repository to the user space, without downloading it from the internet.

This is not a permanent solution, of course, but it could help get through ECMI with a good user experience.
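The "check first, download only if missing" logic could look roughly like the sketch below. It is only an illustration, not the plugin's actual bootstrap code: `ensure_image`, its parameters, and the `download` callback are hypothetical names, and the callback stands in for whatever pull command the bootstrap currently runs. The image is written to a temporary `.part` file and renamed at the end, so an interrupted transfer is not later mistaken for a complete image.

```python
import shutil
from pathlib import Path


def ensure_image(image_name, shared_dir, user_dir, download):
    """Return the path of the image in user space.

    Hypothetical helper: prefer a copy from the shared local repository
    and call `download(image_name, destination)` only as a last resort.
    """
    target = Path(user_dir) / image_name
    if target.exists():
        return target  # already present in user space, nothing to do

    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a partial file is never
    # confused with a finished image.
    partial = target.with_name(target.name + ".part")

    shared = Path(shared_dir) / image_name
    if shared.exists():
        shutil.copy2(shared, partial)  # copy from the pre-downloaded image
    else:
        download(image_name, partial)  # fall back to the remote registry

    partial.replace(target)  # rename once the transfer has finished
    return target
```

With this shape, the remote download is only attempted for images that nobody has placed in the common path beforehand.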

victorsndvg commented 6 years ago

I think this has already been fixed. I was able to download the feelpp and hifimagnet images.

@Trophime and @prudhomm, can you please confirm, so that we can close the issue or, otherwise, keep working on it?

Thanks

victorsndvg commented 6 years ago

@ncde, I would also like to hear about your experience.