abhilekhsingh / gc3pie

Automatically exported from code.google.com/p/gc3pie

inability to retrieve output from VMs for the cloud backends #474


GoogleCodeExporter commented 9 years ago
The current output-retrieval logic doesn't work as expected under certain
circumstances and causes a task to stay in the TERMINATING state forever.

If we submit multiple jobs to one VM and these jobs take roughly the same
processing time (meaning they change state during the same iteration of the
main progress loop), then, when the jobs reach the TERMINATING state, GC3Pie
retrieves the output of the first job and simply terminates the associated VM
before retrieving the output of the other jobs.

The problem is in the GC3Pie core (core.py): by the time the freeing logic
runs, the list resource.job_infos is already empty, because all the jobs
changed state to TERMINATING together in the same iteration of the main
progress loop (the list only tracks RUNNING jobs).
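
To make the failure mode concrete, here is a minimal sketch of the interleaved retrieve-and-free flow (the function and its `core`/`terminating_tasks` parameters are assumed names for illustration; the actual code in core.py is structured differently):

```python
def fetch_and_free_interleaved(core, terminating_tasks):
    """Hypothetical illustration of the buggy flow, not the real core.py."""
    for task in terminating_tasks:   # all reached TERMINATING together
        core.fetch_output(task)      # copy this task's results off the VM
        core.free(task)              # resource.job_infos is already empty,
                                     # so the VM looks idle and is deleted,
                                     # taking the other tasks' output with it
```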

I am using the development branch, revision 4123.

I made a temporary fix (attached as coreFetchOutputBug.patch): I simply split
the retrieval logic into two separate loops, where the first retrieves all
outputs and the second frees the jobs (terminating the VMs).
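
In outline, the workaround looks like this (reconstructed from the description above rather than from the attached patch; the same assumed names as in the previous sketch):

```python
def fetch_then_free(core, terminating_tasks):
    """Hypothetical illustration of the two-loop workaround."""
    for task in terminating_tasks:
        core.fetch_output(task)      # pass 1: copy all outputs off the VMs
    for task in terminating_tasks:
        core.free(task)              # pass 2: terminating the VMs is now safe
```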

Original issue reported on code.google.com by Karanda...@gmail.com on 2 Feb 2015 at 1:51

Attachments:

coreFetchOutputBug.patch

GoogleCodeExporter commented 9 years ago
I would rather fix the logic in the backends' `free()` method: VMs should not
be deleted until *all* jobs running there have transitioned to the TERMINATED
state.
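
One possible shape for such a fix (a hedged sketch only: the class and the helper methods `_get_vm_for`, `_forget_job`, `_jobs_on`, and `_delete_vm` are hypothetical, and the real EC2/OpenStack backend classes differ) would be:

```python
from gc3libs import Run

class CloudBackendSketch:
    """Hypothetical stand-in for an EC2/OpenStack backend class."""

    def free(self, app):
        vm = self._get_vm_for(app)    # hypothetical: VM hosting this app
        self._forget_job(vm, app)     # hypothetical: drop app's bookkeeping
        # Delete the VM only once every remaining job on it has reached
        # TERMINATED (an empty job list also allows deletion).
        if all(job.execution.state == Run.State.TERMINATED
               for job in self._jobs_on(vm)):
            self._delete_vm(vm)       # hypothetical teardown call
```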

What cloud/VM backend are you using? EC2 or OpenStack?

Original comment by riccardo.murri@gmail.com on 9 Feb 2015 at 10:53

GoogleCodeExporter commented 9 years ago
I absolutely agree with you; however, fixing the `free()` method would involve
more significant changes to the code (and probably the logic). I think these
changes should rather be made by somebody more experienced than me.
I only provided a very temporary fix in case somebody else also hits the bug
before a proper patch is available.

I had this problem with the OpenStack backend, but after a look at the EC2
backend, it most probably has the same problem.

Original comment by Karanda...@gmail.com on 20 Feb 2015 at 9:42