adaptivecomputing / torque

Torque Repository

2.5.x: -DUSESAVEDRESOURCES #70

Open lflis opened 11 years ago

lflis commented 11 years ago

Dear All,

While trying to fix pbs_mom so that it preserves resource usage information when MOM is restarted with the -p option, I came across code inside USESAVEDRESOURCES #ifdef blocks.

My question is: why not make it a default build option? Was there a reason for keeping it disabled by default?

Cheers

LKF

knielson commented 11 years ago

Lukasz

The reasoning behind this has been lost. From what I see in the code, I am not sure what USESAVEDRESOURCES was originally intended for, other than something to do with job recovery.

lflis commented 11 years ago

This likely prevents job/task accounting information (consumed CPU time, consumed resources such as mem and vmem) from being lost when pbs_server or pbs_mom is restarted.

More information could be obtained from ataufer, who committed the code to SVN (Date: Thu Jan 28 19:26:09 2010 +0000).

If the change is nonintrusive (I believe it is not intrusive, but this needs confirmation), it should be enabled by default.

knielson commented 11 years ago

I do not know what the original intent of USESAVEDRESOURCES is. If Al added it, we may have a hard time finding out, since he no longer works on TORQUE.

mattaezell commented 11 years ago

From the commit history, this was added back in 2010:

added -DUSESAVEDRESOURCES code that uses servers saved resources used for accounting end record instead of current resources used for jobs that stopped running while mom was not up

We use that define on Titan and Gaea. I think the intent is to avoid "resetting" the walltime used if the MOM restarts. I think we had situations where a node would reboot and when it came back up and sent the OBIT it would claim that the job accumulated no (or very little) walltime.

We'd like to eventually get rid of all of our CFLAGS, so I would support researching this and either making it default or a run-time config option.
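One way to drop the CFLAGS dependency is to turn the `#ifdef` into an ordinary run-time branch. The sketch below assumes a hypothetical config line `use_saved_resources true|false`; neither that option name nor the helpers `parse_config_line` and `end_record_walltime` exist in TORQUE, they only illustrate the shape of a run-time switch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical run-time switch that would replace -DUSESAVEDRESOURCES. */
static bool use_saved_resources = false;

/*
 * Parse a hypothetical config line of the form
 *   use_saved_resources true|false
 * Returns true if the line set the option, false if it was not recognized.
 */
bool parse_config_line(const char *line)
{
    char key[64], val[16];

    if (sscanf(line, "%63s %15s", key, val) != 2)
        return false;
    if (strcmp(key, "use_saved_resources") != 0)
        return false;
    if (strcmp(val, "true") == 0)
        use_saved_resources = true;
    else if (strcmp(val, "false") == 0)
        use_saved_resources = false;
    else
        return false;  /* unknown value: leave the setting unchanged */
    return true;
}

/* Same decision as the compile-time version, but taken at run time. */
long end_record_walltime(long saved, long current, bool stopped_while_mom_down)
{
    return (use_saved_resources && stopped_while_mom_down) ? saved : current;
}
```

With the switch defaulting to false, behaviour matches today's default build; making it the default would just mean initializing it to true.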

lflis commented 11 years ago

Matt, which version of TORQUE are you using with -DUSESAVEDRESOURCES?

mattaezell commented 11 years ago

We used it with 2.5.x, and now we are running 4.1.x with it also.

mmamon commented 11 years ago

I think there is a regression in 2.5.13: the default build (i.e. without -DUSESAVEDRESOURCES) does not write resources_used.walltime and resources_used.cput to the accounting logs.
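To check accounting logs for this regression, a small scanner can flag end records that lack either field. This is a minimal sketch assuming each record is one line whose attributes appear as space-separated `name=value` pairs after the `;`-separated header; the helper names `record_has_attr` and `end_record_complete` are hypothetical, and the sample record in the test is made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `record` contains "<attr>=" where attr starts at the
 * beginning of the string or right after a ';' or a space.
 */
bool record_has_attr(const char *record, const char *attr)
{
    size_t n = strlen(attr);
    const char *p = record;

    if (n == 0)
        return false;
    while ((p = strstr(p, attr)) != NULL) {
        bool starts_ok = (p == record) || p[-1] == ' ' || p[-1] == ';';
        if (starts_ok && p[n] == '=')
            return true;
        p++;  /* keep searching after this partial match */
    }
    return false;
}

/* True if an end record carries both walltime and cput usage. */
bool end_record_complete(const char *record)
{
    return record_has_attr(record, "resources_used.walltime") &&
           record_has_attr(record, "resources_used.cput");
}
```

Running `end_record_complete` over the `E` records from a default 2.5.13 build and from a -DUSESAVEDRESOURCES build would show whether the fields go missing only in the former.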