Closed verolero86 closed 6 years ago
I've also observed job_type be batch when it should be interactive. Note: this may be an LSF issue (LSF calling the API with the wrong values).
Two questions:
What were user_script and job_name originally supposed to be, as in was this even close?
History:
SELECT allocation_id,user_script,job_name
FROM csm_allocation_history
WHERE allocation_id=2272;
Active:
SELECT allocation_id,user_script,job_name
FROM csm_allocation
WHERE allocation_id=2272;
If the database matches the user input, then we're going to have to dig into csm_allocation_query_details; if the database matches the output, the problem is further up the chain, with either csm_allocation_create or LSF.
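The triage rule above can be written down as a small decision function. This is only a sketch: locate_fault and its three arguments are hypothetical names, not part of CSM or LSF, and it assumes the values are compared as plain strings.

```python
def locate_fault(user_input: str, db_value: str, query_output: str) -> str:
    """Suggest which layer to inspect first for one field (e.g. job_name).

    Hypothetical helper: the arguments are what the user passed to bsub,
    what the CSM allocation table holds, and what the query API printed.
    """
    if db_value == user_input and query_output == user_input:
        return "no mismatch"
    if db_value == user_input:
        # The database is right, so the value is changed on the way out.
        return "csm_allocation_query_details"
    if db_value == query_output:
        # The database already holds the wrong value, so it was wrong on the way in.
        return "csm_allocation_create or LSF"
    return "inconclusive"


# job_name values reported elsewhere in this thread for allocation 2272.
print(locate_fault("IMB_test_nbc_000002n_1ppn", "stf006accept", "stf006accept"))
# prints: csm_allocation_create or LSF
```

Running it once per field (user_script, job_name, job_type) keeps the comparison explicit for each value.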
job_type: batch <=== job_type is filled in by CSM.
INCORRECT: job_name: stf006accept <=== this is the project name (bsub -P) filled in by LSF
INCORRECT: user_script: 1525381589.19121 <=== for a bsub redirect job, it's the job file name; for a regular job not submitted by bsub < script, it's the real executable. This is to work around an issue in CSM with storing long strings with special chars in the job script.
Shouldn't the project name (bsub -P) be set for the CSM field account? We expect CSM job_name to be the value of bsub -J.
@mew2057 no, it was not even close, unfortunately. The user_script file name should be batchscript_IMB_test_nbc_000002n_1ppn.sh and the job_name should be IMB_test_nbc_000002n_1ppn.
I checked the CSM DB and it shows the same incorrect values:
SELECT allocation_id,user_script,job_name FROM csm_allocation_history WHERE allocation_id=2272;
2272 | 1525381589.19121 | stf006accept
For account, LSF passes in the bsub -G
-P
For the script name: if you submit a job using bsub < script to redirect the job script to bsub stdin, LSF does not know the script name, because only the content of the script reaches bsub stdin. Therefore, LSF uses the LSF job file name.
@jma562 what do you mean by LSF job file name? #BSUB -J? If so, that is still not the correct value.
@verolero86 The LSF job file is a shell script that LSF constructs and runs; it contains the executable name you specified on the bsub command line, or the content of the script that you redirected to bsub stdin.
The job file consists of
In the case of bsub < script, the script name is not known by LSF, but the content of the script is redirected to bsub stdin. The script may contain too many lines or special chars that the CSM DB cannot handle; therefore, for a bsub redirect job, LSF passes the job file name to the csm create API.
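To make the behaviour described above concrete, here is a rough sketch of the selection rule as it is explained in this thread. The function name, its parameters, and the ./a.out executable are made up for illustration; this is a model of the rule, not LSF code.

```python
from typing import Optional


def user_script_for_csm(executable: Optional[str], job_file_name: str,
                        redirected_stdin: bool) -> str:
    """Pick the value sent as user_script, per the rule described in this thread.

    Hypothetical model: for `bsub < script` LSF only sees the script content
    on stdin, so it falls back to its own job file name.
    """
    if redirected_stdin or executable is None:
        return job_file_name
    return executable


# `bsub < script`: the script name is unknown, so the job file name is used.
print(user_script_for_csm(None, "1525381589.19121", redirected_stdin=True))
# Regular submission: the executable from the command line is used
# ("./a.out" is a placeholder, not a value from this issue).
print(user_script_for_csm("./a.out", "1525381589.19121", redirected_stdin=False))
```

This is why user_script shows 1525381589.19121 instead of the batch script name for the redirected submission reported here.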
For job_type, the initial thought is that "batch" means the allocation is created for a job from a batch scheduling system, and "interactive" means the allocation is created by hand by running a csm command. Therefore, in LSF, the job_type field is left at its default (BATCH).
bsub -J is tricky: if it is not specified, the job name is the user script name or the content of the script. We may end up with the same issue as user_script unless -J is enforced or auto-filled for each job.
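One way the "auto-filled" option could look, purely as a sketch: a site-side wrapper that adds -J before the arguments reach bsub. Everything here is an assumption for illustration (ensure_job_name is a hypothetical helper, it only recognises the separate "-J name" form, and deriving the name from the script file name is just one possible policy); it is not something LSF or CSM provides.

```python
from pathlib import Path
from typing import List, Optional


def ensure_job_name(bsub_args: List[str], script_path: Optional[str]) -> List[str]:
    """Return bsub arguments with -J added when the user did not supply one.

    Hypothetical wrapper logic, not an LSF feature.
    """
    if "-J" in bsub_args:
        return list(bsub_args)
    if script_path is None:
        # Enforce -J when there is no script name to derive a default from.
        raise ValueError("no -J given and no script name to derive one from")
    # Derive a default from the script file name, dropping the extension.
    return ["-J", Path(script_path).stem] + list(bsub_args)


print(ensure_job_name(["-P", "stf006accept"],
                      "batchscript_IMB_test_nbc_000002n_1ppn.sh"))
```

Note that the derived default would be batchscript_IMB_test_nbc_000002n_1ppn, which still differs from the job name the user wanted in this issue, so an explicit -J remains the more reliable route.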
@verolero86 and @mattaezell are you OK tracking this problem under LSF?
@fpizzano yes, that is fine. should we open a new LSF PMR or is there an internal issue already being tracked?
@verolero86 Hi Veronica, please log an LSF PMR. Within IBM, a ticket has been logged, and the job name and job type issues have been fixed (available in the next LSF release). The user_script issue has been "partially" fixed, as it depends on an enhancement for bsub -script.
I've verified that job_name now matches the value passed to LSF. There is still one remaining issue where the user_script is not found that we need to sort through, but that is not CSM related, so this issue can be closed. Thanks!
CSM seems to show incorrect values for several fields. This is using: ibm-csm-hcdiag-1.0.0-9460.noarch ibm-csm-api-1.0.0-9460.ppc64le ibm-csm-core-1.0.0-9460.ppc64le