antonelepfl closed this issue 4 years ago
Expected behavior
Try to submit a job in Piz-Daint using service account
Actual Behavior (please include screenshot if possible)
Optional infrastructural data (user, platform, browser, environment, ...)
Python 3, Collab
I'm not sure whether this happens because it is the first time I use the service account with this user. @lbologna @clupascu @roberto.smiriglia@pa.ibf.cnr.it
Hello Stefano, in the service account the runtime value must be specified as a floating-point number representing hours (see the example submissions in the documentation: https://humanbrainproject.github.io/hbp-bsp-service-account/api/user/jobs/job_submission.html). We will also handle this exception in the code. In your case, "20m" corresponds to ~"0.3" (20/60 ≈ 0.33 hours). Please let me know if you encounter any other problems.
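To avoid this kind of mismatch on the client side, shorthand durations can be converted to the float-hours value the service account expects before building the payload. This is a minimal sketch; the helper name `runtime_to_hours` and the accepted suffixes (h, m, s) are my own assumptions, not part of the service account API:

```python
import re

# Hypothetical helper: the service account expects "runtime" as a float in hours,
# so shorthand such as "20m", "1.5h" or "90s" is converted before submission.
_UNITS_IN_HOURS = {"h": 1.0, "m": 1.0 / 60.0, "s": 1.0 / 3600.0}


def runtime_to_hours(value: str) -> float:
    """Convert a duration like '20m', '1.5h' or '45s' into hours."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([hms])\s*", value)
    if match is None:
        raise ValueError(f"unsupported runtime format: {value!r}")
    amount, unit = match.groups()
    return float(amount) * _UNITS_IN_HOURS[unit]


print(round(runtime_to_hours("20m"), 2))  # 0.33
```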
Hi, I have tried with both of these payloads:
payload = {
'command': 'ls -al',
'node_number': 1,
'core_number': 1,
'runtime': 0.3,
'title': 'test service account',
}
payload = {
'command': 'ls -al',
'node_number': "1",
'core_number': "1",
'runtime': "0.3",
'title': 'test service account',
}
(I know that I call json.dumps on the payload afterwards, so the result should be the same, but I tried both just in case) and I get the same error.
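For reference, here is a minimal sketch of how such a payload might be sent from Python using only the standard library, mirroring the curl call used later in this thread (the endpoint URL and the Payload/Authorization headers are taken from it). The token is a placeholder, and the request is only built here, not sent:

```python
import json
import urllib.request

# Endpoint taken from the curl example in this thread.
URL = "https://bspsa.cineca.it/jobs/pizdaint/bsp_pizdaint_01/"
TOKEN = "<your-token>"  # placeholder; never hard-code a real token

payload = {
    "command": "ls -al",
    "node_number": 1,
    "core_number": 1,
    "runtime": 0.3,  # hours, as a float
    "title": "test service account",
}

# The job description travels as a JSON string inside the "Payload" header.
request = urllib.request.Request(
    URL,
    method="POST",
    headers={
        "Payload": json.dumps(payload),
        "Authorization": f"Bearer {TOKEN}",
    },
)

# urllib.request.urlopen(request) would actually submit it; not executed here.
print(request.get_header("Payload"))
```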
Hi Stefano, the last issue is different from the previous one, even though the error code is the same: the error occurred because you added the job file name in the "content-description" header but did not include the file content. I have just fixed this, so you can now submit a job on Piz Daint without the file name (i.e. removing "content-description" from the headers) and without the file content. Let me know if it works.
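On the client side, one way to avoid sending a file name without its content is to add the "content-description" header only when a file is actually attached. This is a hypothetical sketch: `build_job_request` is an illustrative helper, and the assumption that the file name goes directly into the header value while the file content is the request body is mine, not taken from the service account documentation:

```python
import json
from typing import Optional, Tuple


def build_job_request(
    payload: dict,
    token: str,
    file_name: Optional[str] = None,
    file_content: Optional[bytes] = None,
) -> Tuple[dict, Optional[bytes]]:
    """Return (headers, body) for a job submission (hypothetical helper)."""
    headers = {
        "Payload": json.dumps(payload),
        "Authorization": f"Bearer {token}",
    }
    if file_name is None and file_content is None:
        # Plain command job: no file header and no request body.
        return headers, None
    if file_name is None or file_content is None:
        # A name without content (or vice versa) is the mismatch described above.
        raise ValueError("file_name and file_content must be given together")
    headers["content-description"] = file_name
    return headers, file_content


headers, body = build_job_request({"command": "/bin/date"}, "<token>")
print("content-description" in headers, body)  # False None
```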
So the submission part works, but I have a few questions:
1) Previously I was getting the message "Request incomplete. Parameters missing". I think it should say explicitly which parameter is missing, because I was not able to figure it out.
2) I'm submitting a job that should output the result of '/bin/date', but it fails. a) I can't tell why it fails, because when I ask for the job files (to check stdout and stderr) I get an empty array. b) If I log in to the machine directly to look at that job, I get permission denied (because I log in with my mapped user, which is not the service account user).
I'm using this command to launch the job:
curl --location --request POST 'https://bspsa.cineca.it/jobs/pizdaint/bsp_pizdaint_01/' \
--header 'Payload: {"command":"/bin/date","node_number":"1","runtime":"10.0","core_number":"12"}' \
--header 'Authorization: Bearer {{TOKEN}}'
3) I would like to know if there is a way to upload multiple files when I launch a job.
1) I suggest a message like "Request incomplete. Payload parameter in header missing", or something similarly descriptive. Also, putting the payload in a header is really confusing. 3) Let's discuss this in https://github.com/cnr-ibf-pa/hbp-bsp-issues/issues/507
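A minimal sketch of the kind of validation suggested in point 1), assuming (hypothetically) that the server reads the JSON job description from the Payload header and that the required keys are the ones used in the example payloads in this thread (command, node_number, core_number, runtime). Instead of a generic "Parameters missing", it reports exactly which piece is absent:

```python
import json

# Assumption: required keys taken from the example payloads in this thread.
REQUIRED_KEYS = ("command", "node_number", "core_number", "runtime")


def check_job_request(headers: dict) -> list:
    """Return descriptive problems; an empty list means the request looks complete.

    `headers` is a plain dict here; real web frameworks usually expose
    case-insensitive header mappings instead.
    """
    raw = headers.get("Payload")
    if raw is None:
        return ["Request incomplete: 'Payload' header missing"]
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"'Payload' header is not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["'Payload' header must contain a JSON object"]
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        return ["Request incomplete: missing payload parameter(s): " + ", ".join(missing)]
    return []


print(check_job_request({"Payload": '{"command": "/bin/date"}'}))
# ['Request incomplete: missing payload parameter(s): node_number, core_number, runtime']
```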
Hi @rcsm17, I'm not able to retrieve any file using the advanced endpoint. I get a 500 error.
Now it works again
@antonelepfl any news? Can we close this issue?
I'm having this issue again
@antonelepfl I fixed the error, can you check if you get any other error ?
Thank you, now it works again.
I'm getting a different issue this time, when I try to download a file.
I added the example on Collab
What I see is that it has problems downloading files with a '.' (dot) in the name. I have this issue with raster.png as well, but not with stdout.
Hi @antonelepfl, I think this response is an error coming from the HPC system.
I don't think so: I was able to fetch the files from Piz Daint directly, and I'm also able to fetch stdout,
for instance
You were right: the problem was related to the data encoding, and it is now handled. Check it and let me know if any error occurs.
Yes, so far it looks fixed. Thank you!
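The thread doesn't say exactly what the encoding fix was. One common cause of this symptom (text files like stdout work, binary files like raster.png fail) is decoding binary content as UTF-8 text. A hedged sketch, assuming file contents are returned inside a JSON response: the hypothetical helpers below send the content as text when it is valid UTF-8 and fall back to base64 otherwise:

```python
import base64


def encode_file_for_json(name: str, data: bytes) -> dict:
    """Wrap raw file bytes so they can be safely embedded in a JSON response."""
    try:
        # Text files such as stdout/stderr usually decode cleanly.
        return {"name": name, "encoding": "utf-8", "content": data.decode("utf-8")}
    except UnicodeDecodeError:
        # Binary files such as PNG images do not; send them as base64 instead.
        encoded = base64.b64encode(data).decode("ascii")
        return {"name": name, "encoding": "base64", "content": encoded}


def decode_file_from_json(entry: dict) -> bytes:
    """Client-side inverse of encode_file_for_json."""
    if entry["encoding"] == "base64":
        return base64.b64decode(entry["content"])
    return entry["content"].encode("utf-8")


png_header = b"\x89PNG\r\n\x1a\n"
entry = encode_file_for_json("raster.png", png_header)
print(entry["encoding"])  # base64
```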
You're welcome!
I am not able to submit a job, can you have a look?
As far as I know, there is maintenance on Piz Daint today.
OK, thanks for the update.
@lbologna do we have any news on this? I think UNICORE on Piz Daint needs to be restarted.
Hello @antonelepfl,
just checking. I'll keep you posted.
@BerndSchuller Do you think you can restart it on PD and Jureca?
As you know, there was a massive security issue on many HPC systems across Europe. I think Piz Daint is back online, but JSC's HPC systems are still offline. As for the Piz Daint UNICORE services, you'll need to contact the Piz Daint admins; primarily this would be Fabio Verzelloni.
Thank you Bernd for the information. I contacted Fabio Verzelloni and this is fixed.