cnr-ibf-pa / hbp-bsp-issues

Ticketing system for developers/testers and power users of the Brain Simulation Platform of the Human Brain Project

Service Account PizDaint - Server Error (500) #501

Closed antonelepfl closed 4 years ago

antonelepfl commented 4 years ago

Expected behavior

Submitting a job to Piz-Daint using the service account should work.

Actual Behavior (please include screenshot if possible)

Screenshot 2020-01-10 at 10 28 24

Optional infrastructural data (user, platform, browser, environment, ...)

Python 3, Collab

I'm not sure whether this happens because it is the first time I'm using the service account with this user. @lbologna @clupascu @roberto.smiriglia@pa.ibf.cnr.it

rcsm17 commented 4 years ago

Hello Stefano, in the service account the runtime value must be specified as a floating-point number representing hours (see the example submissions in the documentation: https://humanbrainproject.github.io/hbp-bsp-service-account/api/user/jobs/job_submission.html). We will also handle this exception in the code. In your case, "20m" corresponds to about "0.3" (20/60 ≈ 0.33 hours). Please let me know if you encounter any other problems.
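To avoid converting durations by hand, a client could translate strings such as "20m" into hours before building the payload. The sketch below is a hypothetical helper (`runtime_to_hours` is not part of the service account API); it assumes "m" means minutes, "h" means hours, and that values without a unit are already in hours.

```python
import re

# Hypothetical helper, not part of the service account API: converts a
# duration such as "20m" or "1.5h" into the float number of hours that
# the "runtime" field expects. Values without a unit are taken as hours.
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([hm]?)\s*$")


def runtime_to_hours(value):
    """Return `value` as a float number of hours."""
    if isinstance(value, (int, float)):
        return float(value)
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"unrecognised runtime: {value!r}")
    amount, unit = float(match.group(1)), match.group(2)
    # "m" = minutes; "h" or no unit = hours
    return amount / 60.0 if unit == "m" else amount


payload = {
    "command": "ls -al",
    "node_number": 1,
    "core_number": 1,
    "runtime": round(runtime_to_hours("20m"), 2),  # 20 minutes -> 0.33 hours
    "title": "test service account",
}
```

Rounding to two decimals is only a readability choice; the service presumably accepts any float.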

antonelepfl commented 4 years ago

Hi, I have tried with both of these payloads:

# first attempt: numeric values
payload = {
    'command': 'ls -al',
    'node_number': 1,
    'core_number': 1,
    'runtime': 0.3,  # hours
    'title': 'test service account',
}
# second attempt: the same values as strings
payload = {
    'command': 'ls -al',
    'node_number': "1",
    'core_number': "1",
    'runtime': "0.3",
    'title': 'test service account',
}

(I know that I serialize the payload with json.dumps afterwards, so the result should be the same, but I tried both just in case.) In both cases I get the same error.

rcsm17 commented 4 years ago

Hi Stefano, the last issue is different from the previous one, even though the error code is the same: the error occurred because you set the job file name in the "content-description" header but did not include the file content. I have now fixed this behavior, so you can submit a job to Piz Daint without a file name (i.e. removing the "content-description" header) and without file content. Let me know if it works.
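The rule described above (send the file-name header only together with the file content) can be enforced on the client side. This is a minimal sketch with a hypothetical helper `build_job_request`; the header names "Payload" and "content-description" are taken from this thread, and their exact format should be checked against the service account documentation.

```python
import json

# Hypothetical helper: the job description goes as JSON in the "Payload"
# header, and the file-name header ("content-description", as named in this
# thread) is added only when file content is actually sent.


def build_job_request(payload, token, file_name=None, file_content=None):
    """Return (headers, body) for a job submission."""
    if (file_name is None) != (file_content is None):
        # A file name without content (or vice versa) is inconsistent;
        # the first case is what caused the 500 error in this issue.
        raise ValueError("file_name and file_content must be given together")
    headers = {
        "Payload": json.dumps(payload),
        "Authorization": f"Bearer {token}",
    }
    if file_name is not None:
        headers["content-description"] = file_name
    return headers, file_content


# Job without a file: no "content-description" header and no body.
headers, body = build_job_request({"command": "/bin/date"}, "<token>")
```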

antonelepfl commented 4 years ago

The submission part works now, but I have a couple of questions:

1) Before, I was getting the message "Request incomplete. Parameters missing". I think we should say explicitly which parameter is missing, because I was not able to figure it out.

2) I'm submitting a job that should output the result of '/bin/date', but it fails.
   a) I can't tell why it fails, because if I ask for the files (to check stdout and stderr) I get an empty array.
   b) If I try to access the job directly on the machine, I get permission denied (because I log in to the machine with my mapped user, not the service account user).

I'm using this configuration to launch the job:

# {{TOKEN}} is a placeholder for the access token
curl --location --request POST 'https://bspsa.cineca.it/jobs/pizdaint/bsp_pizdaint_01/' \
--header 'Payload: {"command":"/bin/date","node_number":"1","runtime":"10.0","core_number":"12"}' \
--header 'Authorization: Bearer {{TOKEN}}'

3) I would like to know if there is a way to upload multiple files when I launch a job.
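For use from a Python notebook on the Collab, the curl command in item 2 can be reproduced with the standard library. This is a sketch that only builds the request; the endpoint and payload are copied from the curl command, while `TOKEN` is a placeholder you must replace.

```python
import json
import urllib.request

# Endpoint copied from the curl command above; TOKEN is a placeholder.
URL = "https://bspsa.cineca.it/jobs/pizdaint/bsp_pizdaint_01/"
TOKEN = "<your-token>"

payload = {"command": "/bin/date", "node_number": "1",
           "runtime": "10.0", "core_number": "12"}

request = urllib.request.Request(
    URL,
    method="POST",
    headers={
        # The job description is sent as a JSON string inside a header.
        "Payload": json.dumps(payload),
        "Authorization": f"Bearer {TOKEN}",
    },
)

# Sending it requires a valid token and network access:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

Building the JSON with `json.dumps` avoids the quoting mistakes that are easy to make when the payload is typed by hand inside a shell string.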

rcsm17 commented 4 years ago

1. The "Request incomplete. Parameters missing" message appears when the Service Account does not find the payload.
2. There was a bug; you can try again now.
3. In principle it is not possible to submit a job with multiple files, because right now you can upload only a single file per job. But I think there is a trick to do it. My idea is:
   1) prepare the files so that they can be run sequentially with a single command;
   2) submit each file in a separate job, where each job's command is something like "mv file-n.zip ./mycustomjob";
   3) run a job without a file, with a command that runs all the jobs' files;
   4) try to fetch the results (maybe you need another job?).
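The workaround proposed in point 3 can be sketched as a list of job payloads. This is a hypothetical illustration: `plan_multi_file_jobs` and `run_all.sh` are invented names, `mkdir -p` is added so that `mv` always targets a directory, and it assumes (not confirmed in this thread) that all jobs share the same working directory. Step 4, fetching results, is not covered.

```python
# Hypothetical sketch of the multi-file workaround: it only builds the job
# payloads; submitting them is left to whatever client you use.
WORKDIR = "./mycustomjob"  # directory name from the suggestion above


def plan_multi_file_jobs(file_names, run_command, runtime_hours=0.3):
    """Return a list of (file_name, payload) pairs, one per job."""
    base = {"node_number": 1, "core_number": 1, "runtime": runtime_hours}
    jobs = []
    for i, name in enumerate(file_names, start=1):
        # Steps 1-2: one job per file, whose only task is to move the file.
        command = f"mkdir -p {WORKDIR} && mv {name} {WORKDIR}/"
        jobs.append((name, dict(base, command=command, title=f"upload {i}")))
    # Step 3: a final job without a file that runs everything.
    jobs.append((None, dict(base, command=run_command, title="run all")))
    return jobs


jobs = plan_multi_file_jobs(["file-1.zip", "file-2.zip"],
                            "bash ./mycustomjob/run_all.sh")
```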
antonelepfl commented 4 years ago

1) You could add a message such as "Request incomplete. Payload parameter in header missing", or something more descriptive. Also, passing the payload in a header is really confusing.
3) Let's discuss it in https://github.com/cnr-ibf-pa/hbp-bsp-issues/issues/507
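Suggestion 1) could be implemented as a server-side check that reports exactly what is missing. This is a minimal sketch; `validate_job_request` is hypothetical, `headers` is assumed to be a plain dict keyed by the exact header name, and the set of required fields is an assumption based on the payloads used in this thread.

```python
import json

# Assumed required fields, based on the payloads used in this thread.
REQUIRED_FIELDS = ("command", "node_number", "core_number", "runtime")


def validate_job_request(headers):
    """Return None if the request is usable, otherwise a specific error message."""
    raw = headers.get("Payload")
    if raw is None:
        return "Request incomplete: 'Payload' header missing"
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"Request invalid: 'Payload' header is not valid JSON ({exc.msg})"
    if not isinstance(payload, dict):
        return "Request invalid: 'Payload' header must be a JSON object"
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return "Request incomplete: missing payload field(s): " + ", ".join(missing)
    return None
```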

antonelepfl commented 4 years ago

Hi @rcsm17, I'm not able to retrieve any file using the advance endpoint; I get a 500 error.

rcsm17 commented 4 years ago

Now it works again

rcsm17 commented 4 years ago

@antonelepfl, any news? Can we close this issue?

antonelepfl commented 4 years ago

I'm having this issue again

rcsm17 commented 4 years ago

@antonelepfl, I fixed the error. Can you check whether you get any other errors?

antonelepfl commented 4 years ago

Thank you, now it is working again.

antonelepfl commented 4 years ago

I'm getting a different issue this time, when I try to download a file:

Screenshot 2020-03-13 at 12 02 16

I added the example to the Collab.

What I see is that it has problems downloading files with a '.' (dot) in the name. I also have this issue with raster.png, but not with stdout.

rcsm17 commented 4 years ago

Hi @antonelepfl, I think this is the response from the HPC system, i.e. an error on the HPC side.

antonelepfl commented 4 years ago

I don't think so: I was able to fetch the files from PizDaint directly, and I'm also able to fetch stdout, for instance.

rcsm17 commented 4 years ago

You were right: the problem was related to data encoding, and now I handle it. Check it and let me know if any error occurs.
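The thread does not say exactly what the encoding problem was, so the following is not the service's fix. As an assumption, it illustrates one common pitfall with this symptom: binary files such as raster.png must be kept as raw bytes, while text files such as stdout can be decoded. Both helper names are hypothetical.

```python
from pathlib import Path


def as_text_if_possible(content):
    """Return the downloaded bytes decoded as UTF-8 text, or None if they are binary."""
    try:
        return content.decode("utf-8")  # text files such as stdout
    except UnicodeDecodeError:
        return None  # binary files such as a PNG image


def save_download(content, destination):
    """Write downloaded bytes unchanged; binary mode is safe for any file type."""
    Path(destination).write_bytes(content)
```

Writing every file in binary mode and decoding only on demand means a PNG is never corrupted by a text round-trip.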

antonelepfl commented 4 years ago

Yes, so far I see that it is fixed. Thank you!

rcsm17 commented 4 years ago

You're welcome!

antonelepfl commented 4 years ago

I am not able to submit a job. Can you have a look?

alex4200 commented 4 years ago

As far as I know, there is maintenance at PizDaint today.

antonelepfl commented 4 years ago

Ok thanks for the update.

antonelepfl commented 4 years ago

@lbologna, do we have any news on this? I think the UNICORE service on PizDaint needs to be restarted.

lbologna commented 4 years ago

Hello @antonelepfl,

I'm just checking. I'll keep you posted.

antonelepfl commented 4 years ago

@BerndSchuller, do you think you could restart it on PD (Piz Daint) and JURECA?

BerndSchuller commented 4 years ago

As you know, there was a massive security issue on many HPC systems across Europe. I think Piz Daint is back online, but JSC's HPC systems are still offline. As for the Piz Daint UNICORE services, you'll need to contact the Piz Daint admins; primarily this would be Fabio Verzelloni.

antonelepfl commented 4 years ago

Thank you Bernd for the information. I contacted Fabio Verzelloni and this is fixed.