IUSCA / sca-issues


HPC Everywhere: Job Script Generator comments #16

Closed CicadaDennis closed 5 years ago

CicadaDennis commented 5 years ago

Please see this https://github.com/IUSCA/sca-issues/issues/16#issuecomment-493568117 for updated feedback.

IGNORE THE FOLLOWING. Job Script Generator Comments

The queue names offered are not limited by the choice of HPC system. It would be best if only the queues available on a Compute Resource were visible in the Queue drop-down; that is, the choices in the drop-down would change depending on the Compute Resource value. For Karst and Carbonate, the queue names do not make clear which queue belongs to which system. No debug queue is listed for Karst and/or Carbonate. The cores-per-node limit seems like it may be incorrect for some queues: for Carbonate, there is no queue with a 24 cores-per-node limit, which should exist. Should there be a queue for deep learning? Also, for the Carbonate queues, the maximum number of nodes should change depending on the vmem requested, so that it reflects the number of large-memory nodes when a large amount of memory is requested.
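A minimal sketch of the dependent drop-down idea, assuming the generator has some table of queues per system. The cluster and queue names below are placeholders invented for illustration, not the real configuration of Karst, Carbonate, or any other system.

```python
from typing import Dict, List

# Hypothetical data: placeholder system and queue names for illustration only.
QUEUES_BY_RESOURCE: Dict[str, List[str]] = {
    "cluster_a": ["queue_general", "queue_debug"],
    "cluster_b": ["queue_general", "queue_largemem"],
}


def queues_for(resource: str) -> List[str]:
    """Return only the queues defined for the selected compute resource.

    An unknown resource yields an empty list, so the drop-down would show
    nothing rather than queues from some other system.
    """
    return list(QUEUES_BY_RESOURCE.get(resource, []))
```

The Queue drop-down would then be repopulated from `queues_for(...)` whenever the Compute Resource value changes.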

On vmem, the up arrow button does not work. There should also be a restriction on vmem equal to the maximum that one could request for the particular Compute Resource/queue.

The buttons for email on begin, end, and abort do not change the value of the PBS -m option. Instead it is always "abe".
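The expected behavior can be sketched as follows, assuming the three buttons are available as booleans. The function names are hypothetical; the letters follow the PBS -m convention ("a" for abort, "b" for begin, "e" for end, "n" for no mail).

```python
def pbs_mail_options(on_begin: bool, on_end: bool, on_abort: bool) -> str:
    """Build the argument of the PBS -m option from the three email buttons."""
    flags = ""
    if on_abort:
        flags += "a"
    if on_begin:
        flags += "b"
    if on_end:
        flags += "e"
    # With no button selected, request no mail instead of an empty argument.
    return flags or "n"


def pbs_mail_directive(on_begin: bool, on_end: bool, on_abort: bool) -> str:
    """Return the full directive line for the generated job script."""
    return "#PBS -m " + pbs_mail_options(on_begin, on_end, on_abort)
```

With only "end" selected, the generated line would be `#PBS -m e` rather than `#PBS -m abe`.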

The debug_cpu and debug_gpu queues do not let me choose 1 hour as a walltime value, only up to :59 minutes. Similarly, a queue with a 7-day limit will not let me set 7 days, only 6:23:59. Instead, when 7 is selected, the maximum of the other pull-downs could be set to zero.
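One way the suggested pull-down behavior could work is sketched below, assuming the queue limit is known in minutes. The helper names are hypothetical and this is not the generator's actual logic.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR


def max_days(limit_minutes: int) -> int:
    """Largest value the days pull-down should offer."""
    return limit_minutes // MINUTES_PER_DAY


def max_hours(days: int, limit_minutes: int) -> int:
    """Largest value the hours pull-down should offer once days is chosen."""
    remaining = limit_minutes - days * MINUTES_PER_DAY
    return max(0, min(23, remaining // MINUTES_PER_HOUR))


def max_minutes(days: int, hours: int, limit_minutes: int) -> int:
    """Largest value the minutes pull-down should offer once days and hours are chosen."""
    remaining = limit_minutes - days * MINUTES_PER_DAY - hours * MINUTES_PER_HOUR
    return max(0, min(59, remaining))
```

For a 7-day limit, choosing 7 days leaves 0 as the only option for hours and minutes, while choosing 6 days still allows up to 23:59. For a 1-hour limit, choosing 1 hour leaves 0 as the only option for minutes.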

Even though the modules offered for load/unload are selected based on the Compute Resource, if one changes the compute resource, the existing load/unload modules remain, although they may not exist on the now-current compute resource. An error message could appear (like the Job command Script messages) indicating that modules listed to be loaded or unloaded do not exist on the chosen resource. Hidden modules appear as available and are incorrectly listed without their periods. For example, on Big Red 2, typing abinit brings up three module choices, two of which do not actually exist: abinit/gnu/gpu/7.6.4 and abinit/gnu/cpu/8.0.8b. The files that exist are abinit/gnu/gpu/.7.6.4 and abinit/gnu/cpu/.8.0.8b, so these choices should not be listed as possible modules.
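Both checks can be sketched in a few lines, assuming module names are stored with their original file names (leading period intact) and that a module counts as hidden when its last path component starts with a period. The function names and the "example/1.0" and "other/2.0" modules are hypothetical.

```python
from typing import Iterable, List, Set


def is_hidden(module_name: str) -> bool:
    """Treat a module as hidden if its final path component starts with a period."""
    version = module_name.rsplit("/", 1)[-1]
    return version.startswith(".")


def visible_modules(names: Iterable[str]) -> List[str]:
    """Drop hidden modules from the list offered for load/unload."""
    return [name for name in names if not is_hidden(name)]


def unavailable_modules(selected: Iterable[str], available: Set[str]) -> List[str]:
    """Return the selected modules that do not exist on the chosen resource.

    A non-empty result is what the suggested warning message would report.
    """
    return [name for name in selected if name not in available]
```

Applied to the abinit example, `visible_modules` would drop abinit/gnu/gpu/.7.6.4 and abinit/gnu/cpu/.8.0.8b instead of listing them without their periods.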

Module name list does not include personally defined modules (ones which might be included by a use command in one's .modules file, e.g. 'module use $HOME/my_modules'). It would be great if such modules could be included.

Hitting return when one is editing a text field should only terminate editing the field, but it is also causing the "Estimate Start" button at the bottom of the page to get pressed.

Validate/Submit should bring up a confirm window, but anyway, right now does nothing, which presumably is on purpose, since we are "just testing" the interface.

In the following warning messages: "The 'ccmrun' command is required when running in CCM execution environment. click for example" and "Use of $PBS_ARRAYID variable is required when submitting job arrays. click for example" When I clicked the 'click for example' links, nothing came up.

agopu commented 5 years ago

Thanks @CicadaDennis for the detailed feedback! We will look into it.

I do wonder, however, if a caching issue has caused the JavaScript in your browser to not be up to date. We will check in with you about that.

agopu commented 5 years ago

@youngmd & I stopped by Cicada's desk. It turned out he was finding these issues on Research Desktop (RED), using the HPC Everywhere icon. We are going to attempt to reproduce the issue(s) on RED ourselves and go from there; it might be a JavaScript engine issue with the Chromium browser on there.

Meanwhile, Cicada is going to redo his tests on HPCe within a web browser on his laptop.

Feel free to open a new ticket with your findings @CicadaDennis :-)

youngmd commented 5 years ago

@rperigo I think we need to update chromium-browser on RED. When running in app mode, the following error is generated and the script generator page does not function correctly:

Cookie sqlite error 2067, errno 0: UNIQUE constraint failed:

Potentially fixed according to this page, which says the cookie uniqueness constraints have changed: https://bugs.chromium.org/p/chromium/issues/detail?id=800414

Why only running in app mode generates this error is an open question.

CicadaDennis commented 5 years ago

So testing again from local browser rather than through RED: Still have some of the above issues:

vmem up arrow not working

abe email buttons not setting -m values

max walltime values do not let one set the max value (1 hour or 7 days, etc.)

Module load/unload now gets rid of nonexistent modules when changing compute resource - good - but it also gets rid of existing modules. That is, it gets rid of all modules whether or not they exist. Would be good to leave ones that were still valid, if that is a possible thing to do.

Module load/unload still doesn't look for local/personally defined modules.

Hidden modules are still a problem.

Hitting return when editing a text field still invokes the "Estimate Start" button.

Additional comments:

In the validation window, after clicking "Validate/Submit", one can change the computing resource, but the values in the job script were set based on a particular compute resource, so it seems a little strange to allow the user to change that at this point in the process. But it does give power to the user to change any part of the script now, so maybe this is ok? Does give the user the power to totally mess up the script, though there is the validate button.

Maybe could have a "Back" button at the bottom of the validation window (would just do same as closing the window, but functionally is clearer for the user).

Perhaps it would be good to have a field where the user could type in or browse to a "working directory" from which the script would run (and which would be set as the $PBS_O_HOME directory). And that could add a "cd $PBS_O_HOME" to the top of the script.

CicadaDennis commented 5 years ago

I see where the Validate/Submit Job window is being used as well with the MyHPC screen, from which choosing the Computing Resource makes sense. But maybe the "Submit Job" button should instead go to the Script Generator screen?

agopu commented 5 years ago

@CicadaDennis Mike has made a few updates based on your feedback, which we will push to production. There are a couple of items we will likely be unable to change, or possibly don't want to for specific reasons; we will have a chat about those.

Meanwhile we have updated the HPC everywhere icon on RED to not use kiosk mode, and we are hoping that fixes all of the RED-specific issues you were having.

youngmd commented 5 years ago

vmem up arrow not working

fixed

abe email buttons not setting -m values

fixed

max walltime values do not let one set the max value (1 hour or 7 days, etc.)

Will not fix. The other option (remove/block numbers above 0 when at maximum time) is more confusing for users. If someone really wants that extra minute they can always edit the script. The job validator will accept times that are equal to the maximum allowable.

Module load/unload now gets rid of nonexistent modules when changing compute resource - good - but it also gets rid of existing modules. That is, it gets rid of all modules whether or not they exist. Would be good to leave ones that were still valid, if that is a possible thing to do.

Not really feasible given the current setup. There is minimal overlap of modules based on exact version numbers between clusters.

Module load/unload still doesn't look for local/personally defined modules.

We do not have any access to users' home directories.

Hidden modules are still a problem.

This is a crud/cleanout issue. The tool that retrieves and updates the module database will start clearing out all entries instead of just appending new modules.

Hitting return when editing a text field still invokes the "Estimate Start" button.

Fixed

Additional comments:

In the validation window, after clicking "Validate/Submit", one can change the computing resource, but the values in the job script were set based on a particular compute resource, so it seems a little strange to allow the user to change that at this point in the process.

Locked the system select dropdown when users are coming from the script generator.

Maybe could have a "Back" button at the bottom of the validation window (would just do same as closing the window, but functionally is clearer for the user).

Changed "Cancel" button to say "Close"

Perhaps it would be good to have a field where the user could type in or browse to a "working directory" from which the script would run (and which would be set as the $PBS_O_HOME directory). And that could add a "cd $PBS_O_HOME" to the top of the script.

Not possible under current setup.

CicadaDennis commented 5 years ago

Thanks!


From: Michael Young notifications@github.com
Sent: Friday, May 24, 2019 3:37 PM
To: IUSCA/sca-issues
Cc: Dennis, H. E. Cicada Brokaw; Mention
Subject: [External] Re: [IUSCA/sca-issues] HPC Everywhere: Job Script Generator comments (#16)


