elwer closed this 3 weeks ago
I just noticed: there is a parameter called "module_load" (e.g. "module_load": ["PyTorch/1.13.1"]), while the one above is called "modules_load". Is that intended, or should they be unified?
The first issue should be solved; the second one is intentional. modules_load lets you load different modules depending on the cluster host. module_load does not allow that: it loads the modules listed there regardless of the cluster.
Hopefully that solves this.
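To make the difference concrete, here is a minimal sketch of how the two parameters could be combined. It is not the project's actual code. It assumes modules_load is a mapping from a cluster host name to a list of modules (the thread does not show its exact shape), and the function name resolve_modules, the cluster names, and the CUDA/11.7 entry are placeholders made up for illustration.

```python
def resolve_modules(kernel_config, cluster):
    """Return the modules to load for `cluster` (illustrative sketch only)."""
    # module_load: a flat list that is applied on every cluster.
    common = kernel_config.get("module_load") or []
    # modules_load: ASSUMED here to map a cluster host name to its own list.
    per_cluster = (kernel_config.get("modules_load") or {}).get(cluster) or []
    return list(common) + list(per_cluster)


# Hypothetical kernel definition; cluster names and CUDA/11.7 are placeholders.
config = {
    "module_load": ["PyTorch/1.13.1"],
    "modules_load": {"cluster-a": ["CUDA/11.7"], "cluster-b": []},
}

print(resolve_modules(config, "cluster-a"))
print(resolve_modules(config, "cluster-b"))
```

Under that assumption, cluster-a gets both the common and the cluster-specific module, while cluster-b only gets what is listed in module_load.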
Now, if no modules are defined via module_load or the new module_load_cluster parameter, "null" is added to the list of modules. Can we check in advance whether the parameters are defined and, if not, skip them? "null" should be avoided.
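The kind of check I have in mind could look roughly like this. It is only a sketch, since I have not looked at how the generator builds the list; the function name module_load_lines is made up, and the two keys are the parameter names as mentioned in this thread.

```python
import json


def module_load_lines(kernel_config, keys=("module_load", "module_load_cluster")):
    """Build `module load` lines, skipping parameters that are unset or null."""
    lines = []
    for key in keys:
        value = kernel_config.get(key)
        if not value:  # key missing, JSON null, or empty list
            continue
        # Also drop null entries inside an otherwise valid list.
        lines.extend(f"module load {name}" for name in value if name)
    return lines


# A definition where the parameter is present but null yields no lines at all.
print(module_load_lines(json.loads('{"module_load": null}')))
print(module_load_lines({"module_load": ["PyTorch/1.13.1"]}))
```

With a guard like this, an undefined parameter simply contributes nothing instead of ending up as "null" in the generated list.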
Please check whether it's fixed in e1097b399adefad73a98055d5c9291c3b6063099
Kernel-specific modules in the form:
are not reflected in the start-kernel.sh file