benmcollins / libjwt

JWT C Library
Mozilla Public License 2.0

Setup libjwt and JWT auth on existing HPC Cluster installed from OpenHPC Repo generate errors #164

Closed fabianotex closed 2 years ago

fabianotex commented 2 years ago

Hello,

I have an existing SLURM cluster that was installed via the OpenHPC repo. Munge is currently configured for auth, and I'm trying to set up JWT. I've followed the instructions (https://slurm.schedmd.com/jwt.html, including the JWT library build); however, I keep getting errors while trying to restart "slurmctld":

error: Couldn't find the specified plugin name for auth/jwt looking at all files
error: cannot find auth plugin for auth/jwt
error: cannot create auth context for auth/jwt
fatal: failed to initialize authentication plugin

Slurm libraries are located in "/usr/lib64/slurm". JWT library is located under "/usr/local/lib/".

I've tried creating a symlink to the JWT library, but it did not work.

What am I missing?

Any help is greatly appreciated.

Sincerely, fabianotex

benmcollins commented 2 years ago

I can try to help as much as possible, however, once you run make; sudo make install with LibJWT, then there's not much else on this end to do. The problem would really be in SLURM's auth/jwt module, and they might be able to help better.

First, can you do ls -l /usr/local/lib and ls -l /usr/lib64/slurm

I wonder why you chose different locations when you built these things. I would think --prefix=/usr/local would have been used on both.

fabianotex commented 2 years ago

I did not choose the location. The installation was done via "dnf -y install ohpc-slurm-server" and everything was set from there. I checked the content of both folders, /usr/local/lib and /usr/lib64/slurm. The Munge authentication library, for example, is under /usr/lib64/slurm and JWT is under /usr/local/lib. I created a symlink in /usr/lib64/slurm for the JWT library, but that did not help. I guess I need to somehow recompile this installation. Thoughts? I will also ask the Slurm team about that.

benmcollins commented 2 years ago

I really need to see the directory listings. I need to know what's there so I can give you follow-up instructions.

The only other alternative is to talk to the SLURM folks, since this is most likely a config or build issue with them. LibJWT does not control anything in SLURM, nor does it provide "auth/jwt" directly. It's merely a library used by the auth/jwt module that is built from the SLURM sources.
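Since LibJWT is only a shared library that the Slurm-built auth/jwt module links against, one independent check is whether the dynamic linker can see it at all. The following is a minimal sketch, not part of LibJWT or Slurm: the helper name `lib_in_cache` is hypothetical, and it assumes the usual `ldconfig -p` output format, where each entry starts with the library's soname.

```shell
# Hypothetical helper: reads `ldconfig -p` style output on stdin and
# succeeds if a shared library with the given base name is listed.
lib_in_cache() {
    # $1: library base name, e.g. "libjwt"
    grep -q "^[[:space:]]*$1\.so"
}

# Typical use on the head node (assumes ldconfig is on PATH):
#   ldconfig -p | lib_in_cache libjwt \
#       && echo "libjwt is in the linker cache" \
#       || echo "libjwt is not in the linker cache"
```

If the library is not listed, that only says the linker cache does not know about it; it says nothing about whether Slurm itself was built with JWT support.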

fabianotex commented 2 years ago

Here is the output:

[root@head ~]# ls -l /usr/local/lib
total 292
-rw-r--r--. 1 root root 172272 Aug 29 17:08 libjwt.a
-rwxr-xr-x. 1 root root 934 Aug 29 17:08 libjwt.la
lrwxrwxrwx. 1 root root 15 Aug 29 17:08 libjwt.so -> libjwt.so.0.7.0
lrwxrwxrwx. 1 root root 15 Aug 29 17:08 libjwt.so.0 -> libjwt.so.0.7.0
-rwxr-xr-x. 1 root root 118576 Aug 29 17:08 libjwt.so.0.7.0
drwxr-xr-x. 2 root root 23 Aug 29 17:08 pkgconfig

[root@head ~]# ls -l /usr/lib64/slurm
total 50296
-rwxr-xr-x. 1 root root 2589216 May 16 19:09 accounting_storage_mysql.so
-rwxr-xr-x. 1 root root 279320 May 16 19:09 accounting_storage_none.so
-rwxr-xr-x. 1 root root 630496 May 16 19:09 accounting_storage_slurmdbd.so
-rwxr-xr-x. 1 root root 243064 May 16 19:09 acct_gather_energy_ibmaem.so
-rwxr-xr-x. 1 root root 217888 May 16 19:09 acct_gather_energy_none.so
-rwxr-xr-x. 1 root root 243120 May 16 19:09 acct_gather_energy_pm_counters.so
-rwxr-xr-x. 1 root root 269816 May 16 19:09 acct_gather_energy_rapl.so
-rwxr-xr-x. 1 root root 246408 May 16 19:09 acct_gather_filesystem_lustre.so
-rwxr-xr-x. 1 root root 217368 May 16 19:09 acct_gather_filesystem_none.so
-rwxr-xr-x. 1 root root 217432 May 16 19:09 acct_gather_interconnect_none.so
-rwxr-xr-x. 1 root root 314656 May 16 19:09 acct_gather_profile_influxdb.so
-rwxr-xr-x. 1 root root 236648 May 16 19:09 acct_gather_profile_none.so
-rwxr-xr-x. 1 root root 250336 May 16 19:09 auth_munge.so
-rwxr-xr-x. 1 root root 360168 May 16 19:09 burst_buffer_generic.so
-rwxr-xr-x. 1 root root 381096 May 16 19:09 cli_filter_lua.so
-rwxr-xr-x. 1 root root 239136 May 16 19:09 cli_filter_none.so
-rwxr-xr-x. 1 root root 280104 May 16 19:09 cli_filter_syslog.so
-rwxr-xr-x. 1 root root 287320 May 16 19:09 cli_filter_user_defaults.so
-rwxr-xr-x. 1 root root 221088 May 16 19:09 core_spec_cray_aries.so
-rwxr-xr-x. 1 root root 214928 May 16 19:09 core_spec_none.so
-rwxr-xr-x. 1 root root 197576 May 16 19:09 cred_munge.so
-rwxr-xr-x. 1 root root 174544 May 16 19:09 cred_none.so
-rwxr-xr-x. 1 root root 245056 May 16 19:09 ext_sensors_none.so
-rwxr-xr-x. 1 root root 230312 May 16 19:09 gpu_generic.so
-rwxr-xr-x. 1 root root 340352 May 16 19:09 gres_gpu.so
-rwxr-xr-x. 1 root root 288152 May 16 19:09 gres_mic.so
-rwxr-xr-x. 1 root root 338848 May 16 19:09 gres_mps.so
-rwxr-xr-x. 1 root root 288152 May 16 19:09 gres_nic.so
-rwxr-xr-x. 1 root root 419344 May 16 19:09 jobacct_gather_cgroup.so
-rwxr-xr-x. 1 root root 332104 May 16 19:09 jobacct_gather_linux.so
-rwxr-xr-x. 1 root root 236888 May 16 19:09 jobacct_gather_none.so
-rwxr-xr-x. 1 root root 316736 May 16 19:09 jobcomp_elasticsearch.so
-rwxr-xr-x. 1 root root 280712 May 16 19:09 jobcomp_filetxt.so
-rwxr-xr-x. 1 root root 314704 May 16 19:09 jobcomp_lua.so
-rwxr-xr-x. 1 root root 469744 May 16 19:09 jobcomp_mysql.so
-rwxr-xr-x. 1 root root 206064 May 16 19:09 jobcomp_none.so
-rwxr-xr-x. 1 root root 250288 May 16 19:09 jobcomp_script.so
-rwxr-xr-x. 1 root root 252904 May 16 19:09 job_container_cncu.so
-rwxr-xr-x. 1 root root 217752 May 16 19:09 job_container_none.so
-rwxr-xr-x. 1 root root 318144 May 16 19:09 job_container_tmpfs.so
-rwxr-xr-x. 1 root root 225376 May 16 19:09 job_submit_all_partitions.so
-rwxr-xr-x. 1 root root 225912 May 16 19:09 job_submit_cray_aries.so
-rwxr-xr-x. 1 root root 401840 May 16 19:09 job_submit_lua.so
-rwxr-xr-x. 1 root root 215112 May 16 19:09 job_submit_require_timelimit.so
-rwxr-xr-x. 1 root root 240648 May 16 19:09 job_submit_throttle.so
-rwxr-xr-x. 1 root root 351016 May 16 19:09 launch_slurm.so
lrwxrwxrwx. 1 root root 30 Aug 29 17:37 libjwt.so -> /usr/local/lib/libjwt.so.0.7.0
lrwxrwxrwx. 1 root root 30 Aug 29 17:38 libjwt.so.0 -> /usr/local/lib/libjwt.so.0.7.0
-rwxr-xr-x. 1 root root 11352976 May 16 19:09 libslurmfull.so
-rwxr-xr-x. 1 root root 11223632 May 16 19:09 libslurm_pmi.so
-rwxr-xr-x. 1 root root 213224 May 16 19:09 mcs_account.so
-rwxr-xr-x. 1 root root 220704 May 16 19:09 mcs_group.so
-rwxr-xr-x. 1 root root 205304 May 16 19:09 mcs_none.so
-rwxr-xr-x. 1 root root 211840 May 16 19:09 mcs_user.so
-rwxr-xr-x. 1 root root 302584 May 16 19:09 mpi_cray_shasta.so
-rwxr-xr-x. 1 root root 228232 May 16 19:09 mpi_none.so
-rwxr-xr-x. 1 root root 938840 May 16 19:09 mpi_pmi2.so
-rwxr-xr-x. 1 root root 350512 May 16 19:09 node_features_knl_generic.so
-rwxr-xr-x. 1 root root 282328 May 16 19:09 power_none.so
-rwxr-xr-x. 1 root root 206768 May 16 19:09 preempt_none.so
-rwxr-xr-x. 1 root root 214112 May 16 19:09 preempt_partition_prio.so
-rwxr-xr-x. 1 root root 209584 May 16 19:09 preempt_qos.so
-rwxr-xr-x. 1 root root 304680 May 16 19:09 prep_script.so
-rwxr-xr-x. 1 root root 222768 May 16 19:09 priority_basic.so
-rwxr-xr-x. 1 root root 382520 May 16 19:09 priority_multifactor.so
-rwxr-xr-x. 1 root root 216120 May 16 19:09 proctrack_cgroup.so
-rwxr-xr-x. 1 root root 229736 May 16 19:09 proctrack_linuxproc.so
-rwxr-xr-x. 1 root root 200784 May 16 19:09 proctrack_pgid.so
-rwxr-xr-x. 1 root root 215544 May 16 19:09 route_default.so
-rwxr-xr-x. 1 root root 231136 May 16 19:09 route_topology.so
-rwxr-xr-x. 1 root root 463640 May 16 19:09 sched_backfill.so
-rwxr-xr-x. 1 root root 264096 May 16 19:09 sched_builtin.so
-rwxr-xr-x. 1 root root 208280 May 16 19:09 sched_hold.so
-rwxr-xr-x. 1 root root 1028960 May 16 19:09 select_cons_res.so
-rwxr-xr-x. 1 root root 1089744 May 16 19:09 select_cons_tres.so
-rwxr-xr-x. 1 root root 511552 May 16 19:09 select_cray_aries.so
-rwxr-xr-x. 1 root root 474336 May 16 19:09 select_linear.so
-rwxr-xr-x. 1 root root 236224 May 16 19:09 site_factor_none.so
-rwxr-xr-x. 1 root root 504720 May 16 19:09 slurmctld_nonstop.so
drwxr-xr-x. 4 root root 33 Aug 26 18:45 src
-rwxr-xr-x. 1 root root 434528 May 16 19:09 switch_cray_aries.so
-rwxr-xr-x. 1 root root 247664 May 16 19:09 switch_none.so
-rwxr-xr-x. 1 root root 500912 May 16 19:09 task_affinity.so
-rwxr-xr-x. 1 root root 444528 May 16 19:09 task_cgroup.so
-rwxr-xr-x. 1 root root 259432 May 16 19:09 task_cray_aries.so
-rwxr-xr-x. 1 root root 236352 May 16 19:09 task_none.so
-rwxr-xr-x. 1 root root 231920 May 16 19:09 topology_3d_torus.so
-rwxr-xr-x. 1 root root 263520 May 16 19:09 topology_hypercube.so
-rwxr-xr-x. 1 root root 174712 May 16 19:09 topology_none.so
-rwxr-xr-x. 1 root root 248112 May 16 19:09 topology_tree.so

Thanks for your help.

Sincerely, fabianotex

benmcollins commented 2 years ago

There is no auth_jwt.so, which would have been built as part of SLURM. Since you used the dnf -y install ohpc-slurm-server command to build SLURM, I can only imagine you didn't add --with-jwt to the SLURM build, which explains why it's not there. You need to follow the instructions on building SLURM or figure out how to pass extra configure options to the dnf command.
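The check described above can be sketched as a small shell function. This is only an illustration: the function name `slurm_plugin_present` is hypothetical, the default directory is the one from this thread, and the "auth/jwt" to "auth_jwt.so" mapping is inferred from the listing (auth/munge appears there as auth_munge.so).

```shell
# Hypothetical helper: map a Slurm plugin name such as "auth/jwt" to the
# file name suggested by the listing (auth/munge -> auth_munge.so) and
# report whether that file exists in the plugin directory.
slurm_plugin_present() {
    # $1: plugin name, e.g. "auth/jwt"
    # $2: plugin directory (optional; defaults to the path from this thread)
    plugin_dir="${2:-/usr/lib64/slurm}"
    plugin_file="$(printf '%s' "$1" | tr '/' '_').so"
    if [ -f "$plugin_dir/$plugin_file" ]; then
        echo "found $plugin_file"
    else
        echo "missing $plugin_file"
        return 1
    fi
}

# Typical use:
#   slurm_plugin_present auth/munge
#   slurm_plugin_present auth/jwt
```

Against the directory listed earlier, the first call would report auth_munge.so as found and the second would report auth_jwt.so as missing. A libjwt.so symlink in the same directory does not change that result, because the file being looked for is the plugin, not the library it depends on.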

benmcollins commented 2 years ago

Actually, it seems like you didn't build SLURM at all and it was simply installed pre-built from the DNF repo. You need to build it, not just install it.

fabianotex commented 2 years ago

Correct, that's what I mentioned in my first post about it being installed from the repo. I was trying to avoid rebuilding my SLURM cluster from scratch.

There are many ways to install SLURM and they are all good/valid, until you run into a situation like this. Today I have scripts that automate SLURM cluster deployment using the "dnf" install, which is a fine method for me, but it is not flexible if I simply want to enable a new auth method.

In my opinion, a product/application should be easier for cluster administrators to consume. JWT should be part of SLURM from the beginning, so that I only have to enable it or not in the slurm.conf file, without recompiling the whole thing if I want to use it.

Hope that makes sense.

Sincerely, fabianotex

benmcollins commented 2 years ago

That's an issue you'll have to bring up with the SLURM developers. I have zero control over how they deploy their packages and product. I just provide a simple JWT library. I don't even provide the auth/jwt module itself; SLURM does.

Please talk to them about this. LibJWT and SLURM are completely separate projects. LibJWT is not affiliated with, nor part of, SLURM at all.

fabianotex commented 2 years ago

Ben,

Thanks a lot for your time. I completely understand you don't have control over how SLURM is deployed and that LibJWT and SLURM are completely separate things. I will reach out to them.

Sincerely, fabianotex