OSC / ondemand

Supercomputing. Seamlessly. Open, Interactive HPC Via the Web
https://openondemand.org/
MIT License

lmod browser #3402

Open johrstrom opened 3 months ago

johrstrom commented 3 months ago

Lots of folks are interested in a page that can browse the modules available on a cluster.

We have this simple class - https://github.com/OSC/ondemand/blob/master/apps/dashboard/app/models/hpc_module.rb. And though it's not an ActiveModel yet, it could be I suppose.

It's not clear to me what one could do with this information inside of OOD, but it seems like they want

Xaraxia commented 3 months ago

I would be keen on such a thing. I'm thinking of creating a Passenger app to do exactly this. It would be amazing to have it as part of the dashboard (a popup, perhaps, that lets people browse and then select a module to fill in a form, but I'm brainstorming here and there might be problematic implications to doing that), but even as a fully-branded Passenger app it would be pretty fantastic.

The challenge is that different partitions potentially have different modules available. They're all in the one shared filesystem, but we alter the modulepath based on architecture. For example, we have AMD epyc3 and epyc4 and Intel Xeon nodes, and these can include a number of different GPUs with mixed architectures (NVIDIA [Hopper, Ampere, Ada] and AMD MI210s so far) and we compile separately for each one for reasons of performance and featureset. So for such a module to work, it would need to be able to filter by architecture. We also split into "local" (hand-built) and "auto" (EasyBuild, but Spack would apply here too). That's less of a problem, because the module path by architecture is known, but it does mean that you can't simply specify a single parent directory per architecture and expect it to work.

I'm not sure what you had in mind, but at the moment I'm looking to generate a JSON per architecture and allow people to select that and then display the module spider output in some sensible fashion. Potentially as a server-side app to avoid sending the entire spider file down to the end-user unnecessarily.
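A minimal sketch of the per-architecture JSON idea, written in Python purely for illustration (the dashboard itself is a Ruby app). Everything here is an assumption: one file per architecture named `<arch>.json` in a single directory, and a hypothetical record shape of `{"name": ..., "versions": [...]}`. It is not the format Lmod's spider emits. Filtering happens server-side so only matching entries would be sent to the browser.

```python
import json
from pathlib import Path


def available_architectures(json_dir):
    """Return the architectures that have a JSON file in json_dir.

    Assumes (hypothetically) one file per architecture, e.g. epyc4.json.
    """
    return sorted(p.stem for p in Path(json_dir).glob("*.json"))


def load_modules(json_dir, arch):
    """Load the module list for one architecture.

    The requested architecture is checked against the files that actually
    exist, so a user-supplied value cannot point outside json_dir.
    """
    if arch not in available_architectures(json_dir):
        raise ValueError(f"unknown architecture: {arch!r}")
    with (Path(json_dir) / f"{arch}.json").open() as fh:
        return json.load(fh)


def search_modules(modules, query):
    """Case-insensitive substring match on the (assumed) 'name' field."""
    needle = query.lower()
    return [m for m in modules if needle in m["name"].lower()]
```

A request handler could call `load_modules` with the architecture the user picked and pass the result through `search_modules` before rendering.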

Anyway, I'm keen to help with this one.

Xaraxia commented 3 months ago

Oh, incidentally, the use case: because the architectures are all different, but the login nodes are one particular architecture, it's not possible to see the modules for the CUDA nodes from the login node. They are mostly the same, but not entirely, and this trips up a few of our users.

johrstrom commented 3 months ago

> I would be keen on such a thing. I'm thinking of creating a Passenger app to do exactly this.

We would love the help! Though I have to say this project has so many moving parts that separate Passenger apps are sort of untenable. So it would likely have to be a part of the dashboard.

> The challenge is that different partitions potentially have different modules available.

We have ways of hiding or displaying things based on other things. For example, the auto_modules feature already toggles on cluster, so only the modules for that cluster appear when you change clusters in the batch connect forms.

> I'm not sure what you had in mind, but at the moment I'm looking to generate a JSON per architecture and allow people to select that and then display the module spider output in some sensible fashion.

If I were to do this today I'd probably start with an HTML table where every row is a module, with maybe columns for different attributes of the module. I don't think the end user would want to interact with JSON. That said, an HTML table is just my first idea off the top of my head, so there could be better UI choices.
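As a rough illustration of the one-row-per-module table, here is a small Python sketch. It is not how the dashboard would do it (that would presumably be a Rails view), and the record shape `{"name": ..., "versions": [...]}` and the two columns are hypothetical stand-ins for whatever attributes end up being shown.

```python
from html import escape


def render_module_table(modules):
    """Render a list of module records as an HTML table, one row per module.

    Each record is assumed (hypothetically) to look like
    {"name": "gcc", "versions": ["12.3.0", "13.2.0"]}.
    """
    rows = []
    # Sort case-insensitively by name so the table is easy to scan.
    for mod in sorted(modules, key=lambda m: m["name"].lower()):
        versions = ", ".join(mod.get("versions", []))
        # Escape values so module names cannot inject markup.
        rows.append(
            f"<tr><td>{escape(mod['name'])}</td><td>{escape(versions)}</td></tr>"
        )
    header = "<thead><tr><th>Module</th><th>Versions</th></tr></thead>"
    return "<table>" + header + "<tbody>" + "".join(rows) + "</tbody></table>"
```

Extra columns (description, architecture, and so on) would just be more `<td>` cells per row.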

Xaraxia commented 1 month ago

Yeah, agree that I wouldn't present the JSON to the end-user. It's just a nice way of storing it at the backend. Module by module works. I'll add it to my TODO list. I don't get a lot of time for OnDemand development, so might be a while, but if I get a chance I'll have a crack at it.

johrstrom commented 4 weeks ago

> I don't get a lot of time for OnDemand development, so might be a while, but if I get a chance I'll have a crack at it.

OK - just watch this ticket in case we have any updates on our side. PRs are super welcome, so we'll take basically any addition you have even if it's just scaffolding for this feature.