CCI-MOC / xdmod-cntr

A project to prototype the use of XDMOD with OpenStack and OpenShift on the MOC

XDMoD showing PIs as Unknown for OpenShift #192

Closed joachimweyl closed 11 months ago

joachimweyl commented 1 year ago

"CPU Hours: Total by PI" is an example of where the PI is showing as "Unknown". [image]

@tzumainn places the group/PI information in with the pod information. We pick up the group information from ColdFront and associate it with the PI information from ColdFront. Comparing the list from ColdFront with the list that @rob-baron has from the database table that Mainn's files populate, none appear on both. Part of this problem is that @Milstein and @joachimweyl created their projects in Staging ColdFront. This will not be an issue for future billing and data, but for testing purposes we do need projects created in Production ColdFront. @Milstein, do we have any OpenShift pods running in projects created in Production ColdFront?

joachimweyl commented 1 year ago

@rob-baron I am noticing that the Projects section in Jobs shows projects, and some of the project names are email addresses, which I assume would be the PI for that project. For example, Milson has a project with his email address. Since the project name comes from ColdFront, wouldn't the PI come from there as well? Why are there no viable PIs to fill the PI field with?

rob-baron commented 1 year ago

@joachimweyl, I have had to shorten the "Nerc Projects" to just the OpenShift/OpenStack project name, because the combined ColdFront project name and OpenShift/OpenStack project name is sometimes too long for xdmod.

So here is the list of project names coming from cold-front (processed into the groups.csv file):

"Ephys VM-f08aaa1", "17"
"Calculate Distances to SDoH-Applicable Locations-fc2a12e", "27"
"AngiMLCode-f143cb3", "22"
"CaDrA II-f2d6357", "30"
"Research Computing Cloud Desktop Experiment-f5a9707", "35"
"Software & Application Innovation Lab (SAIL) Projects-f253eba", "39"
"Operend Core-ffbb214", "45"
"Software & Application Innovation Lab (SAIL) Projects-f4771b4", "39"
"SignLab-fe3d8c9", "39"
"Newspaper Database-fa2de75", "51"
"Harvard Neutrinos-fa72a29", "55"
"OCT FPGA projects-f73e76b", "58"
"Test project-fc67249", "61"
"Synthesizing symbolic & data-driven approaches-fa2485a", "66"
"eMap-f691cf6", "70"
"Storage research-f218147", "58"
"FAVOR-f7ba650", "76"
"Ceph Device Telemetry-feba27e", "81"
"ShiftStack-fe880db", "81"
"rh-openshift-ci-f0b4e3a", "81"
"Collaborative Loop Invariant Discovery-f5fccc3", "88"
"Theory and practice for Deep Learning-fffea4f", "93"
"Storage research-f8915c3", "58"
"Infant segmentation pipeline-f68c9c1", "99"
"Serverless storage systems (Migration from MOC)-f7cad26", "103"
"WRF Modeling-fcd0746", "107"
"Software & Application Innovation Lab (SAIL) Projects-fee2f20", "39"
"bioBakery development-fd8d067", "112"
"Software & Application Innovation Lab (SAIL) Projects-f262ab6", "39"
"Testing NERC for BU-f778c4d", "118"
"BU Cloud Computing Course-f5a13df", "121"
"Spark Infra-f11fd7d", "124"
"Testing NERC for BU-fff6a31", "118"
"A new state abstraction for serverless computing.-f840489", "103"
"Fitness activity recognition from video-fc77646", "131"
"GIS Data Science/Big data Projects at CGA-f201401", "134"
"net-test-feb-7-f3a3779", "138"
"Migrating from MOC to NERC-fa886c1", "141"
"Test-f06c1d8", "144"
"Hosting of Medical Image Analysis Platform-ffe44c0", "148"
"test-f2023bc", "151"
"NERC Elastic Cluster-f891e62", "156"
"Test Project-f69dcff", "138"
"Test project from UMass-f7955cc", "138"
"URI Center for Computational Research-f355d61", "165"
"UMass-URI Gravity Research Consortium-fde8724", "165"
"RCS Applications Team NERC Project-f45b4bd", "118"
"net-test-feb-7-f76ba00", "138"
"net-test-feb-7", "138"
"openshift-evaluation-for-fas-informatics-f580f4b", "173"
"OpenShift evaluation for FAS Informatics", "173"
"OpenShift evaluation for FAS Informatics-f045954", "173"
"software-application-innovation-lab-sail-projects-f946c6a", "39"
"Software & Application Innovation Lab (SAIL) Projects", "39"
"ML Practicum-f700fda", "81"
"synthesizing-symbolic-data-driven-approaches-f53ad5b", "66"
"Synthesizing symbolic & data-driven approaches", "66"
"ai4cloudops-f7f10d9", "121"
"AI4CloudOps", "121"
"AI4CloudOps-fffbac7", "121"
"software-application-innovation-lab-sail-projects-fcd6dfa", "39"
"Software & Application Innovation Lab (SAIL) Projects", "39"
"smart-village-faeeb6c", "185"
"Smart Village", "185"
"theory-and-practice-for-deep-learning-f16a451", "93"
"Theory and practice for Deep Learning", "93"
"Install & Launch Mascot Search Engine for MassSpec-fd569d5", "188"
"openshift-sandbox-f986996", "191"
"OpenShift sandbox", "191"
"LDAP experimentation-f66cb50", "191"
"BU IS&T / IT Partners-f345845", "195"
"HM's Brain-5b8b66", "198"
"MSoM Platform-1e370c", "202"
"IQSS RC Infrastructure for Social Scientists-f37c66c", "205"
"hosting-of-medical-image-analysis-platform-dcb83b", "148"
"Hosting of Medical Image Analysis Platform", "148"
"Controls on ground surface deformation in earthquakes-f080cba", "209"
"EEComp-a4b525", "708"
"eecomp-7ed0a6", "708"
"EEComp", "708"

This list is how xdmod maps the groups/PI ("Nerc Project"). The group list in xdmod, by contrast, is populated from what is shredded and ingested, which yields this list:

select distinct groupname from hpcdb_jobs;
+--------------------------------------------------+
| groupname                                        |
+--------------------------------------------------+
| openshift-etcd                                   |
| openshift-apiserver                              |
| openshift-oauth-apiserver                        |
| openshift-storage                                |
| openshift-monitoring                             |
| openshift-machine-api                            |
| openshift-marketplace                            |
| openshift-kube-apiserver                         |
| openshift-kube-scheduler                         |
| openshift-multus                                 |
| external-secrets-operator                        |
| group-sync-operator                              |
| metallb-system                                   |
| open-cluster-management-addon-observability      |
| open-cluster-management-agent                    |
| open-cluster-management-agent-addon              |
| openshift-apiserver-operator                     |
| openshift-authentication                         |
| openshift-authentication-operator                |
| openshift-cloud-controller-manager-operator      |
| openshift-cloud-credential-operator              |
| openshift-cluster-machine-approver               |
| openshift-cluster-node-tuning-operator           |
| openshift-cluster-samples-operator               |
| openshift-cluster-storage-operator               |
| openshift-cluster-version                        |
| openshift-config-operator                        |
| openshift-console                                |
| openshift-console-operator                       |
| openshift-controller-manager                     |
| openshift-controller-manager-operator            |
| openshift-dns                                    |
| openshift-dns-operator                           |
| openshift-etcd-operator                          |
| openshift-image-registry                         |
| openshift-ingress                                |
| openshift-ingress-canary                         |
| openshift-ingress-operator                       |
| openshift-insights                               |
| openshift-kni-infra                              |
| openshift-kube-apiserver-operator                |
| openshift-kube-controller-manager                |
| openshift-kube-controller-manager-operator       |
| openshift-kube-scheduler-operator                |
| openshift-kube-storage-version-migrator          |
| openshift-kube-storage-version-migrator-operator |
| openshift-logging                                |
| openshift-machine-config-operator                |
| openshift-network-diagnostics                    |
| openshift-network-operator                       |
| openshift-nmstate                                |
| openshift-operator-lifecycle-manager             |
| openshift-sdn                                    |
| openshift-service-ca                             |
| openshift-service-ca-operator                    |
| patch-operator                                   |
| 2                                                |
| m*i@f*.h*.edu                        |
| n*s@f*.h*.edu                           |
| lars-sandbox                                     |
| 2b461470edb545b1a6e5131db3cb1a1b                 |
| 593ee1d6c7fa442f8a33c1e3942bb73d                 |
| e7ffe261658d422489ee13d6d6206fb9                 |
| gatekeeper-system                                |
| h*@r*.com                              |
| j*@b*.edu                                  |
| c*@b*.edu                                       |
| w*@b*.edu                                    |
| naved-test                                       |
| naved-testing                                    |
| aap                                              |
| openshift-operators                              |
| r*@w*.edu                             |
+--------------------------------------------------+
73 rows in set (13.99 sec)

This list, in this case, comes from OpenShift, and it has Milson's email address in it.

Furthermore, since there are no matches between these 2 lists, it makes sense that there are no PIs listed.
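The comparison between the two lists can be sketched in a few lines of Python. This is only an illustration, not the code used in xdmod-cntr: the function names `load_group_names` and `matching_groups` are hypothetical, and the sample data is a small excerpt of the two lists shown above.

```python
import csv
import io


def load_group_names(csv_text):
    """Return the set of project names (first column) from groups.csv-style text."""
    # skipinitialspace lets the reader handle the space after each comma.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return {row[0] for row in reader if row}


def matching_groups(coldfront_names, job_groupnames):
    """Return the names that appear in both lists, sorted for stable output."""
    return sorted(set(coldfront_names) & set(job_groupnames))


# Excerpt of the ColdFront-derived groups.csv list.
groups_csv = '''"Ephys VM-f08aaa1", "17"
"net-test-feb-7", "138"
"EEComp", "708"
'''

# Excerpt of the distinct groupname values from hpcdb_jobs.
job_groups = ["openshift-etcd", "lars-sandbox", "naved-test"]

names = load_group_names(groups_csv)
print(matching_groups(names, job_groups))  # prints []
```

An empty result corresponds to the situation described here: no group name from the jobs table has an entry in the ColdFront list, so xdmod has nothing to attach a PI to.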

XDMoD may mix the notions of PI, group name, and project name, but I would like to separate them more rigidly. For example, I don't want to pull data from xdmod's database to get the list of project names found within, use rules to determine whether each one is a project name, a PI, or a group name, and then populate the correct part of the hierarchy files. Considering the current process, adding these extra rules doesn't seem to give any bang for the buck.

rob-baron commented 1 year ago

Bottom line: This behavior is expected (see my previous comment for more details).

joachimweyl commented 1 year ago

Thank you for the clarification. What are your thoughts on getting PI into XDMoD?

rob-baron commented 1 year ago

@joachimweyl,

What are you trying to accomplish?

Our current onboarding process is to have regapp populate the metadata in keycloak, and use coldfront to allocate the projects in openshift and openstack.

If you look at the names of the openshift projects, the overwhelming majority of them are for running the openshift cluster. Should these have a PI associated with them? The few that are not for running the cluster were created when cold-front was being tested.

PIs are working on the OpenStack side.

If you are just looking to test, create an openshift project (you are a PI, after all) and start a test pod in it.

Ideally, your project shows up in xdmod with a PI, which is a nice confirmation that everything is being processed.

However, if it does not appear in the interface we can check the jobs table.

If it is in the jobs table, and it is listed the same way in cold-front as in the jobs table, then the hierarchy process is broken and needs to be fixed.

However, if it is not in the jobs table, or is listed incorrectly in the jobs table, then the processing on the openshift side is broken.
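The troubleshooting order described above can be written down as a small decision function. This is a sketch under assumptions, not part of xdmod-cntr: the function name `diagnose`, its parameters, and the returned strings are all hypothetical, and the two name collections are assumed to have been fetched beforehand (for example from the jobs table and from cold-front).

```python
def diagnose(namespace, shown_with_pi, job_groupnames, coldfront_names):
    """Say which stage to look at for a project that should show a PI in xdmod.

    namespace        -- the OpenShift project name being tested
    shown_with_pi    -- True if the xdmod interface shows the project with a PI
    job_groupnames   -- group names found in the jobs table
    coldfront_names  -- allocated project names known to cold-front
    """
    if shown_with_pi:
        return "ok: everything is being processed"
    if namespace in job_groupnames and namespace in coldfront_names:
        # Present and listed the same way on both sides, yet no PI shown.
        return "hierarchy process is broken"
    # Missing from the jobs table, or listed there under a different name.
    return "processing on the openshift side is broken"


print(diagnose("example-project", False, {"example-project"}, {"example-project"}))
print(diagnose("example-project", False, {"someone@example.edu"}, {"example-project"}))
```

The second call mirrors the case in this issue, where the jobs table holds an email address rather than the project name that cold-front knows about.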

joachimweyl commented 1 year ago

@Milstein, since I am unable to get access to my new OpenShift request, would you please make sure you have a project in OpenShift that was created in the new ColdFront, not the staging ColdFront, and then create a pod that we can confirm is showing up in XDMoD?

joachimweyl commented 1 year ago

@tzumainn or @naved001, how much work would it be to get the code for OpenShift to pull the namespace instead of the PI "name" (email)?

rob-baron commented 1 year ago

@tzumainn,

We need the openshift namespace in the PI name for xdmod. The rationale is as follows:

We defined the xdmod hierarchy (it is only 3 levels) as the following:

    Institution
    +-> Field of science
         +-> PI

When this is defined for xdmod, it presents the three levels exactly as shown above.

There is also a group "level" that xdmod will present as a "PI" name, which gives us 2 PI levels. We have renamed the lower PI level to "NERC Project" and place either the OpenShift or the OpenStack namespace in it.

In the processing of the hierarchy file, we use the namespace to match with the allocated project name in ColdFront, which is mapped to the ColdFront project name and the PI name. Once we have the PI name, we can then map it to the field of science and institution via the metadata in keycloak.
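The lookup chain described above (namespace, then allocation, then PI, then field of science and institution) might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual hierarchy code: `resolve_hierarchy`, the dictionary layouts, and all sample values are hypothetical stand-ins for the ColdFront allocation data and the keycloak metadata.

```python
def resolve_hierarchy(namespace, coldfront_allocations, keycloak_metadata):
    """Map a namespace to its place in the 3-level hierarchy, or None if unmatched.

    coldfront_allocations -- {allocated project name: {"project": ..., "pi": ...}}
    keycloak_metadata     -- {PI name: {"institution": ..., "field_of_science": ...}}
    """
    allocation = coldfront_allocations.get(namespace)
    if allocation is None:
        return None  # namespace was not allocated through ColdFront
    pi = allocation["pi"]
    meta = keycloak_metadata.get(pi)
    if meta is None:
        return None  # PI has no metadata in keycloak
    return {
        "institution": meta["institution"],
        "field_of_science": meta["field_of_science"],
        "pi": pi,
        "nerc_project": namespace,  # the lower "PI" level, renamed "NERC Project"
    }


# Hypothetical sample data for illustration only.
allocations = {"example-project-f000000": {"project": "Example Project", "pi": "pi@example.edu"}}
metadata = {"pi@example.edu": {"institution": "Example University", "field_of_science": "Computer Science"}}

print(resolve_hierarchy("example-project-f000000", allocations, metadata))
print(resolve_hierarchy("openshift-etcd", allocations, metadata))  # prints None
```

A namespace that ColdFront never allocated, such as the cluster's own `openshift-*` projects, falls through the first lookup, which is consistent with those projects showing no PI.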

rob-baron commented 11 months ago

This has been solved with PR https://github.com/CCI-MOC/xdmod-cntr/pull/157.