CCI-MOC / xdmod-cntr

A project to prototype the use of XDMOD with OpenStack and OpenShift on the MOC

kubernetes job to pull and process the hierarchy data #116

Closed rob-baron closed 1 year ago

rob-baron commented 1 year ago

Depends on how Kristi and Jim decide to provide the hierarchy data. Initially, it will be a CSV dump (https://github.com/nerc-project/coldfront-plugin-cloud/issues/89).

This issue is to automate the processing of that data on xdmod.

joachimweyl commented 1 year ago

@rob-baron please add progress, decisions and XDMoD responses for this issue.

rob-baron commented 1 year ago

XDMoD uses the following files for its hierarchy:

1) hierarchy.json - this defines a 3 level hierarchy that xdmod uses. This json file is placed in the /etc/xdmod directory.

{
    "top_level_label": "Institution",
    "top_level_info": "Top Level Institution that has an MOU with MGHPCC regarding NERC services and billing",
    "middle_level_label": "School/Center",
    "middle_level_info": "Top divisional level of the institution, like College of Engineering, Medical School",
    "bottom_level_label": "Discipline",
    "bottom_level_info": "Research Discipline of the PI at the Institution that is responsible for the project"
}

For cloud-based resources this is fine. For the openshift resources (which are modeled after HPC), the bottom level should probably be:

    "bottom_level_label": "PI_Discipline",
    "bottom_level_info": "PI name and Research Discipline of the PI at the Institution that is responsible for the project"

HPC doesn't have the concept of a project so this is the suggested method of mapping the PI name to an openshift project.

2) hierarchy.csv -

"BU","Boston University",""
"HU","Harvard University",""
"HARIRI-BU", "Rafik B. Hariri Institute for Computing and Computational Science & Engineering", "BU"
"Phys-BU","Physical Science","CAS"
"Phys-HU","Physical Science","FAS"
"CS-CoE-BU", "BU Computer Science", "BU"
"PS-CS-CoE-BU", "Physical Science", "CS-CoE-BU"
"PS-Phys-HU", "Physical Science", "Phys-HU"
"PS-Phys-BU", "Physical Science", "Phys-BU"

The first column needs to be unique.

This is processed via the command:

xdmod-import-csv -t hierarchy -i hierarchy.csv
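Since the first column must be unique, it may be worth checking the file before running the import. The following is a minimal sketch and not part of xdmod; the function name `duplicate_keys` and the sample rows (including the deliberately duplicated "BU" row) are illustrative assumptions:

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text):
    """Return the first-column values that appear more than once."""
    # skipinitialspace handles rows written as "A", "B" (space after the comma).
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    counts = Counter(row[0] for row in reader if row)
    return sorted(key for key, n in counts.items() if n > 1)


# Sample data: the last row is a made-up duplicate for illustration.
sample = '''"BU","Boston University",""
"HU","Harvard University",""
"CS-CoE-BU", "BU Computer Science", "BU"
"BU","Duplicate entry",""
'''

print(duplicate_keys(sample))
```

An empty list means the first column is unique in the text that was checked.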

3) groups.csv - maps a group to the institution or department in the hierarchy.csv file

"MOCA", "HARIRI-BU"
"MOC", "HARIRI-BU"
"Phys-Gr-BU", "Phys-BU"
"Phys-Gr-HU", "Phys-HU"

This is processed via:

xdmod-import-csv -t group-to-hierarchy -i group.csv
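Each group row refers to a key in the first column of hierarchy.csv, so a quick cross-check of the two files can catch dangling references before import. This is a hedged sketch, not an xdmod feature; `read_rows` and `unmapped_groups` are hypothetical helper names and the sample data is trimmed from the examples above:

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text, tolerating a space after each comma."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [row for row in reader if row]


def unmapped_groups(groups_text, hierarchy_text):
    """Return (group, target) pairs whose target is not a hierarchy key."""
    known = {row[0] for row in read_rows(hierarchy_text)}
    return [(row[0], row[1]) for row in read_rows(groups_text)
            if row[1] not in known]


hierarchy = '''"BU","Boston University",""
"HARIRI-BU", "Rafik B. Hariri Institute for Computing and Computational Science & Engineering", "BU"
"Phys-BU","Physical Science","CAS"
'''
groups = '''"MOCA", "HARIRI-BU"
"MOC", "HARIRI-BU"
"Phys-Gr-HU", "Phys-HU"
'''

# "Phys-HU" was left out of the trimmed hierarchy sample, so it is reported.
print(unmapped_groups(groups, hierarchy))
```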

4) names.csv - this gives a real name to the group or person.

"MOCA", "", "Mass Open Cloud Alliance"
"MOC",  "", "Mass Open Cloud"
"Phys-Gr-BU", "", "BU Physics Group"
"Phys-Gr-HU", "", "HU Physics Group"
"robbaron", "Robert", "Baron"
"mosayyeb@bu.edu", "A", "M"
"pjd@ccs.neu.edu", "Peter", "Desnoyers"

Processed via

xdmod-import-csv -t names -i names.csv

5) pi-to-cloud-project.csv - associates a PI with a project in a cloud resource. Format:

<pi>,<project_name>,<resource_name>

Data:

"robbaron","moc-infrastructure","nerc_openstack"
"pjd@ccs.neu.edu","bu528-ceph-prefetch","nerc_openstack"

Processed via:

xdmod-import-csv -t cloud-project-to-pi -i pi-to-cloud-project.csv

6) Re-ingest. For the cloud, since we are using the openstack data type, do the following:

xdmod-ingestor --datatype=openstack
xdmod-ingestor --aggregate=cloud --last-modified-start-date 2012-01-01

for openshift it will be similar to:

xdmod-ingestor --last-modified-start-date 2012-01-01
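
Because the goal of this issue is to automate these steps, the cloud sequence above could be driven from a small script. This is only a sketch under assumptions: the commands and flags are copied from the steps above, the CSV files are assumed to sit in the current working directory, and `build_commands` and `run_all` are hypothetical names rather than anything xdmod provides:

```python
import shlex
import subprocess


def build_commands():
    """Return the import and re-ingest commands for the cloud case, in order."""
    return [
        ["xdmod-import-csv", "-t", "hierarchy", "-i", "hierarchy.csv"],
        ["xdmod-import-csv", "-t", "group-to-hierarchy", "-i", "group.csv"],
        ["xdmod-import-csv", "-t", "names", "-i", "names.csv"],
        ["xdmod-import-csv", "-t", "cloud-project-to-pi",
         "-i", "pi-to-cloud-project.csv"],
        ["xdmod-ingestor", "--datatype=openstack"],
        ["xdmod-ingestor", "--aggregate=cloud",
         "--last-modified-start-date", "2012-01-01"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(argv, check=True)


# Print the plan without executing anything.
for argv in build_commands():
    print(shlex.join(argv))
```

Calling `run_all(build_commands())` inside the xdmod container would execute the steps; passing a different `runner` allows a dry run.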

joachimweyl commented 1 year ago

data will be provided here.

msdisme commented 1 year ago

@rob-baron this came up in the weekly call. Can you provide an update on this?

rob-baron commented 1 year ago

The research part of this was split to (https://github.com/CCI-MOC/xdmod-cntr/issues/144).

To summarize:

For cloud resources (openstack):

1) Have been able to associate PIs with projects.
2) Have NOT been able to associate projects with either groups or the hierarchy; have a query in to xdmod about that.
3) Have a couple of things to try in the meantime.

For HPC resources (openshift):

1) Have been able to associate groups with projects (in the HPC view, PI and group are synonyms).

The main implication is that the hierarchy as defined should probably be adjusted. The current hierarchy is as follows

        1) Institution (i.e., BU, HU, NEU ... )
        2) School/Center/Institute (FAS, CFA, CAS ... )
        3) Research Discipline
   With groups identifying the PI in groups.csv.

   The name of a group can be adjusted using names.csv.

We could adjust the hierarchy to keep all relevant information - perhaps:

  ```
  1) institution - school (ie BU-CAS, HU-FAS ... )
  2) PI
  3) Research Discipline

  using groups/projects to track the specific grant
  ```

This will fit both HPC and cloud for the current version of xdmod. Furthermore, this fits my model of:

  1. institutions having multiple PIs,
  2. PIs having 1 or more Research Disciplines
  3. PIs applying for multiple research grants

I expect these things will change, as the current mechanisms in xdmod are not the most convenient to work with.

msdisme commented 1 year ago

@rob-baron any updates?

rob-baron commented 1 year ago

partially blocked on (https://github.com/nerc-project/coldfront-plugin-cloud/issues/89)

rob-baron commented 1 year ago

1) I have gotten a new dump of the infra cluster database and placed it on ocp-staging xdmod and xdmod-staging projects.

2) I am creating a json structure based on the data that was loaded onto the 2 xdmod projects on ocp-staging. This will serve as an intermediate data structure for the data that will be fetched from ColdFront. I have updated the ColdFront issue with an example of that structure.

joachimweyl commented 1 year ago

NERC will need to be able to change these at some point. Field of study list from Scott & Wayne

Life Science, Physical Science, Statistics/Math, Social Science, Arts/Humanities, Business/Management, Computer Science/Engineering, Public Health/BioMedical, Other

Probably makes the most sense to just have this completely done prior to it coming into XDMoD so XDMoD can just process based on what it gets from ColdFront and RegApp.

joachimweyl commented 1 year ago

@rob-baron can you provide an update here? You can probably copy-paste a fair amount from your stand-up notes.

rob-baron commented 1 year ago

The field of study will be coming from RegApp through Keycloak/Keystone, since any changes to these fields then only need to be applied once, upstream.

For auditing purposes, we will need to keep track of the changes to them.

I am planning to create a simpler version that will at least track all of the history information.

rob-baron commented 1 year ago

Basic questions:

How auditable do we want xdmod to be?

1) How far back do we want showback to be accurate?

2) Does it matter if we have projects that fall under the "Unknown" institution? (for example an inactive project)

joachimweyl commented 1 year ago

@waygil & @syockel please review the questions above.

joachimweyl commented 1 year ago

@rob-baron are all the pieces you wanted to finish before the HTTP Endpoint was provided done? Are you now fully blocked on the HTTP endpoint? For example, is the double PI issue resolved? Is the double PI issue being tracked here or is there a separate GitHub issue for that?

rob-baron commented 1 year ago

@joachimweyl

1) I am working on generating the files based on faked data. Kristi has only included a partial record that I have left a comment on. I am hoping that he chooses a more generic record for the transfer of data (something that cold-front could adopt) with the mapping of Nerc/NERC/OpenStack/OpenShift specific fields to that generic record.

2) I believe you haven't understood what the double PI issue is. Strictly speaking, if we have the PI on the lowest tier of the hierarchy file, then there is no double PI issue. However, if we put the PI in the mapping table (group.csv), which maps groups (synonymous with PIs under xdmod's definition), then 1 PI can be mapped to 2 different projects from 2 different institutions.

I am suggesting that we transfer the data as a strict hierarchy: each record includes a parent_id that refers to another record's id, and a record type that indicates its hierarchy level. This makes it easy to check that a record of one type is connected only to a record of the specified type (i.e., a field of science is only connected to an institution, a PI is connected to a field of science, and a project is connected to a PI). Furthermore, this will be enforced even if we have to rearrange the PIs and projects.

Basically, it is something we need to check depending on the data that is transferred from cold-front.
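The check described above could look roughly like the sketch below. It is only an illustration under assumptions: the field names `id`, `parent_id`, and `type` follow the description in this comment, while the type strings, the sample records, and the function `hierarchy_errors` are made up and do not reflect the actual ColdFront record format:

```python
# Assumed record types mapped to the type their parent must have
# (None marks the top level of the hierarchy).
ALLOWED_PARENT = {
    "institution": None,
    "field-of-science": "institution",
    "pi": "field-of-science",
    "project": "pi",
}


def hierarchy_errors(records):
    """Return a message for every record that breaks the strict hierarchy."""
    by_id = {r["id"]: r for r in records}
    errors = []
    for r in records:
        if r["type"] not in ALLOWED_PARENT:
            errors.append(f"{r['id']}: unknown type {r['type']}")
            continue
        expected = ALLOWED_PARENT[r["type"]]
        parent = by_id.get(r.get("parent_id"))
        if expected is None:
            if r.get("parent_id") is not None:
                errors.append(f"{r['id']}: top-level record must not have a parent")
        elif parent is None:
            errors.append(f"{r['id']}: parent {r.get('parent_id')!r} not found")
        elif parent["type"] != expected:
            errors.append(
                f"{r['id']}: parent type is {parent['type']}, expected {expected}")
    return errors


# Made-up records; the last one is wrongly attached to an institution.
records = [
    {"id": "BU", "parent_id": None, "type": "institution"},
    {"id": "PS-BU", "parent_id": "BU", "type": "field-of-science"},
    {"id": "robbaron", "parent_id": "PS-BU", "type": "pi"},
    {"id": "moc-infrastructure", "parent_id": "robbaron", "type": "project"},
    {"id": "bad-project", "parent_id": "BU", "type": "project"},
]

for message in hierarchy_errors(records):
    print(message)
```

Keeping the allowed parent types in one table means a rearranged hierarchy only requires changing `ALLOWED_PARENT`, not the checking logic.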

rob-baron commented 1 year ago

So, to be explicit, my plan of attack has been to generate the aforementioned hierarchy files (see the comment of 30-Jan on this issue) from mocked-up data, using the format described in the comment of 16-MAR on https://github.com/nerc-project/coldfront-plugin-cloud/issues/89 and the comment of 3-APR on https://github.com/nerc-project/coldfront-plugin-cloud/pull/93.

I have generated the first set of files with a sensible hierarchy - that is:

Institution -> field of science (PI's field of science) -> PI -> Project

Am currently working on generalizing the script to generate the set of files with the "flipped" hierarchy - that is

Institution -> field of science (PI's field of science) -> Project -> PI

Still waiting to hear back from the xdmod team (https://help.xdmod.org/support/tickets/31911), but this is a low priority as flipping the hierarchy will also work - it is just not the most intuitive way to view the hierarchy as PIs tend to have multiple projects. Furthermore, the field of science is for the PI and not the project

I expect that the download from cold-front will be the more intuitive hierarchy.

rob-baron commented 1 year ago

So cold-front https://github.com/nerc-project/coldfront-plugin-cloud/issues/89 only keeps track of the mapping between projects and PIs.

Will have to pull the hierarchy of Institution, field-of-science(for the PI), and PI from keycloak.

joachimweyl commented 1 year ago

The last step is to massage the data coming from Keycloak and ColdFront so that it matches the format @rob-baron already created for XDMoD to pull in.

joachimweyl commented 1 year ago

@rob-baron please provide an update on how things are going and your thoughts for next steps.

rob-baron commented 1 year ago

I have integrated the script that kristi provided and will be testing it with the credentials from vault, so the next steps are to:

1) test using the credentials
2) confirm that keycloak is providing the correct data.

rob-baron commented 1 year ago

This has been deployed on the infra cluster. Unfortunately, it is not able to connect to coldfront or keycloak. However, the same script is able to connect to both from my development system.

I replicated this using curl on both my development system and the infra cluster.

Here is the error from the infra cluster:

Using project "xdmod-staging" on server "https://api.nerc-ocp-infra.rc.fas.harvard.edu:6443/".
Rob@MacBook-Pro xdmod-ProcessHierarchy % oc --as s...n rsh xdmod-ui-864b4cb54d-fkhxj
Defaulted container "xdmod" out of: xdmod, xdmod-acl-work-a-round, xdmod-init-1 (init)
sh-4.2$ curl https://keycloak.mss.mghpcc.org/
curl: (7) Failed connect to keycloak.mss.mghpcc.org:443; Connection timed out
sh-4.2$ curl https://coldfront.mss.mghpcc.org/
curl: (7) Failed connect to coldfront.mss.mghpcc.org:443; Connection timed out
sh-4.2$

also in https://github.com/OCP-on-NERC/operations/issues/138
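
The same reachability test can be done from Python, which may be handy inside a pod where the hierarchy script already runs. This is a minimal sketch using only the standard library; `can_connect` and `check_hosts` are hypothetical helper names and the 5 second timeout is an arbitrary choice:

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False


def check_hosts(hosts, port=443, timeout=5.0):
    """Map each host name to whether a TCP connection could be opened."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, `check_hosts(["keycloak.mss.mghpcc.org", "coldfront.mss.mghpcc.org"])` attempts the same two connections as the curl commands above. It only tests TCP reachability, not TLS or HTTP.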

joachimweyl commented 1 year ago

@rob-baron can we add this as its own issue to https://github.com/OCP-on-NERC/operations ?

joachimweyl commented 1 year ago

No longer blocked.

joachimweyl commented 1 year ago

PR merged and this is now closed.