Closed: rob-baron closed this issue 1 year ago.
@rob-baron please add progress, decisions and XDMoD responses for this issue.
XDMoD uses the following files for its hierarchy:
1) hierarchy.json - defines the three-level hierarchy that XDMoD uses. This JSON file is placed in the /etc/xdmod directory.
```
{
    "top_level_label": "Institution",
    "top_level_info": "Top Level Institution that has an MOU with MGHPCC regarding NERC services and billing",
    "middle_level_label": "School/Center",
    "middle_level_info": "Top divisional level of the institution, like College of Engineering, Medical School",
    "bottom_level_label": "Discipline",
    "bottom_level_info": "Research Discipline of the PI at the Institution that is responsible for the project"
}
```
For cloud-based resources this is fine. For the OpenShift resources (which are modeled after HPC), the bottom level should probably be:
```
"bottom_level_label": "PI_Discipline",
"bottom_level_info": "PI name and Research Discipline of the PI at the Institution that is responsible for the project"
```
HPC doesn't have the concept of a project, so this is the suggested way of mapping the PI name to an OpenShift project.
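If the cloud and OpenShift instances each need their own hierarchy.json, the OpenShift version can be derived from the cloud one by changing only the two bottom-level keys. A minimal sketch, assuming each XDMoD instance reads its own copy of /etc/xdmod/hierarchy.json; `openshift_variant` is a hypothetical helper, not part of XDMoD.

```python
import json

# Cloud (OpenStack) hierarchy, copied from hierarchy.json above.
CLOUD_HIERARCHY = {
    "top_level_label": "Institution",
    "top_level_info": "Top Level Institution that has an MOU with MGHPCC regarding NERC services and billing",
    "middle_level_label": "School/Center",
    "middle_level_info": "Top divisional level of the institution, like College of Engineering, Medical School",
    "bottom_level_label": "Discipline",
    "bottom_level_info": "Research Discipline of the PI at the Institution that is responsible for the project",
}


def openshift_variant(cloud: dict) -> dict:
    """Return a copy of the cloud hierarchy with the PI folded into the bottom level."""
    variant = dict(cloud)  # copy, so the cloud version is left untouched
    variant["bottom_level_label"] = "PI_Discipline"
    variant["bottom_level_info"] = (
        "PI name and Research Discipline of the PI at the Institution "
        "that is responsible for the project"
    )
    return variant


if __name__ == "__main__":
    # Print the OpenShift variant; redirect it into hierarchy.json on the OpenShift instance.
    print(json.dumps(openshift_variant(CLOUD_HIERARCHY), indent=4))
```

The top and middle levels stay identical, so both instances keep the same Institution and School/Center breakdown.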
2) hierarchy.csv - each row is <id>,<name>,<parent_id>; top-level entries have an empty parent:
```
"BU","Boston University",""
"HU","Harvard University",""
"HARIRI-BU", "Rafik B. Hariri Institute for Computing and Computational Science & Engineering", "BU"
"Phys-BU","Physical Science","CAS"
"Phys-HU","Physical Science","FAS"
"CS-CoE-BU", "BU Computer Science", "BU"
"PS-CS-CoE-BU", "Physical Science", "CS-CoE-BU"
"PS-Phys-HU", "Physical Science", "Phys-HU"
"PS-Phys-BU", "Physical Science", "Phys-BU"
```
The first column needs to be unique.
This is processed via the command:
xdmod-import-csv -t hierarchy -i hierarchy.csv
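Since the first column must be unique, it is worth checking the file before running the import. A minimal sketch using Python's csv module; `check_hierarchy` is a hypothetical helper, and the check that every non-empty parent is itself defined in the file is an assumption added here, not a rule stated above. Run against the excerpt above, it reports the `CAS` and `FAS` parents, which do not appear as ids in this excerpt.

```python
import csv
import io

# The hierarchy.csv rows above: <id>, <name>, <parent_id> (empty parent = top level).
HIERARCHY_CSV = '''"BU","Boston University",""
"HU","Harvard University",""
"HARIRI-BU", "Rafik B. Hariri Institute for Computing and Computational Science & Engineering", "BU"
"Phys-BU","Physical Science","CAS"
"Phys-HU","Physical Science","FAS"
"CS-CoE-BU", "BU Computer Science", "BU"
"PS-CS-CoE-BU", "Physical Science", "CS-CoE-BU"
"PS-Phys-HU", "Physical Science", "Phys-HU"
"PS-Phys-BU", "Physical Science", "Phys-BU"
'''


def check_hierarchy(text: str) -> list:
    """Return human-readable problems found in hierarchy.csv text."""
    # skipinitialspace accepts both "a","b" and "a", "b" separators.
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
    problems, ids = [], set()
    for lineno, row in enumerate(rows, start=1):
        if len(row) != 3:
            problems.append(f"line {lineno}: expected 3 columns, got {len(row)}")
        elif row[0] in ids:
            problems.append(f"line {lineno}: duplicate id {row[0]!r}")
        else:
            ids.add(row[0])
    # Second pass, so a parent may be listed after its children.
    for lineno, row in enumerate(rows, start=1):
        if len(row) == 3 and row[2] and row[2] not in ids:
            problems.append(f"line {lineno}: parent {row[2]!r} is not defined")
    return problems


if __name__ == "__main__":
    for problem in check_hierarchy(HIERARCHY_CSV):
        print(problem)
```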
3) groups.csv - maps a group to the institution or department in the hierarchy.csv file:
```
"MOCA", "HARIRI-BU"
"MOC", "HARIRI-BU"
"Phys-Gr-BU", "Phys-BU"
"Phys-Gr-HU", "Phys-HU"
```
This is processed via:
xdmod-import-csv -t group-to-hierarchy -i groups.csv
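Each group should point at an id that exists in hierarchy.csv. A small cross-check sketch; `unmapped_groups` is a hypothetical helper, and for brevity the id set is copied from the hierarchy.csv excerpt rather than read from the file.

```python
import csv
import io

# Ids defined in the hierarchy.csv excerpt above.
HIERARCHY_IDS = {"BU", "HU", "HARIRI-BU", "Phys-BU", "Phys-HU", "CS-CoE-BU",
                 "PS-CS-CoE-BU", "PS-Phys-HU", "PS-Phys-BU"}

# The groups.csv rows above: <group>, <hierarchy id>.
GROUPS_CSV = '''"MOCA", "HARIRI-BU"
"MOC", "HARIRI-BU"
"Phys-Gr-BU", "Phys-BU"
"Phys-Gr-HU", "Phys-HU"
'''


def unmapped_groups(text: str, hierarchy_ids: set) -> list:
    """Return (group, hierarchy_id) pairs whose hierarchy_id is not in hierarchy_ids."""
    bad = []
    # Unpacking raises ValueError if a row does not have exactly two columns.
    for group, node in csv.reader(io.StringIO(text), skipinitialspace=True):
        if node not in hierarchy_ids:
            bad.append((group, node))
    return bad
```

An empty result means every group can be placed in the hierarchy.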
4) names.csv - gives a real name to the group or person:
```
"MOCA", "", "Mass Open Cloud Alliance"
"MOC", "", "Mass Open Cloud"
"Phys-Gr-BU", "", "BU Physics Group"
"Phys-Gr-HU", "", "HU Physics Group"
"robbaron", "Robert", "Baron"
"mosayyeb@bu.edu", "A", "M"
"pjd@ccs.neu.edu", "Peter", "Desnoyers"
```
Processed via:
xdmod-import-csv -t names -i names.csv
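A single missing comma changes a row's column count, so a quick width check before importing is cheap insurance. A minimal sketch, assuming every names.csv row should have exactly three columns (<id>, <first name>, <last name>, as in the rows above); `malformed_rows` is a hypothetical helper.

```python
import csv
import io

# The names.csv rows above: <id>, <first name>, <last name>.
NAMES_CSV = '''"MOCA", "", "Mass Open Cloud Alliance"
"MOC", "", "Mass Open Cloud"
"Phys-Gr-BU", "", "BU Physics Group"
"Phys-Gr-HU", "", "HU Physics Group"
"robbaron", "Robert", "Baron"
"mosayyeb@bu.edu", "A", "M"
"pjd@ccs.neu.edu", "Peter", "Desnoyers"
'''


def malformed_rows(text: str, expected: int = 3) -> list:
    """Return (line number, row) for every row that does not have `expected` columns."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [(n, row) for n, row in enumerate(reader, start=1) if len(row) != expected]
```

For example, a row written as `"Phys-Gr-HU" "", "HU Physics Group"` parses into only two columns and is reported.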
5) pi-to-cloud-project.csv - associates a PI with a project on a cloud resource, in the format:
<pi>,<project_name>,<resource_name>
Data:
```
"robbaron","moc-infrastructure","nerc_openstack"
"pjd@ccs.neu.edu","bu528-ceph-prefetch","nerc_openstack"
```
Processed via:
xdmod-import-csv -t cloud-project-to-pi -i pi-to-cloud-project.csv
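Rather than typing the association file by hand, it can be generated from (pi, project_name, resource_name) tuples, presumably taken from ColdFront in practice. A minimal sketch; `to_csv` is a hypothetical helper. The fully quoted output matches the sample rows above.

```python
import csv
import io

# (pi, project_name, resource_name) tuples, taken from the data above.
ASSOCIATIONS = [
    ("robbaron", "moc-infrastructure", "nerc_openstack"),
    ("pjd@ccs.neu.edu", "bu528-ceph-prefetch", "nerc_openstack"),
]


def to_csv(rows) -> str:
    """Serialize rows in the fully quoted style used by pi-to-cloud-project.csv."""
    buf = io.StringIO()
    # QUOTE_ALL quotes every field; lineterminator "\n" avoids csv's default "\r\n".
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

Writing `to_csv(ASSOCIATIONS)` to pi-to-cloud-project.csv gives a file ready for the import command.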
6) Re-ingest. For the cloud, since we are using the openstack data type, run:
xdmod-ingestor --datatype=openstack
xdmod-ingestor --aggregate=cloud --last-modified-start-date 2012-01-01
For OpenShift it will be similar to:
xdmod-ingestor --last-modified-start-date 2012-01-01
Data will be provided here.
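The import and re-ingest steps can be scripted so they always run in the same order. A sketch that only wraps the commands as written in steps 2 to 6; it assumes the CLIs are on PATH, the CSV files sit in the working directory, and the group mapping file is named groups.csv. `build_steps` and `run` are hypothetical helpers, and `dry_run=True` only prints the commands.

```python
import subprocess


def build_steps(kind: str) -> list:
    """Return the argv lists for one resource kind ("cloud" or "openshift"), in order."""
    imports = [
        ["xdmod-import-csv", "-t", "hierarchy", "-i", "hierarchy.csv"],
        ["xdmod-import-csv", "-t", "group-to-hierarchy", "-i", "groups.csv"],
        ["xdmod-import-csv", "-t", "names", "-i", "names.csv"],
    ]
    if kind == "cloud":
        return imports + [
            ["xdmod-import-csv", "-t", "cloud-project-to-pi", "-i", "pi-to-cloud-project.csv"],
            ["xdmod-ingestor", "--datatype=openstack"],
            ["xdmod-ingestor", "--aggregate=cloud", "--last-modified-start-date", "2012-01-01"],
        ]
    if kind == "openshift":
        return imports + [["xdmod-ingestor", "--last-modified-start-date", "2012-01-01"]]
    raise ValueError(f"unknown resource kind: {kind!r}")


def run(kind: str, dry_run: bool = True) -> None:
    """Print (dry run) or execute each step, stopping at the first failure."""
    for argv in build_steps(kind):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Calling `run("cloud")` first shows the exact sequence; `run("cloud", dry_run=False)` executes it.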
@rob-baron this came up in the weekly call - can you provide an update on this?
The research part of this was split out into https://github.com/CCI-MOC/xdmod-cntr/issues/144.
To summarize:
For cloud resources (OpenStack):
1) We have been able to associate PIs with projects.
2) We have NOT been able to associate projects with either groups or the hierarchy; I have a query in with the XDMoD team about that.
3) I have a couple of things to try in the meantime.
For HPC resources (OpenShift):
1) We have been able to associate groups with projects (in the HPC view, PI and group are synonyms).
The main implication is that the hierarchy as defined should probably be adjusted. The current hierarchy is as follows:
1) Institution (e.g., BU, HU, NEU ...)
2) School/Center/Institute (FAS, CFA, CAS ...)
3) Research Discipline
with groups identifying the PI in groups.csv; group names can be adjusted using names.csv.
We could adjust the hierarchy to keep all relevant information, perhaps:
```
1) institution - school (e.g., BU-CAS, HU-FAS ...)
2) PI
3) Research Discipline
using groups/projects to track the specific grant
```
This will fit both HPC and cloud for the current version of XDMoD. Furthermore, this fits my model of:
I expect these things will change, as the current mechanisms in XDMoD are not the most convenient to work with.
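To make the proposed hierarchy concrete, here is a sketch that turns flat records into hierarchy.csv rows (institution-school, then PI, then discipline) plus groups.csv rows mapping each project to its bottom-level node. The record layout, the placeholder values and the hyphen-joined ids are all assumptions for illustration; the prefixing mirrors the PS-Phys-BU style of the hierarchy.csv example and keeps the first column unique even when names such as "Physical Science" repeat.

```python
# Hypothetical input records: (institution, school, pi, discipline, project).
RECORDS = [
    ("BU", "CAS", "pi-a", "Physical Science", "project-1"),
    ("BU", "CAS", "pi-a", "Physical Science", "project-2"),
    ("HU", "FAS", "pi-b", "Life Science", "project-3"),
]


def build_proposed(records):
    """Return (hierarchy rows, group rows) for institution-school > PI > discipline."""
    hierarchy = {}  # id -> (name, parent_id); a dict keeps ids unique, in first-seen order
    groups = []
    for institution, school, pi, discipline, project in records:
        top = f"{institution}-{school}"  # e.g. BU-CAS
        middle = f"{pi}-{top}"  # prefix keeps the PI id unique per institution-school
        bottom = f"{discipline}-{middle}"
        hierarchy.setdefault(top, (top, ""))
        hierarchy.setdefault(middle, (pi, top))
        hierarchy.setdefault(bottom, (discipline, middle))
        groups.append((project, bottom))  # groups/projects track the specific grant
    rows = [(node_id, name, parent) for node_id, (name, parent) in hierarchy.items()]
    return rows, groups
```

Both lists can then be written out with `csv.writer` in the same quoted style as the existing files.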
@rob-baron any updates?
Partially blocked on https://github.com/nerc-project/coldfront-plugin-cloud/issues/89.
1) I have gotten a new dump of the infra cluster database and placed it in the xdmod and xdmod-staging projects on ocp-staging.
2) I am creating a JSON structure based on the data that was loaded into the two XDMoD projects on ocp-staging. This will serve as an intermediate data structure for the data that will be fetched from ColdFront. I have updated ColdFront with an example of that structure.
Field of study list from Scott & Wayne (NERC will need to be able to change these at some point):
Life Science, Physical Science, Statistics/Math, Social Science, Arts/Humanities, Business/Management, Computer Science/Engineering, Public Health/BioMedical, Other
Probably makes the most sense to just have this completely done prior to it coming into XDMoD so XDMoD can just process based on what it gets from ColdFront and RegApp.
@rob-baron can you provide an update here? You can probably copy-paste a fair amount from your stand-up notes.
The field of study will come from RegApp through Keycloak/Keystone, so that any changes to these fields only need to be applied once, upstream.
For auditing purposes, we will need to keep track of the changes to them.
I am planning to create a simpler version that will at least track all of the history information.
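One simple way to keep that history auditable is an append-only list of (effective date, value) per PI, queried "as of" a date, so that showback for a past period uses the field of study that was in effect at the time. A minimal sketch under that assumption; the class and method names are hypothetical, and a `None` result is where a caller might fall back to an "Unknown" bucket.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class FieldOfStudyHistory:
    """Append-only history of each PI's field of study (hypothetical structure)."""

    changes: Dict[str, List[Tuple[date, str]]] = field(default_factory=dict)

    def record(self, pi: str, effective: date, field_of_study: str) -> None:
        """Append a change; earlier entries are never overwritten."""
        entries = self.changes.setdefault(pi, [])
        if entries and effective < entries[-1][0]:
            raise ValueError("changes must be recorded in date order")
        entries.append((effective, field_of_study))

    def as_of(self, pi: str, when: date) -> Optional[str]:
        """Return the field of study in effect on `when`, or None if none was recorded yet."""
        entries = self.changes.get(pi, [])
        i = bisect_right([d for d, _ in entries], when)
        return entries[i - 1][1] if i else None
```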
Basic questions:
How auditable do we want XDMoD to be?
1) How far back do we want showback to be accurate?
2) Does it matter if we have projects that fall under the "Unknown" institution (for example, an inactive project)?
@waygil & @syockel please review the questions above.
@rob-baron are all the pieces you wanted to finish before the HTTP Endpoint was provided done? Are you now fully blocked on the HTTP endpoint? For example, is the double PI issue resolved? Is the double PI issue being tracked here or is there a separate GitHub issue for that?
@joachimweyl
1) I am working on generating the files based on faked data. Kristi has only included a partial record, which I have left a comment on. I am hoping that Kristi chooses a more generic record for the transfer of data (something that ColdFront could adopt), with the NERC/OpenStack/OpenShift-specific fields mapped onto that generic record.
2) I believe you haven't understood what the double PI issue is. Strictly speaking, if we have the PI on the lowest tier of the hierarchy file, then there is no double PI issue. However, if we put the PI in the mapping table (groups.csv) that maps groups (synonymous with PIs under XDMoD's definition) to the hierarchy, then one PI can be mapped to two different projects from two different institutions.
If we transfer the data as a strict hierarchy, this cannot happen. That is what I am suggesting: each record includes a parent_id that refers to another record's id, plus a record type indicating its hierarchy level. The type makes it easy to check that a record of one type is only connected to a parent of the expected type (e.g., a field-of-science is only connected to an institution, a PI is only connected to a field of science, and a project is only connected to a PI). This will be enforced even if we have to rearrange the PIs and projects.
Basically, it is something we need to check, depending on the data that is transferred from ColdFront.
To be explicit, my plan of attack has been to generate the aforementioned hierarchy files (see my 30-Jan comment on this issue) from mocked-up data in the format described in my comments (see the 16-MAR comment on https://github.com/nerc-project/coldfront-plugin-cloud/issues/89 and the 3-APR comment on https://github.com/nerc-project/coldfront-plugin-cloud/pull/93).
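The type check described above can be expressed as a table of allowed parent types. A sketch, assuming records carry `id`, `parent_id` and `type` keys; the exact key names and type values are hypothetical, since the transfer format was still under discussion. Because each record has a single `parent_id`, a PI who works at two institutions has to appear as two separate PI records, which is what rules out the double PI mapping.

```python
from typing import Dict, List, Optional

# Expected parent type for each record type, following the strict hierarchy above.
ALLOWED_PARENT: Dict[str, Optional[str]] = {
    "institution": None,
    "field_of_science": "institution",
    "pi": "field_of_science",
    "project": "pi",
}


def check_strict(records: List[dict]) -> List[str]:
    """Validate records shaped like {"id": ..., "parent_id": ..., "type": ...}."""
    errors = []
    by_id = {}
    for rec in records:
        if rec["id"] in by_id:
            errors.append(f"{rec['id']}: duplicate id")
        by_id[rec["id"]] = rec
    for rec in records:
        if rec["type"] not in ALLOWED_PARENT:
            errors.append(f"{rec['id']}: unknown type {rec['type']!r}")
            continue
        expected = ALLOWED_PARENT[rec["type"]]
        parent_id = rec.get("parent_id")
        if expected is None:
            if parent_id is not None:
                errors.append(f"{rec['id']}: {rec['type']} must not have a parent")
        elif parent_id not in by_id:
            errors.append(f"{rec['id']}: missing parent {parent_id!r}")
        elif by_id[parent_id]["type"] != expected:
            errors.append(f"{rec['id']}: parent type is {by_id[parent_id]['type']}, expected {expected}")
    return errors
```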
I have generated the first set of files with a sensible hierarchy - that is:
1) Institution
2) Field of science (the PI's field of science)
3) PI
4) Project
I am currently generalizing the script to generate the set of files with the "flipped" hierarchy, that is:
1) Institution
2) Field of science (the PI's field of science)
3) Project
4) PI
Still waiting to hear back from the XDMoD team (https://help.xdmod.org/support/tickets/31911), but this is a low priority, as flipping the hierarchy will also work. It is just not the most intuitive way to view the hierarchy, as PIs tend to have multiple projects. Furthermore, the field of science belongs to the PI, not the project.
I expect that the download from ColdFront will use the more intuitive hierarchy.
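Both layouts can come from one generator that takes the level order as a parameter. A sketch, assuming the first three levels go into hierarchy.csv and the last level becomes the group mapped in groups.csv; the record keys, the slash-joined ids and the sample values are hypothetical. With the flipped order, a PI with two projects ends up mapped to two bottom-level nodes, i.e. the PI sits in the group mapping table as in the double PI discussion above.

```python
from typing import Dict, List, Set, Tuple

SENSIBLE = ("institution", "field_of_science", "pi", "project")
FLIPPED = ("institution", "field_of_science", "project", "pi")


def build(records: List[dict], levels: Tuple[str, ...]):
    """Return (hierarchy rows, group -> set of bottom-level node ids) for a level order."""
    *tree_levels, group_level = levels
    hierarchy: Dict[str, Tuple[str, str]] = {}  # node id -> (name, parent id)
    groups: Dict[str, Set[str]] = {}
    for rec in records:
        parent = ""
        for level in tree_levels:
            node_id = f"{parent}/{rec[level]}" if parent else rec[level]
            hierarchy.setdefault(node_id, (rec[level], parent))
            parent = node_id
        # The last level becomes the group, mapped to the bottom node of the tree.
        groups.setdefault(rec[group_level], set()).add(parent)
    rows = [(node_id, name, p) for node_id, (name, p) in hierarchy.items()]
    return rows, groups


# Hypothetical sample: one PI with two projects.
RECORDS = [
    {"institution": "BU", "field_of_science": "Physical Science", "pi": "pi-a", "project": "project-1"},
    {"institution": "BU", "field_of_science": "Physical Science", "pi": "pi-a", "project": "project-2"},
]
```

Any group whose set has more than one element is a group that `groups.csv` cannot place in a single spot.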
So ColdFront (https://github.com/nerc-project/coldfront-plugin-cloud/issues/89) only keeps track of the mapping between projects and PIs.
We will have to pull the hierarchy of institution, field of science (for the PI), and PI from Keycloak.
The last step is to massage the data coming from Keycloak and ColdFront so that it matches the format @rob-baron already created for XDMoD to pull in.
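The massaging step is essentially a join on the PI: ColdFront supplies project/PI pairs and Keycloak supplies each PI's institution and field of science. A sketch, assuming both have already been fetched into plain Python structures; the key names are hypothetical, since the actual ColdFront and Keycloak payloads are not shown here. PIs with no Keycloak entry fall into an "Unknown" bucket, echoing the "Unknown" institution question raised earlier.

```python
from typing import Dict, List


def merge(coldfront_projects: List[dict], keycloak_users: Dict[str, dict],
          unknown: str = "Unknown") -> List[dict]:
    """Join ColdFront project/PI pairs with Keycloak PI attributes into flat records."""
    records = []
    for entry in coldfront_projects:
        user = keycloak_users.get(entry["pi"], {})  # PI missing from Keycloak -> unknown
        records.append({
            "institution": user.get("institution", unknown),
            "field_of_science": user.get("field_of_science", unknown),
            "pi": entry["pi"],
            "project": entry["project"],
        })
    return records
```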
@rob-baron please provide an update on how things are going and your thoughts for next steps.
I have integrated the script that Kristi provided and will be testing it with the credentials from vault. The next steps are to:
1) test using the credentials
2) confirm that Keycloak is providing the correct data.
This has been deployed on the infra cluster. Unfortunately, it is not able to connect to ColdFront or Keycloak, even though the same script is able to connect to both from my development system.
Running curl on my development system and on the infra cluster reproduced this.
Here is the error from the infra cluster:
```
Using project "xdmod-staging" on server "https://api.nerc-ocp-infra.rc.fas.harvard.edu:6443/".
Rob@MacBook-Pro xdmod-ProcessHierarchy % oc --as s...n rsh xdmod-ui-864b4cb54d-fkhxj
Defaulted container "xdmod" out of: xdmod, xdmod-acl-work-a-round, xdmod-init-1 (init)
sh-4.2$ curl https://keycloak.mss.mghpcc.org/
curl: (7) Failed connect to keycloak.mss.mghpcc.org:443; Connection timed out
sh-4.2$ curl https://coldfront.mss.mghpcc.org/
curl: (7) Failed connect to coldfront.mss.mghpcc.org:443; Connection timed out
sh-4.2$
```
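Repeating this check from inside the pod (for example where curl is unavailable, or as a startup probe) only needs a plain TCP connect. A minimal sketch using the standard socket module; like the curl test, it only shows whether port 443 is reachable, not whether the credentials work. `can_connect` and `report` are hypothetical helper names.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # timeouts, refused connections and DNS failures are all OSError
        return False


def report(hosts, port: int = 443) -> dict:
    """Map each host to "ok" or "unreachable"."""
    return {host: ("ok" if can_connect(host, port) else "unreachable") for host in hosts}
```

For example, `report(["keycloak.mss.mghpcc.org", "coldfront.mss.mghpcc.org"])` run in the pod should mark both hosts unreachable while the timeout persists.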
Also tracked in https://github.com/OCP-on-NERC/operations/issues/138
@rob-baron can we add this as its own issue to https://github.com/OCP-on-NERC/operations ?
No longer blocked.
PR merged and this is now closed.
Depends on how Kristi and Jim decide to provide the hierarchy data. Initially, it will be provided as a CSV dump (https://github.com/nerc-project/coldfront-plugin-cloud/issues/89).
This is to automate the processing on XDMoD.