fermilab-accelerator-ai / workflow

Machinery to pull data, wrangle data, keep it all running

Automated data archival workflow support #2

Closed: gnperdue closed this issue 4 years ago

gnperdue commented 4 years ago

From @jasonstjohn:

I just submitted a request to have a special Kerberos principal, "gmpsai-prod", created on the accelagpvm01 machine. That's the entity that should run the data archiving/reformatting cron jobs, and maybe do other things as well (batch jobs for ongoing training, when/if that rises to the level of a regular, automated activity).

Along with getting the right people into the .k5login for gmpsai-prod (not accelai-prod?), we also need to be sure we have the scripts in place to manage the archival work.
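As a minimal sketch of the kind of script that could sit behind such a cron job, assuming it runs from gmpsai-prod's crontab: the wrapper below refuses to start if a previous run is still active, so a slow archival pass is not overlapped by the next cron invocation. The script name, crontab schedule, lock directory, `ARCHIVE_LOCKDIR` variable, and archive command are all hypothetical placeholders, not the project's actual scripts.

```shell
#!/bin/sh
# archive_wrapper.sh (hypothetical): run one archival command under a lock.
# Example crontab entry for gmpsai-prod (assumed schedule: hourly):
#   0 * * * * /path/to/archive_wrapper.sh /path/to/archive_command

# run_locked LOCKDIR CMD [ARGS...]
# mkdir is atomic, so successfully creating LOCKDIR means no other run
# holds the lock; this avoids depending on flock(1).
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') previous run still active, skipping" >&2
        return 75  # EX_TEMPFAIL: a skipped run is a temporary condition
    fi
    # If the shell exits before we release the lock ourselves, remove it anyway.
    trap 'rmdir "$lockdir"' EXIT
    "$@"
    rc=$?
    rmdir "$lockdir"
    trap - EXIT
    return "$rc"
}

# Only act when invoked with a command, so the function can also be sourced.
if [ "$#" -gt 0 ]; then
    run_locked "${ARCHIVE_LOCKDIR:-/tmp/gmpsai-archive.lock}" "$@"
    exit $?
fi
```

Usage (hypothetical paths): `ARCHIVE_LOCKDIR=/tmp/gmpsai-archive.lock ./archive_wrapper.sh /path/to/archive_command`. Using `mkdir` as the lock keeps the script portable to systems without `flock(1)`, and returning 75 (`EX_TEMPFAIL` in sysexits.h) marks a skipped run as temporary rather than as a failure of the archive command itself.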

jasonstjohn commented 4 years ago

This is only the first step (of about three) toward a group account that can submit jobs, etc. from the virtual machine. (h/t Ed Simmonds, who enlightened me and kickstarted the further effort.)

1) So far we have REQ000000407997: RITM0889187: Single or bulk requests for UID or GIDs. (Stage: Request Approved), in progress.

2) Andy Romero (Storage Network Services) creates a home directory for the new group user (needs the GID from step 1).

3) Contact Distributed Computing and have them add the UID/GID to FERRY for accelai (Joe Boyd; needs the path to the home directory from step 2). Once Joe's folks add it to FERRY, it will propagate out to the server.

jasonstjohn commented 4 years ago

Update: Andy Romero

- created home directory /nashome/g/gmpsai-prod
- created .k5login file
- set owner of .k5login to UID=56206
- set permissions on .k5login to 600
- added principal stjohn@FNAL.GOV to .k5login
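Andy's steps map onto a few ordinary shell commands. Below is a minimal sketch, assuming MIT Kerberos semantics for `.k5login` (one principal per line); the function name `add_k5login_principal` is hypothetical, and the `chown` step appears only as a comment because it requires root.

```shell
#!/bin/sh
# Sketch of the .k5login steps above (hypothetical helper, not an SCD tool).
# .k5login lists one Kerberos principal per line; anyone holding tickets for a
# listed principal may log in as the user whose home directory contains it.

# add_k5login_principal HOMEDIR PRINCIPAL
add_k5login_principal() {
    k5=$1/.k5login
    # Create the file if it is missing and make it owner read/write only (600).
    touch "$k5" || return 1
    chmod 600 "$k5" || return 1
    # Append the principal unless that exact line is already present.
    if ! grep -qxF -e "$2" "$k5"; then
        printf '%s\n' "$2" >> "$k5" || return 1
    fi
}

# Setting the owner to the group account's UID (56206 here) needs root, e.g.:
#   chown 56206 /nashome/g/gmpsai-prod/.k5login
```

For example, `add_k5login_principal /nashome/g/gmpsai-prod stjohn@FNAL.GOV` reproduces the last step; running it again is harmless because an already-listed principal is skipped.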

Joe Boyd then says: "Should show up in the passwd file on the machine, but it may not be until Monday at this point."

```
[boyd@dawg service_notes]$ curl -s --cacert /etc/grid-security/certificates/InCommon-IGTF-Server-CA.pem "https://ferry.fnal.gov:8443/getPasswdFile?unitname=accelai&resourcename=accelai" | python -m json.tool | grep gmpsai
"homedir": "/nashome/g/gmpsai-prod",
"username": "gmpsai-prod"
```

(But likely 1 hour, which is any minute now, according to Ed Simmonds.)
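Since the FERRY change only takes effect once it has propagated to the machine, a small polling loop can report when the account is usable. This is a sketch under assumptions: the helper name and its defaults are hypothetical, and it reads the local passwd file directly, as Joe described; on hosts that resolve accounts through a directory service, `getent passwd gmpsai-prod` would be the more general check.

```shell
#!/bin/sh
# wait_for_user USER [MAX_TRIES] [INTERVAL_SECONDS] [PASSWD_FILE]
# Hypothetical helper: succeed once USER appears in the passwd file, giving up
# after MAX_TRIES checks. Defaults: 12 checks, 300 s apart (about one hour).
wait_for_user() {
    user=$1
    tries=${2:-12}
    interval=${3:-300}
    passwd_file=${4:-/etc/passwd}
    i=1
    while [ "$i" -le "$tries" ]; do
        # Field 1 of each passwd line is the username; compare it exactly.
        if awk -F: -v u="$user" '$1 == u { found = 1 } END { exit !found }' "$passwd_file"; then
            return 0
        fi
        # No need to sleep after the final check.
        if [ "$i" -lt "$tries" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `wait_for_user gmpsai-prod && echo "gmpsai-prod is in /etc/passwd"`. Matching the first field exactly, rather than grepping for a substring, avoids false positives from accounts whose names merely contain "gmpsai".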

gnperdue commented 4 years ago

Currently the automated version using the group account is broken, but we're working on it...

jasonstjohn commented 4 years ago

This is working now. It took a lot of work from many people in SCD; tickets were flying. We are using gmpsai-prod@accelaigpvm01.fnal.gov.

Contact me to be added to the .k5login if you have work to do as this group account.

gnperdue commented 4 years ago

@jasonstjohn Awesome, so this covers automated data logging for the "high-latency data" era, then? By that I mean the period when we rely on data from ACNET, as opposed to the board and its low-latency feeds.

jasonstjohn commented 4 years ago

Yes indeed.

(Sent by email in reply to Gabriel Perdue's comment of Wed, Nov 13, 2019, 9:38 AM.)

gnperdue commented 4 years ago

Okay, closing for now. We can re-open or write a different issue once low-latency data streams start to enter the picture.