mcampos-quinn opened this issue 6 years ago
Ahhhhhh, I need to steal your general log function and config system. I just store the logs in the storage AIP right now, not sure if I should push them into a database or not. I'm thinking yes..
On Wed, 30 May 2018, 18:38 Michael Campos-Quinn, notifications@github.com wrote:
the system log (pymmlog) is saved to a location specified in the config file. it would be more helpful (visible) if it were emailed via cron on a weekly basis or something.
i think a simple cron job is outside of pymm itself (though mediamicroservices has an email function for SQL error reports) but i would at least include instructions here for my future reference.
otherwise i could adapt the email functions i used way back here https://github.com/BAM-PFA/pfa-library-journal-scraper/blob/master/scraper.py and there could be various things emailed as necessary, and the admin could choose what to email and when.
... more config to config, but potentially helpful.
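A minimal sketch of the kind of script a weekly cron entry could call, using only Python's standard library; the addresses, SMTP host, and log path below are placeholders, not values from pymm's config:

```python
# Hypothetical weekly log emailer; cron could run it with something like:
#   0 8 * * 1 /usr/bin/python3 /path/to/email_pymmlog.py
# Everything below is a placeholder, not pymm's real config.
import smtplib
from email.message import EmailMessage

LOG_PATH = "/path/to/pymmlog"  # the location set in the pymm config file

def email_log():
    msg = EmailMessage()
    msg["Subject"] = "weekly pymm log"
    msg["From"] = "pymm@example.org"
    msg["To"] = "admin@example.org"
    # send the whole log as the message body
    with open(LOG_PATH) as f:
        msg.set_content(f.read())
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    email_log()
```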
Yeah, logging is something I've set up placeholders for, but I'm at a point where I need to sit down and make it happen (can't avoid it forever!). Currently there are three logging points.
There's kind of a lot to consider, which is why I have been putting it off!
I like the way you've broken it up there.. not sure if I've ever sent you this blog I never finished: https://docs.google.com/document/d/1s8RcCp0XPGFrzoG5NjpcOFrkkni1a-cK0-Ol3nHJYB8/edit?usp=drivesdk . I've a script in a separate branch that will "transcode" the AIP logs to PREMIS-y CSV, and another script that takes the CSV and turns it into valid PREMIS XML. Not that I've any use for the XML right now. The CSV would be easily ingested into a database, though. I don't do any direct SQL stuff; maybe I should?
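Just to sketch the CSV-to-XML step, something along these lines would turn one row of event data into a minimal PREMIS event element; the column names and filename here are made up, and the real "transcode" scripts live in the separate branch mentioned above:

```python
# Illustrative sketch only: map one CSV row of event data to a minimal
# PREMIS event element. Column names and the filename are hypothetical.
import csv
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)

def row_to_event(row):
    # build <premis:event> with a child element per CSV column
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    for field in ("eventType", "eventDateTime", "eventDetail"):
        child = ET.SubElement(event, f"{{{PREMIS_NS}}}{field}")
        child.text = row.get(field, "")
    return event

with open("aip_log.csv") as f:  # placeholder filename
    events = [row_to_event(row) for row in csv.DictReader(f)]
```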
What I have set up now for a db in this system is taken basically straight from CUNY, which I kind of want to reconsider (partly due to reading your blog draft a couple months ago!). I would like to be able to go a bit deeper than the current structure allows (but not go full-on into object description--I actually like that the mm db just stores the mediainfo output as text in an object record). I think that a MySQL db makes sense for us over recording PREMIS in XML, mostly because of usability going forward. I'd like to be able to leave this in a place that other staff can maintain and use, and MySQL seems like a clean way of recording these complex relationships. XML files would basically just sit on storage and get ignored by our staff....
Even though I want to change the db layout some, I think the access/write process for the db is fairly sound. Basically I just have a class that opens access to the database and a method that lets the user pass a SQL query to INSERT/etc. Seems to work OK and should be secure enough.
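For illustration, a minimal sketch of that kind of access class, assuming mysql-connector-python; the class name, credentials, and table are hypothetical, not the actual pymm implementation:

```python
# Hypothetical sketch of a DB access class like the one described above.
# Assumes mysql-connector-python; all names and credentials are placeholders.
import mysql.connector

class DB:
    def __init__(self, user, password, host, database):
        # open a connection (credentials would come from the config file)
        self.connection = mysql.connector.connect(
            user=user, password=password, host=host, database=database
        )

    def query(self, sql, params=None):
        # run an arbitrary INSERT/SELECT/etc.; %s placeholders plus a params
        # tuple keep values escaped, which is what makes this "secure enough"
        # against SQL injection
        cursor = self.connection.cursor()
        cursor.execute(sql, params or ())
        self.connection.commit()
        return cursor.fetchall() if cursor.with_rows else []

    def close(self):
        self.connection.close()

# e.g.:
# db = DB('pymm', 'secret', 'localhost', 'pymm_db')
# db.query("INSERT INTO objects (objectIdentifier) VALUES (%s)", ("abc123",))
```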
I started documenting our PREMIS entities after I read your blog and swiftly abandoned it as it's the kind of thing that makes my head spin and I had other fires to put out. This is now among the last fires to put out before I launch this as our minimum viable product in production.
I think I saw that you are importing PBCore-fielded data to your technical database--are you also going to import PREMIS documentation via CSV?
Are you able to access your MySQL databases from your regular database software (perhaps FileMaker)? Or would you have to use the mysql client or some custom helper interface to query that DB?
I'd love to discuss PREMIS with you if you get deeper into the implementation and want to discuss entities. I think the data dictionary assumes that there is some repository software creating the PREMIS events/agents, as it says things along the lines of 'file transfers are too trivial for PREMIS as these are recorded in your system logs' - but I think they're not trivial at all! Or maybe they are, I dunno :(
Yeah, we have actually gone live with CSV import - I think we have about 40 XDCAM EX packages imported via that CSV, and 40 concatenated Matroska reproductions of those cards as well. It's been working fairly well, and it's all linked up with the accessioning process too.
I should probably not call that script makepbcore.py, cos there's as many custom IFI legacy database fields produced as there are PBCore ones - also our database has a character limit on field names, so you get nasty things like 'essenceTrackEnco'.
Also yes, the plan originally was to have separate agents and events tables. Some testing went well, but I was having difficulty linking it all up. I tried messing with MySQL databases a bit but I just got lost. I think I could easily make database tables to document the events and agents; it was the querying that was proving troublesome for me. I can't even remember the issue now, but I'm sure it would be a simple solution.
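For what it's worth, linking the two tables up can be as simple as a foreign key from events to agents plus a JOIN. A hypothetical sketch, with table and column names that are purely illustrative (none of them come from pymm or the IFI scripts):

```python
# Hypothetical schema showing one way to link events to agents.
schema = """
CREATE TABLE agents (
    agentId INT AUTO_INCREMENT PRIMARY KEY,
    agentName VARCHAR(255),
    agentType VARCHAR(64)
);
CREATE TABLE events (
    eventId INT AUTO_INCREMENT PRIMARY KEY,
    eventType VARCHAR(64),
    eventDateTime DATETIME,
    linkingAgentId INT,
    FOREIGN KEY (linkingAgentId) REFERENCES agents(agentId)
);
"""

# querying across the two tables is then a single JOIN, e.g. every
# fixity check together with the agent that performed it:
report = """
SELECT e.eventType, e.eventDateTime, a.agentName
FROM events e
JOIN agents a ON e.linkingAgentId = a.agentId
WHERE e.eventType = 'fixity check';
"""
```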
Anyhow, the thing I worry about most is that I have all these preservation events logged in the logfile in the AIP itself, but future events like fixity checks can't really be logged there, as the LTO tapes will be switched to read-only at that point, and it's best to leave the packages on the tapes as static as possible. I didn't really think ahead on this one, but I think I should look very soon into a 'live' log living outside of the storage AIP that becomes the definitive log once the files have been written to LTO.
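One possible shape for that 'live' log, sketched under the assumption of an append-only file per package on spinning disk; the path and field names are placeholders:

```python
# Hypothetical 'live' log outside the storage AIP: append post-LTO events
# (e.g. fixity checks) to a per-package file. Path and fields are placeholders.
import datetime
import json
import os

LIVE_LOG_DIR = "/var/log/aip_events"  # placeholder location

def log_event(package_uuid, event_type, outcome):
    os.makedirs(LIVE_LOG_DIR, exist_ok=True)
    entry = {
        "package": package_uuid,
        "eventType": event_type,
        "eventDateTime": datetime.datetime.now().isoformat(),
        "eventOutcome": outcome,
    }
    # append-only, one JSON object per line, so the event history stays intact
    with open(os.path.join(LIVE_LOG_DIR, f"{package_uuid}.log"), "a") as f:
        f.write(json.dumps(entry) + "\n")
```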
btw, as to the topic - I think the weekly cron reports are a great idea.
Yeah I saw some of those truncated field names! Character limits are a drag. Cool that you have that up and running though.
Currently I'm saying the only access is via MySQL directly, since I'm running up against our need to go live. Once I get this v1 launched I can look at other options for access. I have seen some pretty... less than ideal... options, but since we're already using FileMaker, that's a realistic possibility.
Yeah, the idea was to have some redundancy in logging for varying use cases, with the database being authoritative (and yeah 'live' as opposed to the AIP logs).
Cheers! Yes, let's talk PREMIS!