Potential solution: run Minio in a Docker container at IceCube. Minio is an open-source, S3-compatible object store. A glidein site would be provided with a key and secret that they could put into their configuration file.
When submit is run on the client, a signed POST URL is generated using the Minio Python bindings. This URL is shipped as input with the job.
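For illustration, here is a minimal sketch of how the client could generate such a URL with the Minio Python bindings. The endpoint, credentials, bucket, and object names are placeholders, and a presigned PUT URL (which the later implementation settled on) is shown rather than a POST policy:

```python
from datetime import timedelta

from minio import Minio

# Placeholder endpoint and per-site credentials for the Minio instance at IceCube.
client = Minio(
    "minio.example.icecube.wisc.edu",
    access_key="SITE_ACCESS_KEY",
    secret_key="SITE_SECRET_KEY",
    secure=True,
)

# Presigned URL that lets the job upload one object without ever seeing the secret.
put_url = client.presigned_put_object(
    "startd-logs",                    # bucket (placeholder)
    "some_site/glidein_1234.tar.gz",  # object name (placeholder)
    expires=timedelta(days=7),
)

# The URL is then shipped with the job, e.g. as an environment variable.
print(put_url)
```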
The glidein bash script would have a trap here that would tar up the startd logs and upload them to the minio server at IceCube.
At IceCube, a second process called `log_importer` would be started on the pyglidein server. This would use the Minio Python bindings to watch for activity on buckets. When a new file arrives, the service would download it and use the ElasticSearch Python bindings to do a bulk import of the data. A good example of this in action is here.
Note that the ElasticSearch Python bindings aren't technically required if they become hard to work with. As an example, here is the inserter code from iceprod.
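A rough sketch of what such a `log_importer` loop could look like, assuming Minio bucket notifications and the ElasticSearch bulk helper; the endpoints, credentials, bucket, and index names below are all placeholders:

```python
import tarfile

from elasticsearch import Elasticsearch, helpers
from minio import Minio

minio_client = Minio("minio.example.icecube.wisc.edu",
                     access_key="IMPORTER_KEY",
                     secret_key="IMPORTER_SECRET",
                     secure=True)
es = Elasticsearch(["http://elasticsearch.example.icecube.wisc.edu:9200"])


def import_tarball(bucket, key):
    """Download an uploaded log tarball and bulk-index its members."""
    minio_client.fget_object(bucket, key, "/tmp/logs.tar.gz")
    actions = []
    with tarfile.open("/tmp/logs.tar.gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            text = tar.extractfile(member).read().decode(errors="replace")
            actions.append({
                "_index": "glidein-logs",
                "_source": {"object": key, "file": member.name, "text": text},
            })
    helpers.bulk(es, actions)


# Watch the bucket and import each newly created object as it arrives.
events = minio_client.listen_bucket_notification(
    "startd-logs", events=["s3:ObjectCreated:*"])
for notification in events:
    for record in notification["Records"]:
        import_tarball(record["s3"]["bucket"]["name"],
                       record["s3"]["object"]["key"])
```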
The team talked this afternoon after reviewing the logging code. One idea that came up was to inject the URL of the uploaded log file into a classad that gets shipped home.
Note for running multiple minio instances behind a reverse proxy: https://github.com/krishnasrinivas/cookbook/blob/68b6dab51f557ed437449104970abcf3bacf4b7b/docs/multi-tenancy-in-minio.md
My first attempt at this uses presigned PUT and GET S3 URLs generated by the client process at each grid site. Each site would have to add a `[StartdLogging]` section to their configuration that includes three variables:

- `send_startd_logs`: This can be set to True or False
- `url`: The S3 endpoint URL. This can either be AWS or a Minio instance.
- `bucket`: The name of the bucket that the log files should go to.

I added a new client flag called `--secrets` to the client command. It defaults to `.pyglidein_secrets` if not set by the user. The file is configured the same way as the config file, but should only contain secrets. The reason for pulling secrets out of the configs is to ensure users don't push secrets to the pyglidein repo. When StartdLogging is enabled the secrets file should also contain a `[StartdLogging]` section with these variables:
- `access_key`: S3 Access Key
- `secret_key`: S3 Secret Key

For each job the client submits to a cluster, it generates a presigned PUT and GET URL. These are passed as environment variables to the job. A `log_shipper` script is forked at job start time on the execute node that tars up the log directory and uploads the file to the S3 endpoint every five minutes. The glidein start script now respects SIGTERM and SIGINT. The condor process is killed and one more log shipment is run after receiving a SIGTERM or SIGINT from the scheduler. An example of the config and secrets sections is sketched below.
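As a concrete illustration (endpoint, bucket, and keys are placeholders), the two `[StartdLogging]` sections might look like this, with the first in the normal site config and the second in the `.pyglidein_secrets` file:

```ini
# site config (safe to commit)
[StartdLogging]
send_startd_logs = True
url = https://minio.example.icecube.wisc.edu
bucket = startd-logs

# .pyglidein_secrets (never committed to the repo)
[StartdLogging]
access_key = SITE_ACCESS_KEY
secret_key = SITE_SECRET_KEY
```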
A `PRESIGNED_GET_URL` classad is injected into each glidein startd using the `STARTD_ATTRS` expression. The classad can be accessed in the condor history file for debugging issues after a crash.
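For reference, the underlying HTCondor mechanism is a custom startd attribute published via `STARTD_ATTRS` in the glidein's condor config. A hedged sketch, assuming the URL arrives in an environment variable of the same name:

```
PRESIGNED_GET_URL = "$ENV(PRESIGNED_GET_URL)"
STARTD_ATTRS = $(STARTD_ATTRS) PRESIGNED_GET_URL
```

The attribute then shows up in the startd ClassAd and in the history file, e.g. via `condor_history -af PRESIGNED_GET_URL`.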
To create the IAM user, S3 bucket, and policy in AWS for shipping logs, I created a CloudFormation template that generates these resources. This template could be invoked for each site that wants to send logs, which ensures each site has its own set of credentials and permission to write to a single S3 bucket in AWS: https://github.com/WIPACrepo/pyglidein/blob/logging/cloud_formations/logging_bucket.json The bucket lifecycle is set to delete files older than 90 days so the size of the bucket doesn't get out of control.
In the event of a site going away, the entire CloudFormation stack could be deleted, which would delete all of the resources that were created for it as well.
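For completeness, a minimal sketch of creating and deleting such a per-site stack with boto3; the stack name, region, and the assumption that the template takes no parameters are placeholders:

```python
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# One stack per site: the template creates the IAM user, S3 bucket, and policy.
with open("cloud_formations/logging_bucket.json") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="pyglidein-logging-SOME_SITE",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],  # required because the template creates IAM resources
)

# If the site goes away, deleting the stack removes everything it created
# (the S3 bucket has to be emptied before CloudFormation can delete it).
# cfn.delete_stack(StackName="pyglidein-logging-SOME_SITE")
```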
We'd like to get the glidein logs and upload them to a central server.
I'd like to use HTTP PUT, maybe with basic authentication (could use the pool password if you wanted). That is simple enough that it should always work, without requiring cvmfs or anything installed on the worker node.
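For example, a worker-node upload along those lines could be done with nothing but the Python standard library; the URL, credentials, and file name below are placeholders:

```python
import base64
import urllib.request

# Hypothetical upload of a tarball of glidein logs via HTTP PUT with basic auth,
# using only the standard library so nothing extra is needed on the worker node.
url = "https://glidein-logs.example.icecube.wisc.edu/logs/site_glidein_1234.tar.gz"
credentials = base64.b64encode(b"glidein:pool-password").decode()

with open("logs.tar.gz", "rb") as f:
    req = urllib.request.Request(url, data=f.read(), method="PUT")
    req.add_header("Authorization", "Basic " + credentials)
    req.add_header("Content-Type", "application/gzip")
    with urllib.request.urlopen(req) as resp:
        print(resp.status)
```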