bystrogenomics / bystro

Natural Language Search and Analysis of High Dimensional Genomic Data
Mozilla Public License 2.0

[scheduler] Even more resilient perl beanstalkd annotation worker #545

Closed akotlar closed 4 months ago

akotlar commented 4 months ago

Example of new perl/bin/bystro-annotate.pl output:

(bystro) (base) ubuntu@ip-10-98-135-15:~/bystro/perl$ perl bin/bystro-annotate.pl --in ~/bystro/trio.trim.vep.vcf.gz --out test_tri/test --config ~/bystro/config/hg19.yml | jq
{
  "totalSkipped": 0,
  "totalProgress": 13341,
  "error": null,
  "results": {
    "annotation": "test.annotation.tsv",
    "config": "hg19.yml",
    "log": "test.annotation.log.txt",
    "sampleList": "test.sample_list",
    "dosageMatrixOutPath": "test.dosage.feather",
    "header": "test.annotation.header.json",
    "statistics": {
      "json": "test.statistics.json",
      "qc": "test.statistics.qc.tsv",
      "tab": "test.statistics.tsv"
    }
  }
}

Additional Background and Motivation:

  1. Previously, there was a small chance that job operations such as delete and release could stall the worker forever if the beanstalkd server became unavailable in the microseconds between establishing the job connection and issuing the delete/release (we always connect before attempting those operations, so the failure would have to occur between the two). Likewise, if the beanstalkd server became unavailable during the millisecond or so while those operations were running, the worker could stall.

    • The Python beanstalkd worker does not have this problem because all operations on the socket respect socket_timeout, whereas in Perl they do not, requiring us to implement client-side timeout mechanisms.
  2. Perl MCE (Many-Core Engine), at least in our hands, could not launch a second parallel process for touching jobs (it conflicted with launching the process pool for annotation), making it necessary to fork the annotation process anyway.

  3. By having the beanstalkd worker run bystro-annotate.pl rather than calling Seq.pm directly, we ensure that our command line interface actually gets exercised and that we discover usability improvements, as we have done in this PR.

  4. Annotation jobs required long TTR (time to run) leases because the annotation workers did not periodically touch the job to refresh the lease. If the client became unresponsive, say due to a network outage, so that the job was never completed or failed (or communication from the worker to the beanstalkd server during delete/release failed), the job would only be retried once the TTR lease expired; that lease is currently 48 hours. With this change, jobs can run as long as needed even with short TTRs, so a job held by an unresponsive client is retried much sooner (we could set the TTR to, say, 30 minutes). A sketch of the resulting touch/heartbeat pattern follows this list.
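To make the touch/heartbeat pattern concrete, here is a minimal, hypothetical sketch (not the actual worker code): the worker forks a child to run the long annotation while the parent periodically touches the job, and every beanstalkd call is wrapped in a client-side alarm-based timeout, since the Perl client does not enforce socket timeouts on its own. The $client object, its touch method, and the helper names are stand-ins for whatever the worker actually uses.

use strict;
use warnings;
use POSIX ":sys_wait_h";

# Wrap a beanstalkd call in a client-side timeout; returns 1 on success,
# 0 if the call died or timed out.
sub with_timeout {
    my ( $seconds, $code ) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        $code->();
        alarm(0);
        1;
    };
    alarm(0);    # always clear any pending alarm
    return $ok ? 1 : 0;
}

# Run $annotate (a coderef) in a forked child while refreshing the job's
# TTR lease from the parent. Returns 1 only if the child succeeded and we
# kept the lease the whole time.
sub run_with_heartbeat {
    my ( $client, $job_id, $ttr, $annotate ) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: run the long annotation (e.g. invoke bystro-annotate.pl).
        exit( $annotate->() ? 0 : 1 );
    }

    # Parent: touch the job roughly twice per TTR until the child exits.
    my $lost_lease = 0;
    while ( waitpid( $pid, WNOHANG ) == 0 ) {
        sleep( int( $ttr / 2 ) || 1 );
        next if with_timeout( 5, sub { $client->touch($job_id) } );

        # Touch failed or timed out: assume the lease is gone and stop the
        # child so another worker can safely take over the job.
        $lost_lease = 1;
        kill 'TERM', $pid;
        waitpid( $pid, 0 );
        last;
    }
    return ( !$lost_lease && $? == 0 ) ? 1 : 0;
}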

akotlar commented 4 months ago

Since the main comment is busy:

  1. I've tested this extensively on bystro-dev under a variety of workloads, doing my best to break it: using iptables to drop packets, restarting beanstalkd during submissions, using conntrack -D -p tcp --dport 11300 --timeout N to drop connections older than N seconds, etc. All works as expected. If workers ever find themselves concurrently processing the same annotation, which is only possible under severe network disruption, the losing worker fails the job. Under odd circumstances the job could still complete, but in either case we will never get corruption from concurrent writes (unless flock does not work on the filesystem; see the sketch at the end of this comment). There is room for improvement here, but for now this is good enough for what should be an extreme corner case, and one that is practically impossible with a sufficiently long TTR.

    • The only case in which two workers can simultaneously process a job is when the network goes down long enough for worker 1's TTR lease to expire, causing worker 2 to pick up the job; it will take a short time for worker 1 to realize it no longer has the lease and to kill the child task(s) doing the work requested in the job message.
    • Currently, worker 2 will see that the job is already being processed and will fail the job.
    • If worker 1 completed the job and sent the completion message before realizing it no longer had the lease, worker 2 would fail the job because it found the bystron_annotation.completed file. In the future we can relax this behavior by verifying that all of the expected files are present: we can write the list of completed files and their hashes into bystron_annotation.completed, have worker 2 verify that the files are present and have the expected contents, and send back a completed job with a qualification (e.g., put "Duplicate submission, results verified and not modified" in the message log for the job's BystroJobs DB record).
  2. This is live on bystro-dev with a very aggressive 40s TTR (to increase the chance of race conditions during stress testing). Please try to submit jobs and break things.
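As an aside, here is a minimal, hypothetical sketch of the concurrent-write guard described in the first point: before writing, a worker takes a non-blocking exclusive flock on a lock file in the job's output directory and refuses to proceed if the lock is already held or if the completion marker exists. The lock file name and helper are assumptions for illustration; only the bystron_annotation.completed marker comes from the description above.

use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;

# Hypothetical guard: returns an open, locked filehandle that must be kept
# open for the lifetime of the job; dies with a reason the worker can use
# to fail the job.
sub claim_output_dir {
    my ($out_dir) = @_;

    my $completed = File::Spec->catfile( $out_dir, 'bystron_annotation.completed' );
    die "job already completed by another worker\n" if -e $completed;

    # Lock file name is an assumption, not necessarily the worker's layout.
    my $lock_path = File::Spec->catfile( $out_dir, '.bystro_annotation.lock' );
    open( my $lock_fh, '>>', $lock_path ) or die "cannot open lock file: $!\n";

    # Non-blocking exclusive lock: if another worker holds it, fail fast
    # instead of risking concurrent writes (assumes flock works on this
    # filesystem).
    flock( $lock_fh, LOCK_EX | LOCK_NB )
      or die "output directory is already being processed by another worker\n";

    return $lock_fh;
}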