The Stirling Engine is the processing tool for Keeping History. It provides media conversion and compression, data analysis, metadata enhancement, stream packaging and archival, and is extensible through plugins.
Release 0.0.2
Release Notes:
This version is a non-working preview. The previous 0.0.1 (untagged) release was a technology demo.
The entire Stirling Engine has been rewritten to be more modular and more resilient to future development. We’ve hopefully improved developer ergonomics as well.
Because the source file is the only required argument, we can use best guesses, sourced recommended defaults and calculated data to choose our recommended settings. This makes the engine easier to use, so getting started is quicker. We will expand our opinionated setting recommendations and add more customization options to the core packages and default plugins.
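Here is a minimal sketch of that idea. It assumes a hypothetical recommend_settings() helper and made-up defaults (FLAC audio at up to 48 kHz, libx264 video), which are not the engine's real values. Static defaults are adjusted using data calculated from probing the source, and any explicit user option wins.

```python
from typing import Any

# Hypothetical opinionated defaults; NOT the Stirling Engine's real values.
DEFAULTS: dict[str, Any] = {
    "audio_codec": "flac",
    "audio_sample_rate": 48000,
    "video_codec": "libx264",
}


def recommend_settings(source: str, probed: dict[str, Any], **overrides: Any) -> dict[str, Any]:
    """Resolve settings from just a source path, probe data and optional overrides.

    Precedence, lowest to highest: static defaults, values calculated
    from the probe data, then explicit user overrides.
    """
    settings: dict[str, Any] = {"source": source, **DEFAULTS}  # copy, never mutate DEFAULTS
    # Calculated data: never upsample beyond the source's own sample rate.
    src_rate = probed.get("sample_rate")
    if src_rate is not None:
        settings["audio_sample_rate"] = min(settings["audio_sample_rate"], int(src_rate))
    # Audio-only sources get no video settings at all.
    if not probed.get("has_video", False):
        settings.pop("video_codec")
    settings.update(overrides)
    return settings
```

For example, recommend_settings("talk.wav", {"sample_rate": 44100, "has_video": False}) returns the FLAC default with the sample rate lowered to 44100 and no video codec, because the source has no video stream.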
An example main.py file is provided to show how to use the engine (see below).
Additions/Changes
Core
The core engine has been refactored into a core package. The core package includes:
args: a module for anything argument-related. Currently, it provides common argunparser instances, which convert objects into CLI arguments. Later, we will add a base argparser here for converting CLI/JSON/API arguments into plugin arguments; each plugin will implement it as a child class.
audio: Stirling expects to work with either an audio or a video file; the audio package contains common functions for handling audio data from any multimedia source file. Currently, the audio package creates an archive-compatible audio file that other plugins can also use. ffmpeg is the backend transcoder.
definitions: stores all base classes and application-wide definitions. Currently, these are the base classes and definitions for video and audio transcoding.
helpers: common utility functions that can be used by the core package or imported into plugins.
jobs: the StirlingJob class is the basis for the entire lifecycle of a job; it’s the first thing you create, and everything is saved to or called on it.
probe: a utility for probing an incoming media file. It currently uses ffprobe; support for mediainfo will be added. The probe class returns a StirlingMediaInfo object, a basic, standardized class representing rudimentary data about a source file. Plugins that need more information about a media file should run their own custom probe. We will add more data here as we standardize the output of ffprobe and mediainfo, and we are also considering adding codec- or container-specific data.
pytorch: a wrapper for the machine learning library PyTorch. Currently, it checks the system and returns a PyTorch instance appropriate for the system architecture/GPU.
video: like the audio package, video contains functions for transcoding video. Only the most common options are provided, because the video package is only intended to create an archive-ready video that other plugins can then use for transcoding, producing multiple formats and containers, or modifying video streams. ffmpeg is the backend.
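The probe step might look roughly like the following sketch. It asks ffprobe for JSON output (-print_format json with -show_format and -show_streams) and keeps a few rudimentary fields. MediaInfoSketch is a hypothetical stand-in for StirlingMediaInfo, and its field names are assumptions rather than the real class.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MediaInfoSketch:
    """Hypothetical stand-in for StirlingMediaInfo; fields are assumptions."""
    container: str
    duration: Optional[float]
    has_audio: bool
    has_video: bool
    width: Optional[int] = None
    height: Optional[int] = None
    sample_rate: Optional[int] = None


def parse_ffprobe(data: dict[str, Any]) -> MediaInfoSketch:
    """Convert ffprobe's -show_format/-show_streams JSON into the dataclass."""
    fmt = data.get("format", {})
    streams = data.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)
    duration = fmt.get("duration")  # ffprobe reports duration as a string
    return MediaInfoSketch(
        container=fmt.get("format_name", "unknown"),
        duration=float(duration) if duration is not None else None,
        has_audio=audio is not None,
        has_video=video is not None,
        width=video.get("width") if video else None,
        height=video.get("height") if video else None,
        # Audio sample_rate is also a string in ffprobe's JSON.
        sample_rate=int(audio["sample_rate"]) if audio and "sample_rate" in audio else None,
    )


def probe(path: str) -> MediaInfoSketch:
    """Run ffprobe (it must be on PATH) and parse its JSON output."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return parse_ffprobe(json.loads(result.stdout))
```

Keeping the parser separate from the subprocess call makes it easy to test against saved ffprobe output. A second parser for mediainfo could sit alongside it. Audio files with embedded cover art report the image as a video stream, so a real probe would need to handle that case.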
Plugins
Plugins are a fairly new concept in the Stirling Engine and allow it to be extended without limit. We've moved some of our previous core code into plugins (such as transcript and hls). Plugins are now associated with jobs, with a StirlingJob as the top-level object (previously, we passed a single job to many plugins).
Plugins included are:
hls - for transcoding source video into an HLS VOD package for streaming
objects - intended for machine learning object detection using PyTorch. Not working yet: it is a very old codebase that needs to be updated to match the newer plugins.
peaks - generates waveform data in JSON for later processing
transcript - creates a speech-to-text transcript in JSON format, using the Google Speech-to-Text public API with a public API key. More transcript backends are planned, first and most notably OpenAI’s Whisper, because of its high accuracy (see https://openai.com/blog/whisper/ for more information).
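The following hypothetical sketch shows how an hls plugin could pair an argunparser-style conversion with ffmpeg's HLS muxer. HlsOptions, flag(), unparse() and hls_command() are invented for this note. The ffmpeg options (-c:v, -c:a, -f hls, -hls_time, -hls_playlist_type vod, -hls_segment_filename) are real, but the values are only examples, and the result is a single rendition rather than a full adaptive ladder.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


def flag(name: str, default: Any) -> Any:
    """Attach the ffmpeg flag name to a dataclass field's metadata."""
    return field(default=default, metadata={"flag": name})


@dataclass
class HlsOptions:
    """Hypothetical hls plugin options for a single-rendition VOD package."""
    video_codec: str = flag("-c:v", "libx264")
    audio_codec: str = flag("-c:a", "aac")
    muxer: str = flag("-f", "hls")
    segment_seconds: int = flag("-hls_time", 6)
    playlist_type: str = flag("-hls_playlist_type", "vod")
    segment_pattern: str = flag("-hls_segment_filename", "segment_%03d.ts")


def unparse(options: Any) -> list[str]:
    """argunparser-style conversion: dataclass fields -> CLI arguments."""
    args: list[str] = []
    for f in fields(options):
        args += [f.metadata["flag"], str(getattr(options, f.name))]
    return args


def hls_command(source: str, playlist: str, options: Optional[HlsOptions] = None) -> list[str]:
    """Build an ffmpeg command that writes an .m3u8 playlist plus segments."""
    opts = options or HlsOptions()
    # Output options must come after the input and before the output path.
    return ["ffmpeg", "-i", source, *unparse(opts), playlist]
```

The returned list can be passed directly to subprocess.run(..., check=True). Storing the flag name in field metadata makes the options object, rather than ad hoc string formatting, the single source of truth for the command.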
Logging
Logging is important because it's difficult to handle every error that can come up when transcoding audio or video, so we've improved our logging. Later, we'll add verbosity levels, the ability to stream logs to a file (currently, there's a write buffer), and endpoints for streaming logs to an external program or service (such as Loggly or Google Cloud Logging).
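Below is a rough sketch of verbosity levels plus a write buffer, using only Python's standard logging module. The level mapping, the logger name and the 100-record buffer are assumptions for illustration, not the engine's current logger.

```python
import logging
import logging.handlers
from typing import Optional

# Hypothetical mapping from a verbosity count (e.g. -v, -vv) to log levels.
VERBOSITY_LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def level_for(verbosity: int) -> int:
    """Clamp a verbosity count into the range of defined levels."""
    index = max(0, min(verbosity, len(VERBOSITY_LEVELS) - 1))
    return VERBOSITY_LEVELS[index]


def make_logger(verbosity: int, log_path: Optional[str] = None) -> logging.Logger:
    """Build a logger that buffers records in memory before writing them out."""
    logger = logging.getLogger("stirling_sketch")  # hypothetical logger name
    logger.setLevel(level_for(verbosity))
    for old in list(logger.handlers):  # avoid stacking handlers on repeated calls
        logger.removeHandler(old)
        old.close()
    target: logging.Handler = (
        logging.FileHandler(log_path) if log_path else logging.StreamHandler()
    )
    target.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    # Hold up to 100 records; flush early as soon as an ERROR record arrives.
    buffered = logging.handlers.MemoryHandler(
        capacity=100, flushLevel=logging.ERROR, target=target
    )
    logger.addHandler(buffered)
    return logger
```

To stream logs to an external service, you would replace the MemoryHandler's target with a handler from that service's own client library.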
main.py example file
The provided main.py shows the general flow for creating a job processor:
Create a StirlingJob object
Add plugins using the add_plugins() function
run() the job
close() the job. This doesn’t do much yet, but it will become important for the datastore and will let us add metadata using standards such as PBCore.org or PREMIS (https://www.loc.gov/standards/premis/). It will also help with usage tracking once we can open a beta service in a limited-access program.
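The four steps can be mirrored with self-contained stand-ins. JobSketch and EchoPlugin are hypothetical stubs written for this note. They follow the create, add_plugins(), run(), close() order described above, but they do not reproduce StirlingJob's real constructor, plugin interface or return values.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Plugin(Protocol):
    """Hypothetical minimal plugin interface for this sketch."""
    name: str

    def run(self, job: "JobSketch") -> None: ...


@dataclass
class EchoPlugin:
    """Hypothetical plugin that only records that it ran."""
    name: str

    def run(self, job: "JobSketch") -> None:
        job.outputs[self.name] = f"processed {job.source}"


@dataclass
class JobSketch:
    """Stand-in for StirlingJob: the top-level object plugins attach to."""
    source: str
    plugins: list[Plugin] = field(default_factory=list)
    outputs: dict[str, str] = field(default_factory=dict)
    closed: bool = False

    def add_plugins(self, *plugins: Plugin) -> None:
        self.plugins.extend(plugins)

    def run(self) -> None:
        # Each plugin receives the whole job, not just its own settings.
        for plugin in self.plugins:
            plugin.run(self)

    def close(self) -> dict[str, str]:
        # A real engine might persist results or write metadata here.
        self.closed = True
        return dict(self.outputs)


job = JobSketch(source="interview.mp4")
job.add_plugins(EchoPlugin("peaks"), EchoPlugin("transcript"))
job.run()
results = job.close()
```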
PLEASE NOTE
The main.py file will NOT always be the way to build workflows. Later, we'll add the ability to load plugins automatically, and the engine will receive a StirlingJob from another source, such as the planned ECU, a GraphQL input, a JSON file or the CLI.
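A StirlingJob may later come from a JSON file, GraphQL or the CLI, and the args module is meant to gain a base argparser that each plugin subclasses. Together, the pattern could look something like this sketch. PluginArgsSketch, PeaksArgs, samples_per_pixel, REGISTRY and the JSON layout are all hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, ClassVar


@dataclass
class PluginArgsSketch:
    """Hypothetical base class: turns a plain mapping (from JSON, an API,
    or parsed CLI flags) into typed plugin arguments."""
    plugin: ClassVar[str] = "base"

    @classmethod
    def from_mapping(cls, data: dict[str, Any]) -> "PluginArgsSketch":
        known = {f.name for f in fields(cls)}
        unknown = set(data) - known
        if unknown:
            raise ValueError(f"{cls.plugin}: unknown arguments {sorted(unknown)}")
        return cls(**data)


@dataclass
class PeaksArgs(PluginArgsSketch):
    """Hypothetical child class for the peaks plugin."""
    plugin: ClassVar[str] = "peaks"
    samples_per_pixel: int = 512


REGISTRY = {cls.plugin: cls for cls in (PeaksArgs,)}


def load_job_spec(text: str) -> tuple[str, list[PluginArgsSketch]]:
    """Parse a JSON job description into a source path and plugin arguments."""
    spec = json.loads(text)
    plugins = [REGISTRY[p["name"]].from_mapping(p.get("args", {})) for p in spec["plugins"]]
    return spec["source"], plugins
```

Because validation lives in the base class, every plugin rejects unknown arguments in the same way, regardless of whether they arrived as JSON or as CLI flags.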
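Other additions
We’ve added some standard linting (we use Trunk https://trunk.io/products/check) and other config files.
We use VSCode, so we’ve provided our (very limited) .code-workspace file. We’ll add more for developers to that file in the future, and will also provide other scripts and environment variables for those who don't use VSCode.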
Current issues:
None of the plugins seem to be working properly.
Some of the core packages seem to work, but the arguments passed to ffmpeg need some work. The proof of concept, though, is there!
Sometimes plugins seem to create the same command twice.
The topological/priority sort of commands may be incorrect.
There are some test files, TODOs and comments that need to be removed after this most recent refactor.
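The duplicate-command and ordering issues can be illustrated with a small sketch. It drops exact duplicate commands, then runs Kahn's topological sort with a priority queue, so that among commands whose dependencies are already satisfied, the highest-priority one runs first. The Command fields (id, argv, priority, depends_on) are assumptions, not the engine's data model.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical command record; frozen so identical commands hash equal."""
    id: str
    argv: tuple[str, ...]
    priority: int = 0  # higher runs earlier among ready commands
    depends_on: tuple[str, ...] = ()


def order_commands(commands: list[Command]) -> list[Command]:
    """Deduplicate, then topologically sort with priority tie-breaking."""
    unique = list(dict.fromkeys(commands))  # drop exact duplicates, keep order
    by_id = {c.id: c for c in unique}
    if len(by_id) != len(unique):
        raise ValueError("different commands share the same id")
    indegree = {c.id: 0 for c in unique}
    dependents: dict[str, list[str]] = {c.id: [] for c in unique}
    for c in unique:
        for dep in c.depends_on:
            if dep not in by_id:
                raise KeyError(f"{c.id} depends on unknown command {dep}")
            indegree[c.id] += 1
            dependents[dep].append(c.id)
    # Min-heap on (-priority, id): highest priority first, ties broken by id.
    ready = [(-c.priority, c.id) for c in unique if indegree[c.id] == 0]
    heapq.heapify(ready)
    ordered: list[Command] = []
    while ready:
        _, cid = heapq.heappop(ready)
        ordered.append(by_id[cid])
        for nxt in dependents[cid]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (-by_id[nxt].priority, nxt))
    if len(ordered) != len(unique):
        raise ValueError("dependency cycle detected")
    return ordered
```

A dependency always runs before its dependents, even when the dependent has the higher priority. Priority only decides between commands that are ready at the same time.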