PsychoinformaticsLab / pliers

Automated feature extraction in Python
https://pliers.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

AWS Rekognition Integration #245

Open mickeypash opened 6 years ago

mickeypash commented 6 years ago
tyarkoni commented 6 years ago

Hi @mickeypash, thanks for your interest in taking this on! This is a good impetus to start laying out some guides for developers, so I'll use this issue as an opportunity to provide a fairly detailed overview, and will then work this up into a full doc section at some point. Feel free to ask questions if anything's unclear.

In general, we've tried to ease the pain of writing new Transformer classes by abstracting away most of the I/O stuff and letting developers focus almost exclusively on the internal transformation logic. In practice, what this means is that a feature Extractor minimally just has to fill out the following skeleton:


class MyNewExtractor(ImageExtractor):

    def _extract(self, stim):
        # Must return an instance of class ExtractorResult!
        ...

There's a pre-defined hierarchy of Transformer classes that you can inherit from if you're doing something fairly conventional. E.g., if you want to add a new Extractor that you know is only going to deal with images as inputs, you should inherit from ImageExtractor as above.

For Converter and Filter classes (see the docs for a detailed explanation of the differences), the same logic applies. Note that the method you implement changes based on the Transformer subclass. In the case of a Converter, you implement _convert(); in the case of a Filter, you implement _filter().

Note that the transformation method (i.e., _extract, _convert, or _filter) must not take any arguments other than stim. This means that any configuration you want to do must be done in the initializer (which there are no constraints on, so you can do whatever you like there). What the method returns is also constrained, and depends on the Transformer type. Extractor classes return instances of ExtractorResult (see existing Extractor classes for examples of how to initialize these). Converter classes return a Stim of a different type than the input. Filter classes return a Stim of the same type as the input.
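To make those return-type contracts concrete, here's a self-contained sketch. The classes below are stand-ins that mimic the pliers contract (they are not imported from pliers, and the real ExtractorResult signature has more parameters); the point is just what each transformation method hands back:

```python
# Stand-ins mimicking the pliers contract; the real classes live in
# pliers.stimuli and pliers.extractors.
class ImageStim:
    def __init__(self, data):
        self.data = data

class TextStim:
    def __init__(self, text):
        self.text = text

class ExtractorResult:
    def __init__(self, data, stim, extractor):
        self.data, self.stim, self.extractor = data, stim, extractor

class MeanBrightnessExtractor:
    def _extract(self, stim):
        # Extractors return an ExtractorResult.
        return ExtractorResult(data=[[sum(stim.data) / len(stim.data)]],
                               stim=stim, extractor=self)

class ImageToTextConverter:
    def _convert(self, stim):
        # Converters return a Stim of a *different* type than the input.
        return TextStim(text='caption for image')

class ImageCropFilter:
    def _filter(self, stim):
        # Filters return a Stim of the *same* type as the input.
        return ImageStim(data=stim.data[:2])

img = ImageStim(data=[1, 2, 3, 4])
result = MeanBrightnessExtractor()._extract(img)   # ExtractorResult
text = ImageToTextConverter()._convert(img)        # TextStim
cropped = ImageCropFilter()._filter(img)           # ImageStim
```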

Beyond these minimal requirements, there are a bunch of other conventions and utilities you can take advantage of to minimize work. For example:


class MyNewAPIExtractor(ImageExtractor):

    _log_attributes = ('param1', 'param2')
    _env_keys = ('EXAMPLE_API_PARAM1', 'EXAMPLE_API_PARAM2')
    _version = '0.1'
    _batch_size = 100

    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2
        super().__init__()

    def _extract(self, stim):
        # Must return an instance of class ExtractorResult!
        ...

The class attributes do some useful stuff for you (and also for the user): _log_attributes names the initialization parameters that get logged alongside results (so users can tell which settings produced which features); _env_keys lists the environment variables that API credentials can be read from when they aren't passed explicitly; _version tags results so they can be traced to a specific implementation of the Extractor; and _batch_size lets stims be sent to the API in batches of the indicated size rather than one at a time.

In addition to the transformation logic itself, you're also encouraged to implement a ._to_df() method in any Extractor classes you write. This method takes an ExtractorResult as input and returns a pandas DataFrame: it reads the .raw attribute of the ExtractorResult (which should contain the raw results retrieved from the feature extraction service) and processes it into a tidy DataFrame. There's more to be said about this, but I'm happy to provide more input once you get to that stage. It's not mandatory, since we only very recently changed the internal API to work this way, but it would probably make sense to implement new Extractor classes this way. For a relatively simple example, take a look at pliers.extractors.image.FaceRecognitionFeatureExtractor and its subclasses.
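For Rekognition specifically, the raw payload is a JSON-like dict, so the DataFrame-building step is mostly flattening. A rough sketch of that step (the response shape below mirrors Rekognition's DetectFaces output, but the helper name and column choices are illustrative, not existing pliers API):

```python
import pandas as pd

def faces_to_df(raw):
    """Flatten a Rekognition DetectFaces-style response into a DataFrame.

    `raw` is the dict returned by the service; each detected face becomes
    one row, with its bounding box unpacked into separate columns.
    """
    rows = []
    for face in raw.get('FaceDetails', []):
        box = face.get('BoundingBox', {})
        rows.append({
            'confidence': face.get('Confidence'),
            'box_left': box.get('Left'),
            'box_top': box.get('Top'),
            'box_width': box.get('Width'),
            'box_height': box.get('Height'),
        })
    return pd.DataFrame(rows)

# Example payload in the shape the service returns:
raw = {'FaceDetails': [
    {'Confidence': 99.9,
     'BoundingBox': {'Left': 0.1, 'Top': 0.2, 'Width': 0.3, 'Height': 0.4}},
]}
df = faces_to_df(raw)
```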

Aside from that, it's really up to you how you want to implement support for Rekognition (or for any other service). Some general tips/suggestions:

I'm probably forgetting things, but that's what comes to mind. Feel free to ask questions or bring up issues here as you run into them. Thanks!!

tyarkoni commented 5 years ago

On further inspection, I think this issue can be broken up into several steps. The Rekognition API supports a number of services that don't require S3. In boto3's Rekognition client, these include all of the detect_* methods (e.g., detect_faces()). These could probably be minimally implemented by putting most of the logic in a core AmazonRekognitionImageExtractor base class from which the face, label, text, etc. detectors inherit. Given that we can wrap boto3 for most requests (rather than querying the API directly), this will likely involve less work than the corresponding Google or Azure APIs.
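The base-class layout could look roughly like this. It's a sketch, not existing pliers code: the class and attribute names are mine, and a stub stands in for the real boto3 client so the example runs offline. The key idea is that subclasses only name the detect_* method to call:

```python
class StubRekognitionClient:
    """Stand-in for boto3.client('rekognition'), for illustration only."""
    def detect_faces(self, Image, **kwargs):
        return {'FaceDetails': [{'Confidence': 99.0}]}

    def detect_labels(self, Image, **kwargs):
        return {'Labels': [{'Name': 'Cat', 'Confidence': 97.0}]}

class AmazonRekognitionImageExtractor:
    """Base class: all request logic lives here; subclasses just pick
    which boto3 Rekognition method to invoke."""
    api_method = None  # e.g. 'detect_faces'

    def __init__(self, client=None):
        # Real code would default to boto3.client('rekognition').
        self.client = client or StubRekognitionClient()

    def _extract(self, image_bytes):
        method = getattr(self.client, self.api_method)
        return method(Image={'Bytes': image_bytes})

class RekognitionFaceExtractor(AmazonRekognitionImageExtractor):
    api_method = 'detect_faces'

class RekognitionLabelExtractor(AmazonRekognitionImageExtractor):
    api_method = 'detect_labels'

faces = RekognitionFaceExtractor()._extract(b'\x89PNG...')
labels = RekognitionLabelExtractor()._extract(b'\x89PNG...')
```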

Beyond those detection methods, we start to get into functionality that requires S3 support. The next step would probably be to extend the extractors created in the above step to accept S3 inputs. The easiest way to handle this would be to let the user just pass in their S3 credentials and bucket information either at Extractor initialization, or even in the transform call (since the bucket may change from image to image). But it would be nicer to create an abstraction that lets users set up a fixed bucket for multiple calls.
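At the request level, supporting S3 inputs is mostly a matter of building the right Image argument, since Rekognition accepts either raw bytes or an S3 object reference. A small helper (the function name is hypothetical) could normalize the two cases:

```python
def build_image_arg(data=None, bucket=None, key=None):
    """Build the `Image` parameter for a Rekognition detect_* call.

    Pass raw `data` bytes for local stims, or `bucket` + `key` for images
    already sitting in S3; Rekognition expects the S3 case in the shape
    {'S3Object': {'Bucket': ..., 'Name': ...}}.
    """
    if data is not None:
        return {'Bytes': data}
    if bucket and key:
        return {'S3Object': {'Bucket': bucket, 'Name': key}}
    raise ValueError('Provide either raw bytes or an S3 bucket/key pair.')

local = build_image_arg(data=b'...')
remote = build_image_arg(bucket='my-stims', key='face.jpg')
```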

Beyond that, we get into video-extraction territory. Here things get even more complex, because these extractors are asynchronous, so, as with the Google Cloud Video Intelligence services, we need to wait for the request to complete. Many of the video-based tools have the further complication that they work with collections stored on S3 (e.g., extracting faces from a video as they're encountered, so that they can be detected and tracked later on). So then we need to not only create collections, but pass them to the extractors as needed, and then, once completed, process the results into a form we can eventually return in to_df.