code-kern-ai / bricks

Open-source natural language enrichments at your fingertips.
Apache License 2.0

[MODULE] - Whisper brick #185

Open jhoetter opened 1 year ago

jhoetter commented 1 year ago

Please describe the module you would like to add to bricks
Whisper; wouldn't it be amazing if I could have one brick transferring speech to text, and then another one chained to detect the emotion/sentiment/entities from the text?

Do you already have an implementation?
Whisper is open-source.

Additional context
See discussion #182; this is not pure JSON, but would be some format of audio data.
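To make the chaining idea concrete, here is a minimal sketch of what such a pipeline could look like, assuming the open-source whisper package and an off-the-shelf Hugging Face sentiment pipeline (both are illustrative choices, not existing bricks):

# Hypothetical sketch of the proposed chain: audio -> transcript -> sentiment.
# Uses the open-source `whisper` package and Hugging Face `transformers`;
# neither is wired into bricks yet.
import whisper
from transformers import pipeline

def speech_to_text(audio_path: str) -> str:
    model = whisper.load_model("base")  # small multilingual Whisper model
    result = model.transcribe(audio_path)
    return result["text"]

def text_sentiment(text: str) -> dict:
    classifier = pipeline("sentiment-analysis")  # default sentiment model
    return classifier(text)[0]  # e.g. {"label": "POSITIVE", "score": 0.99}

transcript = speech_to_text("recording.wav")  # placeholder audio file
print(text_sentiment(transcript))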

LeonardPuettmann commented 1 year ago

At some point, TTS/STT is a must for us. We might not even need Whisper. The Azure Cognitive Services, for example, are also really great and dirt cheap.
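For reference, a speech-to-text call against the Azure Speech service via the azure-cognitiveservices-speech SDK could look roughly like this (key, region, and file name are placeholders; a brick would take them as user-provided inputs):

# Hypothetical sketch using the azure-cognitiveservices-speech SDK.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<api-key>", region="northeurope")
audio_config = speechsdk.audio.AudioConfig(filename="recording.wav")

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
result = recognizer.recognize_once()  # transcribes a single utterance

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)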

martinnormark commented 1 year ago

How do you see the dependency of bricks towards different compute requirements or platforms?

I assume there could be multiple offerings for Speech to Text, one would use Azure Cognitive Services, one could be Whisper running on Hugging Face Inference endpoints and another one could be offered via AWS.

Do you have a manifest or something similar to express these dependencies in bricks?

(HF Inference Endpoints for Whisper seem price-competitive if you look at the investigation by @philschmid here: https://www.philschmid.de/whisper-inference-endpoints - see the bottom half.)
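For illustration, calling a Whisper model deployed on an HF Inference Endpoint boils down to a single authenticated POST of the raw audio bytes (the endpoint URL and token below are placeholders, not real values):

# Hypothetical sketch: Whisper behind a Hugging Face Inference Endpoint.
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder
HF_TOKEN = "<hf-token>"  # placeholder

def transcribe(audio_path: str) -> str:
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    response = requests.post(
        ENDPOINT_URL,
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "audio/wav",
        },
        data=audio_bytes,
    )
    response.raise_for_status()
    return response.json()["text"]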

jhoetter commented 1 year ago

100% agree, there should be multiple options, and Whisper should be one of them.

We're still trying to figure out how to best structure dependencies. If you have an idea or example of what you think such a manifest could look like, I'd be really interested in that!

Our rule of thumb is going to be: anything that can be self-hosted should be available as a brick that runs on your own machine; anything only available via API will get a brick that just coordinates the calls using a user-provided token. This is how we differentiate between "python functions" and "premium" bricks (in the web interface).

martinnormark commented 1 year ago

I suppose the get_config, README, and code_snippet do some of it. Looking at the GPT-3 summariser, it has e.g. state, issue ID etc.:

# excerpt from the summariser's config.py; the imports for
# build_generator_premium_config and State are not shown here
def get_config():
    return build_generator_premium_config(
        function=gpt3_tldr_summarization,
        input_example=INPUT_EXAMPLE,
        data_type="text",
        issue_id=195,
        state=State.PUBLIC,
    )

Are you deriving these tags from the folder structure (premiums folder drives the Premium tag)?


For example, with progressive web apps, the browser needs some information about the app: how it looks, how to run it, and what it needs in order to run: https://web.dev/add-manifest/

One advantage is that you can index manifest files without running the code. You could also offer a more declarative plugin API where you don't need to write any Python code at all, but instead express the endpoints and inputs in a JSON manifest.

Not sure what fits your plans best; I was only curious in the context of contributing a brick that would depend on a specific compute resource or API, and was looking for a way to express this in a meta-data kind of structure.

Such a meta-data structure could look like this:

{
    "name": "GPT-3 tl;dr; summarization",
    "description": "GPT-3 model which can be used to summarise text inputs.",
    "category": "generators",
    "dataType": "text",
    "issueId": 195,
    "state": "PUBLIC",
    "tier": "Premium",
    "dependencies": {
        "inputs": [
            {
                "name": "API_KEY",
                "label": "OpenAI API Key",
                "required": true,
                "type": "text",
                "description": "An API key for the OpenAI API. Can be provided by us or be obtained directly from OpenAI"
            },
            {
                "name": "model",
                "label": "GPT-3 model to use",
                "required": false,
                "default": "text-davinci-003",
                "type": "text",
                "description": "The model offered by OpenAI to use. Defaults to `text-davinci-003`."
            }
        ],
        "resources": [
            {
                "name": "openai",
                "label": "OpenAI API access",
                "type": "api",
                "description": "You need access to the OpenAI API to use this generator. This is subject to usage based charges by OpenAI which can be found here: https://beta.openai.com/pricing",
                "required": true
            }
        ]
    }
}
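To make the "index without running the code" point concrete, a manifest like the one above could be validated against a plain schema, e.g. with a pydantic model. This is only a sketch; none of these classes exist in bricks:

# Hypothetical schema mirroring the example manifest above.
import json
from typing import List, Optional
from pydantic import BaseModel

class ManifestInput(BaseModel):
    name: str
    label: str
    required: bool
    type: str
    description: str
    default: Optional[str] = None

class ManifestResource(BaseModel):
    name: str
    label: str
    type: str
    description: str
    required: bool

class Dependencies(BaseModel):
    inputs: List[ManifestInput]
    resources: List[ManifestResource]

class BrickManifest(BaseModel):
    name: str
    description: str
    category: str
    dataType: str
    issueId: int
    state: str
    tier: str
    dependencies: Dependencies

with open("manifest.json") as f:
    manifest = BrickManifest(**json.load(f))  # raises if fields are missing or mistyped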

I'm not familiar with the integration path in Refinery; perhaps model is an input at the same level as the prompt.

Inputs and resources in this example bleed a bit into one another. The dependency on OpenAI is implied by the API key and can be communicated there. It would look different if it were a dependency on the AWS Rekognition service: the resources would ask for AWS subscription details, IAM roles etc., and the inputs would be specific to the service you want to use.

But it probably needs more thinking to design this right 😄

LeonardPuettmann commented 1 year ago

Thank you, Martin! That's some excellent input. Yes, the tags are derived from the folder structure, meaning that premium functions are stored in a different location than the Python functions that do not require an API key.

Currently, most of the information relevant to our CMS is stored in the config.py file, while we provide the relevant information for the actual endpoint in the __init__.py file (which can contain placeholders for API keys or resource locations). For example, our brick module using the Azure Cognitive Services contains information like this:

from pydantic import BaseModel
from typing import List
import requests, uuid  # used by the endpoint function further down in __init__.py

INPUT_EXAMPLE = {
    "text": "Hallo, guten Tag.",
    "fromLang": ["de"],
    "toLang": ["en"],
    "apiKey": "<api-key-goes-here>",
    "region": "northeurope",
}

class MicrosoftTranslatorModel(BaseModel):
    text: str
    fromLang: List[str]
    toLang: List[str]
    apiKey: str
    region: str

    class Config:
        schema_extra = {"example": INPUT_EXAMPLE}
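The requests and uuid imports above are for the endpoint function further down in the file. As a sketch, a call against the public Microsoft Translator v3.0 REST API could look like this; the function body is illustrative, not the brick's actual code:

# Hypothetical endpoint function following Azure's public Translator v3.0 REST API.
def microsoft_translator(request: MicrosoftTranslatorModel) -> dict:
    url = "https://api.cognitive.microsofttranslator.com/translate"
    params = {
        "api-version": "3.0",
        "from": request.fromLang[0],  # the v3.0 API takes a single source language
        "to": request.toLang,         # but accepts multiple target languages
    }
    headers = {
        "Ocp-Apim-Subscription-Key": request.apiKey,
        "Ocp-Apim-Subscription-Region": request.region,
        "Content-Type": "application/json",
        "X-ClientTraceId": str(uuid.uuid4()),  # this is where uuid comes in
    }
    body = [{"text": request.text}]
    response = requests.post(url, params=params, headers=headers, json=body)
    return response.json()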

That said, combining all of this into a single meta-data structure like you suggested might also make sense for us at some point.

You said that you would like to contribute a brick that would depend on a specific compute resource/API. Could you go into further detail on what brick you would like to build and what requirements such a module would have? Perhaps we could support this by providing a fitting structure for such a brick module.

jhoetter commented 1 year ago

Hi @martinnormark, I quickly wanted to loop back just to let you know that we haven't forgotten about your suggestion. It makes perfect sense, and we'll get back to it as soon as possible. Currently, we're preparing a couple of major updates for our company that are crucial to our business model, and for the time being we need to focus on those. Just to be fair and transparent :)