10up / classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence.
https://classifaiplugin.com
GNU General Public License v2.0

Amazon Polly as a provider for the text-to-speech feature. #734

Closed: iamdharmesh closed this 3 months ago

iamdharmesh commented 4 months ago

Description of the Change

This PR adds Amazon Polly as a provider for the Text to Speech feature.

Closes #728

@jeffpaul @dkotter Amazon Polly provides additional options, such as the Newscaster speaking style and SSML, which supports features like adding breathing sounds and emphasizing specific words or phrases. I haven't implemented these in this PR, but we can consider integrating them in the future if there are specific client requirements around these features.
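
For reference, a minimal sketch of what SSML input for the Newscaster style could look like. This is not part of this PR. The function name `sketch_newscaster_ssml` is hypothetical; the `<amazon:domain name="news">` tag is Polly's SSML tag for that style, which, as far as I know, is only available for certain neural voices.

```php
<?php
/**
 * Wrap plain text in SSML that requests Polly's Newscaster speaking style.
 * Hypothetical helper for illustration only.
 *
 * @param string $text Plain text to be spoken.
 * @return string SSML document.
 */
function sketch_newscaster_ssml( string $text ): string {
	// SSML is XML, so characters such as & and < must be escaped first.
	$escaped = htmlspecialchars( $text, ENT_XML1 | ENT_QUOTES, 'UTF-8' );

	return '<speak><amazon:domain name="news">' . $escaped . '</amazon:domain></speak>';
}

$ssml = sketch_newscaster_ssml( 'Rates rose & markets fell.' );
```

The resulting string would be sent as the `Text` parameter of `synthesizeSpeech` with `TextType` set to `ssml`, instead of plain text.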

@dkotter, I haven't added E2E tests for this because I haven't figured out yet how to mock the API, given that we are using the AWS PHP SDK here. Please let me know if you have any ideas on this.

How to test the Change

  1. Go to Tools > ClassifAI > Language Processing > Text to Speech.
  2. Select the "Amazon Polly" option as the provider.
  3. Add AWS credentials and save settings.
  4. Create/Edit a post and ensure that the Text-to-Speech feature is working as expected.

Changelog Entry

Added - Amazon Polly as a provider for the text-to-speech feature.

Credits

Props @jeffpaul @iamdharmesh

dkotter commented 3 months ago

> @dkotter, I haven't added E2E tests for this because I haven't figured out yet how to mock the API, given that we are using the AWS PHP SDK here. Please let me know if you have any ideas on this.

I guess my first question would be: do we need to use the SDK here? I know that can help simplify things, but we haven't used any SDKs for the other Providers up to this point. I'm not opposed to it, just wondering if there was a specific reason.

But I can think of two approaches we can take to mock the requests:

  1. Add a short-circuit filter right before we make the request to AWS, allowing us to return our own results. This is basically what WordPress does. My only concern is that we'd be adding a filter for testing purposes only, which I don't love.
  2. Because the main request goes through a custom REST endpoint, we could use the rest_pre_dispatch filter, which fires before any callbacks are called, and return a hardcoded result, similar to how we're currently using the pre_http_request filter. This wouldn't work for all scenarios (like triggering Text to Speech from the inline row action), but it should work to test the main use case of publishing content.
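
A minimal sketch of approach 1, to make the idea concrete. The hook name `classifai_sketch_pre_polly_request` and the function `sketch_synthesize_speech` are made up for illustration and are not the plugin's actual API. The small `add_filter()` / `apply_filters()` stand-ins exist only so the sketch runs outside WordPress; inside WordPress the real functions would be used.

```php
<?php
// Storage for the stand-in hook functions below.
final class Sketch_Hooks {
	public static $callbacks = array();
}

// Stand-ins so the sketch runs without WordPress loaded (priority is ignored).
if ( ! function_exists( 'add_filter' ) ) {
	function add_filter( $hook, $callback, $priority = 10, $accepted_args = 1 ) {
		Sketch_Hooks::$callbacks[ $hook ][] = $callback;
	}

	function apply_filters( $hook, $value, ...$args ) {
		foreach ( Sketch_Hooks::$callbacks[ $hook ] ?? array() as $callback ) {
			$value = $callback( $value, ...$args );
		}
		return $value;
	}
}

/**
 * Hypothetical wrapper around the Polly request.
 *
 * @param array    $params      Synthesis parameters (text, voice, ...).
 * @param callable $send_to_aws Performs the real SDK request.
 * @return mixed Filtered value if a filter supplied one, else the real response.
 */
function sketch_synthesize_speech( array $params, callable $send_to_aws ) {
	// Any non-null value returned by a filter short-circuits the request.
	$pre = apply_filters( 'classifai_sketch_pre_polly_request', null, $params );

	if ( null !== $pre ) {
		return $pre;
	}

	return $send_to_aws( $params );
}

// In a test-only plugin: return canned audio instead of calling AWS.
add_filter(
	'classifai_sketch_pre_polly_request',
	function ( $pre, $params ) {
		return 'fake-mp3-bytes';
	},
	10,
	2
);

$result = sketch_synthesize_speech(
	array( 'Text' => 'Hello' ),
	function ( $params ) {
		throw new RuntimeException( 'AWS should not be called when mocked.' );
	}
);
```

Because the filter sits directly in front of the SDK call, it would apply no matter where Text to Speech is triggered from, which is the advantage over the rest_pre_dispatch option.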

iamdharmesh commented 3 months ago

> I guess my first question would be: do we need to use the SDK here? I know that can help simplify things, but we haven't used any SDKs for the other Providers up to this point. I'm not opposed to it, just wondering if there was a specific reason.

The main reason for using the SDK was to keep things simple, especially when it comes to signing and authenticating REST requests. I don't believe any of our existing providers require an authentication process this complex. I'm open to getting rid of the SDK here and writing a custom class for handling authentication and REST operations (similar to what we did for OpenAI). Please let me know if you think we should remove the SDK here.
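
To give a sense of what the SDK saves us from, here is a rough sketch of just one step of AWS Signature Version 4: deriving the signing key. The function names are hypothetical, the secret is a placeholder, and a custom class would also need to build the canonical request and the string to sign, which this sketch leaves out.

```php
<?php
/**
 * Derive a Signature Version 4 signing key via chained HMAC-SHA256.
 *
 * @param string $secret_key AWS secret access key.
 * @param string $date       Request date as YYYYMMDD (UTC).
 * @param string $region     AWS region, e.g. "us-east-1".
 * @param string $service    Service name in the credential scope ("polly" here).
 * @return string Raw 32-byte signing key.
 */
function sketch_sigv4_signing_key( string $secret_key, string $date, string $region, string $service ): string {
	// Each step uses the previous result as the HMAC key.
	$k_date    = hash_hmac( 'sha256', $date, 'AWS4' . $secret_key, true );
	$k_region  = hash_hmac( 'sha256', $region, $k_date, true );
	$k_service = hash_hmac( 'sha256', $service, $k_region, true );

	return hash_hmac( 'sha256', 'aws4_request', $k_service, true );
}

/**
 * Sign an already-built "string to sign" and return the hex signature.
 */
function sketch_sigv4_signature( string $signing_key, string $string_to_sign ): string {
	return hash_hmac( 'sha256', $string_to_sign, $signing_key );
}

// Placeholder credentials and a placeholder string to sign.
$key       = sketch_sigv4_signing_key( 'EXAMPLE-SECRET', gmdate( 'Ymd' ), 'us-east-1', 'polly' );
$signature = sketch_sigv4_signature( $key, 'placeholder-string-to-sign' );
```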

> I can think of two approaches we can take to mock the requests:

Approach #1 seems like the better choice to me as it allows us to cover all scenarios.

Thanks

dkotter commented 3 months ago

> The main reason for using the SDK was to keep things simple, especially when it comes to signing and authenticating REST requests. I don't believe any of our existing providers require an authentication process this complex. I'm open to getting rid of the SDK here and writing a custom class for handling authentication and REST operations (similar to what we did for OpenAI). Please let me know if you think we should remove the SDK here.

I think we're fine to proceed with keeping the SDK here. It does increase the size of the final release zip because of all the code the SDK brings in, but that should be fine.

> Approach #1 seems like the better choice to me as it allows us to cover all scenarios.

That works for me