androidx / media

Jetpack Media3 support libraries for media use cases, including ExoPlayer, an extensible media player for Android
https://developer.android.com/media/media3
Apache License 2.0

How to seek when using Android TTS #1680

Open padrecedano opened 2 weeks ago

padrecedano commented 2 weeks ago

I am trying to implement a text to speech reader using Android TTS and Media3.

The text is long and is split at periods, because TTS does not accept more than 4000 characters in a single utterance. That part works and the text is read correctly, but if I try to skip by pressing a seek button or the progress bar, the application crashes with the following error:

java.lang.IllegalStateException: Missing implementation to handle one of the COMMAND_SEEK_*
    at androidx.media3.common.SimpleBasePlayer.handleSeek(SimpleBasePlayer.java:3366)
    at androidx.media3.common.SimpleBasePlayer.seekTo(SimpleBasePlayer.java:2366)
    at androidx.media3.common.BasePlayer.seekToCurrentItem(BasePlayer.java:481)
    at androidx.media3.common.BasePlayer.seekToOffset(BasePlayer.java:492)
    at androidx.media3.common.BasePlayer.seekBack(BasePlayer.java:142)
    at androidx.media3.ui.PlayerControlView$ComponentListener.onClick(PlayerControlView.java:1892)
    at android.view.View.performClick(View.java:7455)
    at android.view.View.performClickInternal(View.java:7432)
    at android.view.View.access$3700(View.java:835)
    at android.view.View$PerformClick.run(View.java:28810)
    at android.os.Handler.handleCallback(Handler.java:938)
    at android.os.Handler.dispatchMessage(Handler.java:99)
    at android.os.Looper.loopOnce(Looper.java:201)
    at android.os.Looper.loop(Looper.java:288)
    at android.app.ActivityThread.main(ActivityThread.java:7870)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1003)

The exception is thrown from this line in SimpleBasePlayer.java:

/* pendingOperation= */ handleSeek(mediaItemIndex, positionMs, seekCommand),

This is my code:

TTS

@UnstableApi
class TtsPlayerCompose(
    looper: Looper?, context: Context?,
    text: StringBuilder,
    splitRegex: String,
    private var mProgressListener: (Int, Int) -> Unit

)  : SimpleBasePlayer(looper!!), OnInitListener {
    private val mTexts: Array<String>
    private val mTts: TextToSpeech
    private var mTextProgress = 0
    private var mIsPlaying = false
    private var state = State.Builder()
        .setAvailableCommands(

            Player.Commands.Builder().addAll(
                COMMAND_PLAY_PAUSE,
                COMMAND_STOP,
                COMMAND_SEEK_BACK,
                COMMAND_SEEK_FORWARD,
                COMMAND_SET_SHUFFLE_MODE,
                COMMAND_GET_CURRENT_MEDIA_ITEM,
                COMMAND_GET_METADATA
            ).build()
        )
        .setPlayWhenReady(false, PLAY_WHEN_READY_CHANGE_REASON_USER_REQUEST)
        //.setAudioAttributes(PlaybackService.Companion.getDEFAULT_AUDIO_ATTRIBUTES())
        .setPlaylist(listOf<MediaItemData>(MediaItemData.Builder("test").build()))
        .setPlaylistMetadata(
            MediaMetadata.Builder().setMediaType(MediaMetadata.MEDIA_TYPE_PLAYLIST)
                .setTitle("TTS test").build()
        )
        .setCurrentMediaItemIndex(0)
        .build()

    init {
        mTexts = text.split(splitRegex.toRegex()).dropLastWhile { it.isEmpty() }.toTypedArray()
        mTts = TextToSpeech(context, this)
        mTts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
            override fun onDone(utteranceId: String) {
                if (!mIsPlaying || mTextProgress == mTexts.size) return
                ++mTextProgress
                speakText()
            }

            override fun onStart(utteranceId: String) {
                updatePlaybackState(STATE_READY, true)
            }
            @Deprecated("Deprecated in Java")
            override fun onError(utteranceId: String) {
                //onError(utteranceId);
            }
        })
    }

    private fun speakText() {
        if (mTextProgress >= mTexts.size) return
        //API 21+
        val bundle = Bundle()
        bundle.putInt(TextToSpeech.Engine.KEY_PARAM_STREAM, AudioManager.STREAM_MUSIC)
        for (name in mTexts) {
            //mTts.speak(mTexts[mTextProgress], TextToSpeech.QUEUE_FLUSH, bundle, "TTS_ID")
            mTts.speak(name, TextToSpeech.QUEUE_ADD, bundle, "TTS_ID")

        }
    }

    fun start() {
        mIsPlaying = true
        speakText()
    }

    fun resume() {
        mIsPlaying = false
        mTts.stop()
        start()
        updateProgress(mTextProgress, mTexts.size)
    }

    override fun getState(): State {
        return state
    }

    private fun updatePlaybackState(playbackState: Int, playWhenReady: Boolean) {
        val mainHandler = Handler(Looper.getMainLooper())
        mainHandler.post {
            state = state.buildUpon()
                .setPlaybackState(playbackState)
                .setPlayWhenReady(playWhenReady, PLAY_WHEN_READY_CHANGE_REASON_USER_REQUEST)
                .build()
            invalidateState()
        }
    }

    override fun handleSetPlayWhenReady(playWhenReady: Boolean): ListenableFuture<*> {
        if (playWhenReady) {
            val locSpanish = Locale("spa", "ESP")
            mTts.setLanguage(locSpanish)
            speakText()
        } else {
            mTts.stop()
        }
        return Futures.immediateVoidFuture()
    }

    override fun handleRelease(): ListenableFuture<*> {
        mTts.stop()
        mTts.shutdown()
        return Futures.immediateVoidFuture()
    }

    override fun handleStop(): ListenableFuture<*> {
        mTts.stop()
        return Futures.immediateVoidFuture()
    }

    override fun handleSetShuffleModeEnabled(shuffleModeEnabled: Boolean): ListenableFuture<*> {
        return Futures.immediateVoidFuture()
    }

    private fun updateProgress(current: Int, max: Int) {
        mProgressListener.invoke(current, max)
    }

    fun changeProgress(progress: Int) {
        mTextProgress = progress
        if (!mIsPlaying) return
        pause()
        start()
    }

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            val locSpanish = Locale("spa", "ESP")
            val result = mTts.setLanguage(locSpanish)
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                return
            }
            changeProgress(1)
        }
    }

    fun close() {
        mTts.stop()
        mTts.shutdown()
    }
}

ExoPlayerView

@UnstableApi
@Composable
fun ExoPlayerView(text: StringBuilder) {
    val context = LocalContext.current
    var sliderMaxValue by remember { mutableFloatStateOf(100f) }
    val ttsPlayer = TtsPlayerCompose(Looper.getMainLooper(), context, text, Constants.SEPARADOR) { current: Int, max: Int ->
        sliderMaxValue = max.toFloat()
    }

    DisposableEffect(Unit) {
        onDispose {
            ttsPlayer.release()
        }
    }

    // Use AndroidView to embed an Android View (PlayerView) into Compose
    AndroidView(
        factory = { ctx ->
            PlayerView(ctx).apply {
                player = ttsPlayer
            }
        },
        modifier = Modifier
            .fillMaxWidth()
            .height(200.dp)
    )
}

Service

open class DemoPlaybackService : MediaLibraryService() {

    private lateinit var mediaLibrarySession: MediaLibrarySession

    companion object {
        private const val NOTIFICATION_ID = 123
        private const val CHANNEL_ID = "demo_session_notification_channel_id"
    }

    open fun getSingleTopActivity(): PendingIntent? = null

    open fun getBackStackedActivity(): PendingIntent? = null

    protected open fun createLibrarySessionCallback(): MediaLibrarySession.Callback {
        return DemoMediaLibrarySessionCallback(this)
    }

    @OptIn(UnstableApi::class) 
    override fun onCreate() {
        super.onCreate()
        initializeSessionAndPlayer()
        setListener(MediaSessionServiceListener())
    }

    override fun onGetSession(controllerInfo: ControllerInfo): MediaLibrarySession {
        return mediaLibrarySession
    }
    @OptIn(UnstableApi::class)
    override fun onDestroy() {
        getBackStackedActivity()?.let { mediaLibrarySession.setSessionActivity(it) }
        mediaLibrarySession.release()
        mediaLibrarySession.player.release()
        clearListener()
        super.onDestroy()
    }

    @OptIn(UnstableApi::class)
    private fun initializeSessionAndPlayer() {
        val player = TtsPlayer(Looper.getMainLooper(), this, "")
        mediaLibrarySession =
            MediaLibrarySession.Builder(this, player, createLibrarySessionCallback())
                .also { builder -> getSingleTopActivity()?.let { builder.setSessionActivity(it) } }
                .build()
    }

    @OptIn(UnstableApi::class)
    private inner class MediaSessionServiceListener : Listener {

        override fun onForegroundServiceStartNotAllowedException() {
            if (
                Build.VERSION.SDK_INT >= 33 &&
                checkSelfPermission(Manifest.permission.POST_NOTIFICATIONS) !=
                PackageManager.PERMISSION_GRANTED
            ) {
                // Notification permission is required but not granted
                return
            }
            val notificationManagerCompat = NotificationManagerCompat.from(this@DemoPlaybackService)
            ensureNotificationChannel(notificationManagerCompat)
            val builder =
                NotificationCompat.Builder(this@DemoPlaybackService, CHANNEL_ID)
                    .setSmallIcon(R.drawable.ic_help)
                    .setContentTitle(getString(R.string.lbl_nona))
                    .setStyle(
                        NotificationCompat.BigTextStyle().bigText(getString(R.string.lbl_nona))
                    )
                    .setPriority(NotificationCompat.PRIORITY_DEFAULT)
                    .setAutoCancel(true)
                    .also { builder -> getBackStackedActivity()?.let { builder.setContentIntent(it) } }
            notificationManagerCompat.notify(NOTIFICATION_ID, builder.build())
        }
    }

    private fun ensureNotificationChannel(notificationManagerCompat: NotificationManagerCompat) {
        if (
            notificationManagerCompat.getNotificationChannel(CHANNEL_ID) != null
        ) {
            return
        }

        val channel =
            NotificationChannel(
                CHANNEL_ID,
                getString(R.string.lbl_mixto),
                NotificationManager.IMPORTANCE_DEFAULT,
            )
        notificationManagerCompat.createNotificationChannel(channel)
    }
}

marcbaechinger commented 1 week ago

The error comes from SimpleBasePlayer, which requires your subclass to override handleSeek. If you don't override it, you must either not make any seek command available, or the error you posted above is raised when one is called.
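For illustration, here is a minimal sketch of what such an override could look like. It is not taken from the thread or from any Media3 sample. The class name `SeekableTextPlayer`, the `onSeekToSentence` callback, and the fixed `ASSUMED_SENTENCE_MS` duration are all hypothetical, and the sketch assumes that a position can be mapped to a sentence simply by dividing by that fixed duration.

```kotlin
import android.os.Looper
import androidx.media3.common.C
import androidx.media3.common.Player
import androidx.media3.common.SimpleBasePlayer
import androidx.media3.common.util.UnstableApi
import com.google.common.util.concurrent.Futures
import com.google.common.util.concurrent.ListenableFuture

// Assumption for illustration only: every sentence is treated as lasting this long.
private const val ASSUMED_SENTENCE_MS = 3_000L

@UnstableApi
class SeekableTextPlayer(
    looper: Looper,
    private val sentenceCount: Int,
    // Hypothetical callback: restart speech output at the given sentence index.
    private val onSeekToSentence: (Int) -> Unit
) : SimpleBasePlayer(looper) {

    init {
        require(sentenceCount > 0) { "Need at least one sentence" }
    }

    private var state: State = State.Builder()
        .setAvailableCommands(
            Player.Commands.Builder()
                .addAll(
                    Player.COMMAND_SEEK_BACK,
                    Player.COMMAND_SEEK_FORWARD,
                    Player.COMMAND_SEEK_IN_CURRENT_MEDIA_ITEM,
                    Player.COMMAND_GET_CURRENT_MEDIA_ITEM
                )
                .build()
        )
        .setPlaylist(
            listOf(
                // One seekable item whose duration is the estimated total speaking time.
                MediaItemData.Builder(/* uid= */ "tts")
                    .setIsSeekable(true)
                    .setDurationUs(sentenceCount * ASSUMED_SENTENCE_MS * 1_000)
                    .build()
            )
        )
        .build()

    override fun getState(): State = state

    // Called for every seek command listed in the available commands above.
    override fun handleSeek(
        mediaItemIndex: Int,
        positionMs: Long,
        seekCommand: Int
    ): ListenableFuture<*> {
        // C.TIME_UNSET means "default position"; treat it as the start here.
        val requestedMs = if (positionMs == C.TIME_UNSET) 0L else positionMs
        val sentence = (requestedMs / ASSUMED_SENTENCE_MS).toInt().coerceIn(0, sentenceCount - 1)
        onSeekToSentence(sentence)
        // Snap the reported position to the start of the chosen sentence.
        state = state.buildUpon()
            .setContentPositionMs(sentence * ASSUMED_SENTENCE_MS)
            .build()
        return Futures.immediateVoidFuture()
    }
}
```

The key point is that every seek command listed in the available commands ends up in `handleSeek`, so the override has to translate the requested position into whatever unit the player can actually restart from. In this sketch that unit is a sentence index.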

padrecedano commented 1 week ago

@marcbaechinger

Is there any example implementation for TTS-based content?

I found a handleSeek method here but I don't know what to do in the method to seek the TTS content.

I added this dependency in Gradle:

implementation 'androidx.media3:media3-transformer:1.4.1'

But I can't write the method without errors. I'm trying to add it to the TtsPlayerCompose class.

marcbaechinger commented 6 days ago

I don't think the transformer library or any Compose-related classes are required here, and they won't make this much simpler for you.

> Is there any example implementation for TTS-based content?

Not that I am aware of.

If you are using a SimpleBasePlayer then you have to implement this method yourself. I'm not clear why you are using a SimpleBasePlayer. I assume you are generating audio frames from text locally in your app and feeding them to an AudioTrack? That would explain why you can't just use an ExoPlayer instance.

If this is the case then a seek operation would mean somehow seeking in the audio frames that you are generating and have stored in memory. I don't think we would be able to provide you with general guidance on how to achieve this, because it isn't trivial and depends very much on how you are playing the audio frames within the player.
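If the player instead speaks text one sentence at a time and holds no audio frames, one approximate option is to estimate a duration per sentence and map a seek position to the sentence that contains it. The sketch below shows only that mapping. The class name `SentenceTimeline` and the `msPerChar` speaking-rate estimate are hypothetical, and real speech durations will differ from the estimate.

```kotlin
/** Maps playback positions to sentence indices using estimated durations. */
class SentenceTimeline(sentences: List<String>, msPerChar: Long = 60L) {
    // startsMs[i] is the estimated start of sentence i; the last entry is the total duration.
    private val startsMs = LongArray(sentences.size + 1)

    init {
        require(sentences.isNotEmpty()) { "Need at least one sentence" }
        require(msPerChar > 0) { "msPerChar must be positive" }
        for (i in sentences.indices) {
            // Every sentence gets a non-zero duration so start times strictly increase.
            startsMs[i + 1] = startsMs[i] + maxOf(1, sentences[i].length) * msPerChar
        }
    }

    /** Estimated total duration in milliseconds. */
    val durationMs: Long get() = startsMs.last()

    /** Estimated start position of the sentence at the given index. */
    fun startOf(index: Int): Long = startsMs[index]

    /** Index of the sentence that contains positionMs, clamped to the valid range. */
    fun indexAt(positionMs: Long): Int {
        val clamped = positionMs.coerceIn(0L, durationMs - 1)
        // Search only the start times, not the trailing total-duration entry.
        val found = startsMs.binarySearch(clamped, 0, startsMs.size - 1)
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        return if (found >= 0) found else -found - 2
    }
}
```

A seek handler could call `indexAt(positionMs)` to pick the sentence to restart from and report `startOf(index)` as the new position, which keeps the progress bar consistent with what is actually being spoken.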

If I'm wrong with this assumption and the TTS service is somewhere on a server, then I'd rather look into using a ProgressiveMediaSource that loads the media from the server and can be set and played with ExoPlayer. Being able to use ExoPlayer would greatly simplify your task of integrating with a session, because you could just build the session with ExoPlayer.

marcbaechinger commented 6 days ago

Another thought that I just had is that you can create a ByteArrayDataSource that you can use with a ProgressiveMediaSource. If you generate audio in a format supported by ProgressiveMediaSource you can put it into the ByteArrayDataSource that is used by the progressive media source.
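A rough sketch of that idea follows, assuming the synthesized audio is already available as a byte array in a container that ProgressiveMediaSource can extract (for example WAV). The function name `playAudioBytes` is hypothetical, and the URI is only a placeholder, on the assumption that the data source serves the bytes it was constructed with.

```kotlin
import android.net.Uri
import androidx.media3.common.MediaItem
import androidx.media3.common.util.UnstableApi
import androidx.media3.datasource.ByteArrayDataSource
import androidx.media3.datasource.DataSource
import androidx.media3.exoplayer.ExoPlayer
import androidx.media3.exoplayer.source.ProgressiveMediaSource

@UnstableApi
fun playAudioBytes(player: ExoPlayer, audioBytes: ByteArray) {
    require(audioBytes.isNotEmpty()) { "No audio data to play" }
    // Each call to the factory hands out a fresh data source over the same bytes.
    val dataSourceFactory = DataSource.Factory { ByteArrayDataSource(audioBytes) }
    val mediaSource = ProgressiveMediaSource.Factory(dataSourceFactory)
        // Placeholder URI (assumption): the bytes above are what gets read.
        .createMediaSource(MediaItem.fromUri(Uri.EMPTY))
    player.setMediaSource(mediaSource)
    player.prepare()
    player.play()
}
```

With this approach seeking is handled by ExoPlayer itself, so no custom handleSeek is needed. The trade-off is that the audio has to be synthesized before playback can start.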

What is the media format that your TTS service generates?

padrecedano commented 6 days ago

Thanks for your interest. I'm not looking for anything too complex. The goal is to take large amounts of text and convert them to speech using Android TTS, with the ability to pause, stop, and resume the audio, and to move forward or back by clicking on a progress bar. This is the class I currently have, and it works with XML-based code, but I want to migrate my entire app to Jetpack Compose and I'm stuck at that point.
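For the large-text part, the splitting can be sketched in plain Kotlin as below. This is only an illustration: the function name `splitIntoChunks` is hypothetical, the 4000 default mirrors the limit mentioned at the start of the thread (on a device, `TextToSpeech.getMaxSpeechInputLength()` reports the actual value), and the sentence regex is a simplification that will not suit every language.

```kotlin
/** Splits text into sentence-based chunks no longer than maxLen characters. */
fun splitIntoChunks(text: String, maxLen: Int = 4000): List<String> {
    require(maxLen > 0) { "maxLen must be positive" }
    val chunks = mutableListOf<String>()
    val current = StringBuilder()
    // Split on whitespace that follows sentence-ending punctuation.
    val sentences = text.split(Regex("(?<=[.!?])\\s+")).filter { it.isNotBlank() }
    for (sentence in sentences) {
        // A single oversized sentence is cut into fixed-size pieces.
        for (piece in sentence.chunked(maxLen)) {
            // Flush the current chunk if adding this piece would exceed the limit.
            if (current.isNotEmpty() && current.length + 1 + piece.length > maxLen) {
                chunks += current.toString()
                current.clear()
            }
            if (current.isNotEmpty()) current.append(' ')
            current.append(piece)
        }
    }
    if (current.isNotEmpty()) chunks += current.toString()
    return chunks
}
```

Packing several short sentences into one chunk reduces the number of utterances, but it also makes seeking coarser. Keeping one sentence per utterance is the other end of that trade-off.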

Android TTS can create an audio file from text, but that is not what I'm interested in. I want to play the text directly, without creating an audio file.

I'm using SimpleBasePlayer because the only example I found was based on that class.
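For playing the text directly, one way to make playback restartable from an arbitrary sentence is to give each utterance its own ID and queue everything from the requested index onward. The following is a minimal sketch under that assumption, not code from this thread. `SentenceSpeaker`, `speakFrom`, and the `onSentenceStarted` callback are hypothetical names.

```kotlin
import android.os.Bundle
import android.speech.tts.TextToSpeech
import android.speech.tts.UtteranceProgressListener

/** Speaks sentences directly through TextToSpeech and reports which one is current. */
class SentenceSpeaker(
    private val tts: TextToSpeech,
    private val sentences: List<String>,
    // Hypothetical callback; it may be invoked off the main thread.
    private val onSentenceStarted: (Int) -> Unit
) {
    init {
        tts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
            override fun onStart(utteranceId: String) {
                // The utterance ID is the sentence index, set in speakFrom below.
                utteranceId.toIntOrNull()?.let(onSentenceStarted)
            }

            override fun onDone(utteranceId: String) {}

            @Deprecated("Deprecated in Java")
            override fun onError(utteranceId: String) {}
        })
    }

    /** Drops anything already queued and speaks from the given sentence to the end. */
    fun speakFrom(index: Int) {
        val start = index.coerceIn(0, sentences.size)
        for (i in start until sentences.size) {
            // The first utterance flushes the queue; the rest are appended.
            val mode = if (i == start) TextToSpeech.QUEUE_FLUSH else TextToSpeech.QUEUE_ADD
            tts.speak(sentences[i], mode, Bundle(), i.toString())
        }
    }

    /** TextToSpeech has no pause call, so stopping discards the queue. */
    fun stop() {
        tts.stop()
    }
}
```

Because stopping discards the queue, resuming amounts to calling `speakFrom` with the last index reported through `onSentenceStarted`. A seek can use the same call with a different index.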