cdfmlr / muvtuber

Makes your AI vtuber

Can't get back audio TTS from Azure (audioview) #44

Closed ilNikk closed 1 year ago

ilNikk commented 1 year ago

Hello, I can't get any audio. ChatGPT answers me correctly, but I can't figure out where to check whether the request reaches Azure or whether Azure responds with an error. The same goes for the Live2D model, of course. Any suggestions?

Here are the logs

Main Logs

muvtuber-live2ddriver-1       | 2023/05/26 18:29:54 WARN may be a OpenMouth after emo-motion, ignore: {"motion":"flick_head"}
muvtuber-muvtuberdriver-1     | 2023/05/26 18:29:54 INFO [allInOneSayer] say: done. text="Hello! A...u today?"
muvtuber-muvtuberdriver-1     | 2023/05/26 18:29:54 INFO [audioController] sendPlayCmd to audioview cmd=playVocal track=d41d8cd...
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:24 INFO [allInOneSayer] say: done. text="hi! how are you?"
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:24 INFO [PrioritizedChatbot] Chat(il_nikk): "hi! how are you?"
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:24 INFO [chatbot] SessionClient Chat: got textIn: chatbotName=ChatGPTChatbot textin="hi! ...you?"
muvtuber-chatgpt_chatbot-1    | INFO:root:ChatGPTgRPCServer.Chat: (OK) Hello! As an AI language model, I don't have emotions, but I'm functioning well. How can I assist you today?
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:28 INFO [chatbot] SessionClient Chat success. chatbot=ChatGPTChatbot sessionID=66b4b5a... textin="hi! ...you?" textout=Hell...day?
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:28 INFO [PrioritizedChatbot] Chat(il_nikk): "hi! how are you?" => (ChatGPTChatbot): "Hello! As an AI language model, I don't have emotions, but I'm functioning well. How can I assist you today?"
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:29 INFO [PriorityReduceFilter] outputMaxPriorityOne boost Priority -> Highest author=ChatGPTChatbot content="Hello! ... today?" priority=2
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:29 INFO [textOut] author=ChatGPTChatbot priority=2 content="Hello! As an AI language model, I don't have emotions, but I'm functioning well. How can I assist you today?"
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:29 INFO [audioController] sendPlayCmd to audioview cmd=playVocal track=d41d8cd...
muvtuber-muvtuberdriver-1     | 2023/05/26 18:30:59 INFO [allInOneSayer] say: done. text="Hello! A...u today?"

muvtuberdriver

2023/05/26 18:37:24 INFO [PrioritizedChatbot] Chat(il_nikk): "hi! how are you?"
2023/05/26 18:37:24 INFO [chatbot] SessionClient Chat: got textIn: chatbotName=ChatGPTChatbot textin="hi! ...you?"
2023/05/26 18:37:28 INFO [chatbot] SessionClient Chat success. chatbot=ChatGPTChatbot sessionID=66b4b5a... textin="hi! ...you?" textout=Hell...day?
2023/05/26 18:37:28 INFO [PrioritizedChatbot] Chat(il_nikk): "hi! how are you?" => (ChatGPTChatbot): "Hello! As an AI language model, I don't have emotions, but I'm functioning well. How can I assist you today?"
2023/05/26 18:37:29 INFO [PriorityReduceFilter] outputMaxPriorityOne boost Priority -> Highest author=ChatGPTChatbot content="Hello! ... today?" priority=2
2023/05/26 18:37:29 INFO [textOut] author=ChatGPTChatbot priority=2 content="Hello! As an AI language model, I don't have emotions, but I'm functioning well. How can I assist you today?"
2023/05/26 18:37:29 INFO [audioController] sendPlayCmd to audioview cmd=playVocal track=d41d8cd...
2023/05/26 18:37:59 INFO [allInOneSayer] say: done. text="Hello! A...u today?"

externalsayer

2023/05/26 18:27:48 INFO gRPC API server started. addr=localhost:50010 sayer=*azuresayer.AzureSayer pid=1

musharing_chatbot

2023-05-26 18:27:48 INFO [root]: gRPC reflection enabled.

audioview

/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: IPv6 listen already enabled
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
172.18.0.1 - - [26/May/2023:18:29:35 +0000] "GET /?controller=ws://127.0.0.1:51081 HTTP/1.1" 200 467 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 OBS/29.1.1 Safari/537.36" "-"
172.18.0.1 - - [26/May/2023:18:29:35 +0000] "GET /assets/index-80b45ff1.js HTTP/1.1" 200 93145 "http://127.0.0.1:51082/?controller=ws://127.0.0.1:51081" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 OBS/29.1.1 Safari/537.36" "-"
172.18.0.1 - - [26/May/2023:18:29:35 +0000] "GET /assets/index-1583fd6e.css HTTP/1.1" 200 1133 "http://127.0.0.1:51082/?controller=ws://127.0.0.1:51081" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 OBS/29.1.1 Safari/537.36" "-"

cdfmlr commented 1 year ago

The log shows it working properly. No error.

Oh, I got it. The log shows that TTS was working properly. If it failed to call Azure, there should be a WARN log:

    ch, err := s.sayer.Say(ctx, text)
    if err != nil {
        slog.Warn("[allInOneSayer] say failed", "err", err, "text", ellipsis.Centering(text, 20))
        return err
    }

https://github.com/cdfmlr/muvtuberdriver/blob/41390d6aca94ff78f041d2052bf274cfd0286df7/say.go#L62

The problem may be caused by the audioview, which performs the playback.

            switch r {
            case AudioPlayStatusStart:
                started = true
            case AudioPlayStatusEnd:
                slog.Info("[allInOneSayer] AudioPlayStatusEnd", "text", ellipsis.Centering(text, 20))
                s.lostConsistency.Store(0)
                return nil
            case AudioPlayStatusErr:
                s.lostConsistency.Add(1)
                return errors.New("AudioPlayStatusErr")
            }

Here, https://github.com/cdfmlr/muvtuberdriver/blob/41390d6aca94ff78f041d2052bf274cfd0286df7/say.go#L78, I forgot to log the AudioPlayStatusErr case. I guess this branch is being hit. Would you mind adding a log statement here to check?
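
A minimal sketch of what such a log statement could look like, assuming Go 1.21+ where `log/slog` is in the standard library (the project at that commit may import slog from elsewhere). The `AudioPlayStatus` type, its values, and the `handleStatus` helper are hypothetical stand-ins, not the project's definitions. Any `err` from the earlier `Say` call has already been checked by the time the switch runs, so the sketch logs the received status rather than `err`.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// AudioPlayStatus is a hypothetical stand-in for the status reported by the audioview.
type AudioPlayStatus string

const (
	AudioPlayStatusStart AudioPlayStatus = "start"
	AudioPlayStatusEnd   AudioPlayStatus = "end"
	AudioPlayStatusErr   AudioPlayStatus = "err"
)

// handleStatus mirrors the shape of the switch above. It returns done=true
// when waiting should stop, plus the error to return in that case.
func handleStatus(r AudioPlayStatus, text string) (done bool, err error) {
	switch r {
	case AudioPlayStatusStart:
		return false, nil
	case AudioPlayStatusEnd:
		slog.Info("[allInOneSayer] AudioPlayStatusEnd", "text", text)
		return true, nil
	case AudioPlayStatusErr:
		// The added log line: record which status arrived and for which text.
		slog.Warn("[allInOneSayer] AudioPlayStatusErr", "status", string(r), "text", text)
		return true, errors.New("AudioPlayStatusErr")
	}
	return false, nil
}

func main() {
	done, err := handleStatus(AudioPlayStatusErr, "some text")
	fmt.Println(done, err)
}
```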

ilNikk commented 1 year ago

I did this:

            switch r {
            case AudioPlayStatusStart:
                started = true
            case AudioPlayStatusEnd:
                slog.Info("[allInOneSayer] AudioPlayStatusEnd", "text", ellipsis.Centering(text, 20))
                s.lostConsistency.Store(0)
                return nil
            case AudioPlayStatusErr:
                slog.Warn("[allInOneSayer] say failed", "err", err, "text", ellipsis.Centering(text, 20)) //log added issue #44
                s.lostConsistency.Add(1)
                return errors.New("AudioPlayStatusErr")
            }

The logs return:

 2023/05/27 10:01:10 WARN [allInOneSayer] say failed err=<nil> text="some text"
 2023/05/27 10:01:10 INFO [allInOneSayer] say: done. text="some text"

cdfmlr commented 1 year ago

That's awkward. Allow me some time to figure it out.

ilNikk commented 1 year ago

Ah, no, wait, maybe I was wrong. I'm not very familiar with Go, and I'm still studying all your code. Which variable holds the error: err or ctx? If it is ctx, the log I get back is this:

ERROR [allInOneSayer] AudioPlayStatusErr err="context.Background.WithDeadline(2023-05-27 11:37:39.600043672 +0000 UTC m=+315.009407475 [4m29.998311567s])"

Code:

            case AudioPlayStatusErr:            
                s.lostConsistency.Add(1)
                slog.Error("[allInOneSayer] AudioPlayStatusErr", "err", ctx) //Logs added - issue #44
                return errors.New("AudioPlayStatusErr")
            }

cdfmlr commented 1 year ago

err is the error, while ctx is something to propagate cancellation information across a tree of goroutines.

cdfmlr commented 1 year ago

I reproduced your log output by running without an AudioView front end.

Have you opened an AudioView (in browser or OBS) for playback?

Running instances of Live2DView & AudioView are both required to make it work properly (one for animation and the other for audio playback).

ilNikk commented 1 year ago

Still no audio :(

Live2DView: http://localhost:51070/#/?driver=ws://localhost:51071/live2d

After a long time debugging, I realized that I had forgotten to put /live2d on the Live2DView URL. I'm so stupid.

Have you opened an AudioView (in browser or OBS) for playback?

I tried it in several different browsers (Edge, Chrome, Firefox) and in OBS, obviously.

Running instances of Live2DView & AudioView are both required to make it work properly (one for animation and the other for audio playback).

All containers are always running, and blivechat is set to an active Bilibili room ID, in this case 1163043.

Sometimes the avatar says something nonsensical in Japanese, 4 or 5 times in a row. Google Translate renders it as "I teach you the language of flowers".

The log that I get back now is this:

muvtuberdriver (all fine, I think)

2023/05/29 10:33:41 INFO [allInOneSayer] say: done. text="you are a liar"
2023/05/29 10:33:41 INFO [textOut] author=MusharingChatbot priority=2 content="what do you get when you cross a serious thief and a mad young man?"
2023/05/29 10:33:41 INFO [audioController] sendPlayCmd to audioview cmd=playVocal track=d41d8cd...

externalsayer

2023/05/29 10:31:00 INFO gRPC API server started. addr=localhost:50010 sayer=*azuresayer.AzureSayer pid=1

live2ddriver (sometimes I get a broken pipe)

2023/05/29 10:33:11 INFO fwd msg: {"motion":"idle"} -> http://localhost:51070 (chan 0xc000130720).
2023/05/29 10:33:11 INFO fwd msg: {"motion":"idle"} -> http://localhost:51070 (chan 0xc000480420).
2023/05/29 10:33:11 INFO fwd msg: {"motion":"idle"} -> http://localhost:51070 (chan 0xc000082480).
2023/05/29 10:33:11 WARN may be a OpenMouth after emo-motion, ignore: {"motion":"flick_head"}
2023/05/29 10:33:11 WARN may be a OpenMouth after emo-motion, ignore: {"motion":"flick_head"}
2023/05/29 10:33:41 INFO fwd msg: {"motion":"idle"} -> http://localhost:51070 (chan 0xc000082480).
2023/05/29 10:33:41 INFO fwd msg: {"motion":"idle"} -> http://localhost:51070 (chan 0xc000480420).
2023/05/29 10:33:41 INFO fwd msg: {"motion":"idle"} -> http://localhost:51070 (chan 0xc000130720).
2023/05/29 10:33:41 WARN may be a OpenMouth after emo-motion, ignore: {"motion":"flick_head"}
2023/05/29 10:33:41 INFO fwd msg: {} -> http://localhost:51070 (chan 0xc000082480).
2023/05/29 10:33:41 INFO fwd msg: {} -> http://localhost:51070 (chan 0xc000480420).
2023/05/29 10:33:41 INFO fwd msg: {} -> http://localhost:51070 (chan 0xc000130720).
2023/05/29 10:33:41 ERROR fwd msg to http://localhost:51070 (chan 0xc000480420) error: write tcp 172.18.0.9:9001->172.18.0.1:42614: write: broken pipe.
2023/05/29 10:33:41 Stop ForwardMessageTo: http://localhost:51070 by chan 0xc000480420.

live2dview

2023/05/29 10:33:55 [notice] 1#1: worker process 40 exited with code 0
2023/05/29 10:33:55 [notice] 1#1: signal 29 (SIGIO) received
2023/05/29 10:33:55 [notice] 1#1: signal 17 (SIGCHLD) received from 45

audioview

2023/05/29 10:33:55 [notice] 1#1: worker process 40 exited with code 0
2023/05/29 10:33:55 [notice] 1#1: signal 29 (SIGIO) received
2023/05/29 10:33:55 [notice] 1#1: signal 17 (SIGCHLD) received from 30

Now, in my opinion, there may be a problem with the Azure connection (I have more than €200 of credit on my Azure account, so that's not the problem).

~~This is my external sayer configuration file (\configs\externalsayer\config.yaml). I used key 1 from Azure; that's fine, right? I also tried a fake key just to test whether an error would show up in the log, but no error was returned. Maybe the configuration file is not loaded correctly?~~

SrvAddr: "localhost:50010"
EnabledSayer: "azure"
AzureSayer:
  SpeechKey: "azure key xxxxx"
  SpeechRegion: "westeurope"
  FormatMicrosoft: "audio-16khz-32kbitrate-mono-mp3"
  FormatMimeSubtype: "mp3"
  Roles:
    "jenny": '<speak version="1.0" xml:lang="zh-CN"><voice name="en-US-JennyMultilingualNeural">{{.}}</voice></speak>'

You can find the full container log here: https://pastebin.com/xYtEhc1G

UPDATE: I used some dummy code to test my Azure account and it works. So the actual problem is that the sayer is not making the request to Azure. Indeed, if I go to the Azure metrics panel, there are no requests or errors, nothing.

cdfmlr commented 1 year ago

Attention:

You can find the full container log here: https://pastebin.com/xYtEhc1G

You have exposed your OpenAI key. If you did not intend to make it public, please regenerate it to avoid losses.


Thanks for your patience and the detailed information.

I must first apologize for my misdirection. I overlooked an important fact: the sayer logs successes. It emits an INFO line for every successful request, and your logs show no such lines. So the problem is actually with the sayer.

My sayer logs look like the following:

muvtuber-externalsayer-1  | 2023/05/28 08:39:55 INFO gRPC API server started. addr=0.0.0.0:50010 sayer=*azuresayer.AzureSayer pid=1
muvtuber-externalsayer-1  | 2023/05/28 08:40:13 INFO sayerServiceServer Say succeeded. text=[哇]
muvtuber-externalsayer-1  | 2023/05/28 08:40:23 INFO sayerServiceServer Say succeeded. text=哇!谢谢你的...更加有趣呢!

Here is a way to test the sayer service:

grpcurl -d '{"role": "xiao", "text": "test"}' -plaintext localhost:51065  muvtuber.sayer.v1.SayerService.Say

This calls the RPC service with the grpcurl tool. If the sayer works, it prints:

// success:
{
  "format": "mp3",
  "audio": "base64 encoded binary audio content"
}

// or failed:
 muvtuber.sayer.v1.SayerService.Say
ERROR:
  Code: Internal
  Message: unknown role: notexist-role

Meanwhile, the sayer logs:

muvtuber-externalsayer-1  | 2023/05/30 01:39:05 INFO sayerServiceServer Say succeeded. text=test
muvtuber-externalsayer-1  | 2023/05/30 01:42:30 WARN sayerServiceServer Say failed. err="rpc error: code = Internal desc = unknown role: notexist-role" text=test

Please try it and see how it works.
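
If the call succeeds, one way to confirm that the returned audio is real is to decode the `audio` field and check its size or play it back. The following is a sketch, assuming the saved grpcurl output is just the JSON object shown above and that `audio` uses standard base64. The `decodeSayResponse` helper and the optional file argument are illustrative only.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"
)

// sayResponse matches the two fields shown in the grpcurl output above.
type sayResponse struct {
	Format string `json:"format"`
	Audio  string `json:"audio"` // base64-encoded audio bytes
}

// decodeSayResponse parses the JSON printed by grpcurl and returns the
// audio format and the raw audio bytes.
func decodeSayResponse(raw []byte) (string, []byte, error) {
	var resp sayResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return "", nil, fmt.Errorf("parse response: %w", err)
	}
	audio, err := base64.StdEncoding.DecodeString(resp.Audio)
	if err != nil {
		return "", nil, fmt.Errorf("decode audio: %w", err)
	}
	return resp.Format, audio, nil
}

func main() {
	// Sample input so the program runs as-is; "aGVsbG8=" is base64 for "hello".
	raw := []byte(`{"format": "mp3", "audio": "aGVsbG8="}`)
	if len(os.Args) > 1 {
		// Optionally pass a file holding the saved grpcurl output.
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		raw = data
	}
	format, audio, err := decodeSayResponse(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("format=%s, %d bytes of audio\n", format, len(audio))
}
```

An empty or tiny byte count would suggest the sayer answered without producing usable audio.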

As for your problem, my guess is that something is wrong with the "role". Different SpeechRegion values serve different available roles. Here are some documents that help with configuring it:


The following addresses your other questions:

All containers are always running, and blivechat is set to an active Bilibili room ID, in this case 1163043.

If you do not intend to stream on Bilibili, you don't need to set it. To disable blivechat, set the room ID to 0; main() will then ignore it.

Sometimes the avatar says something nonsensical in Japanese, 4 or 5 times in a row. Google Translate renders it as "I teach you the language of flowers".

I guess you mean the voices of the Live2D model. The example Live2D model (her name is Shizuku) contains voice clips that are played with some motions. Those sounds are meaningless for our purposes. If you are using OBS, the browser source has an option to 「Control audio through OBS」. Turn it on, and you can mute it.

I have also mentioned this in readme:

Virtual Image (Live2DView): Source > + > Browser > New > URL: http://localhost:9000/#/ Pay attention to tick "Control audio through OBS", and then turn off the sound, otherwise you will have a chance to hear some cute Japanese.

live2ddriver (sometimes I get a broken pipe)

It's fine to see this broken pipe if you have closed or reloaded the Live2DView front end in the browser. The broken pipe means the driver stopped sending messages to the closed front end.


I am not sure if I am expressing myself clearly. I just failed an English exam and lost all my confidence 😭. Let me know if it's hard to read, and I will try to explain it again.

ilNikk commented 1 year ago

Thanks for your help!

grpcurl returned this:

Error invoking method "muvtuber.sayer.v1.SayerService.Say": error getting request data: invalid character 'r' looking for beginning of object key string

The correct grpcurl command is the following; each " needs a \ before it:

grpcurl  -d '{\"role\": \"jenny\", \"text\": \"test\"}' -plaintext localhost:51065  muvtuber.sayer.v1.SayerService.Say
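
The error above is consistent with the shell removing the inner double quotes before grpcurl sees the payload (quoting behaviour differs between shells, and which shell was used is not stated here). As a sketch, Go's `encoding/json` rejects such a payload with the same kind of syntax error; the `sayRequest` type and `parsePayload` helper are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sayRequest holds the two fields used in the grpcurl payload above.
type sayRequest struct {
	Role string `json:"role"`
	Text string `json:"text"`
}

// parsePayload parses a payload string as it would arrive after shell
// quoting has been applied.
func parsePayload(payload string) (sayRequest, error) {
	var req sayRequest
	err := json.Unmarshal([]byte(payload), &req)
	return req, err
}

func main() {
	payloads := []string{
		`{"role": "jenny", "text": "test"}`, // quotes survived the shell
		`{role: jenny, text: test}`,         // quotes were stripped by the shell
	}
	for _, p := range payloads {
		req, err := parsePayload(p)
		if err != nil {
			fmt.Printf("%-36s -> error: %v\n", p, err)
			continue
		}
		fmt.Printf("%-36s -> role=%s text=%s\n", p, req.Role, req.Text)
	}
}
```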

The command works:

v1.SayerService.Say
{
  "format": "mp3",
  "audio": "some code"
}

external sayer log

2023/05/30 10:04:24 INFO gRPC API server started. addr=0.0.0.0:50010 sayer=*azuresayer.AzureSayer pid=1
2023/05/30 10:04:26 INFO sayerServiceServer Say succeeded. text=test

Now, in my opinion, there is some problem with my configuration files 🤔 Maybe the ports?

/config/muvtuberdriver/config.yaml

blivedm:
    server: ws://blivechat:12450/api/chat
    roomid: 0
textouthttp:
    server: ""
    droprate: 0
live2d:
    driver: http://live2ddriver:9004/driver
    forwarder: http://live2ddriver:9002/live2d
chatbot:
    musharing:
        server: musharing_chatbot:50051
        disabled: false
    chatgpt:
        server: chatgpt_chatbot:50052
        configs:
            - version: 3
              apikey: sk-blabla
              initialprompt: You are muli, an AI VTuber live streaming.
        cooldown: 15
        disabled: false
sayer:
    server: 0.0.0.0:51065
    role: jenny
listen:
    textinhttp: 0.0.0.0:51080
    audiocontrollerws: 0.0.0.0:51081
readdm: true
reduceduration: 5
toolong:
    maxwords: 500
    quibbles:
        - 太长了,不想说。
        - 禁則事項です。
        - 爬。

/config/externalsayer/config.yaml

SrvAddr: "0.0.0.0:50010"
EnabledSayer: "azure"
AzureSayer:
  SpeechKey: "blabla"
  SpeechRegion: "westeurope"
  FormatMicrosoft: "audio-16khz-32kbitrate-mono-mp3"
  FormatMimeSubtype: "mp3"
  Roles:
    #"Pierina": '<speak version="1.0" xml:lang="it-IT"><voice name="it-IT-PierinaNeural">{{.}}</voice></speak>'
    "jenny": '<speak version="1.0" xml:lang="zh-CN"><voice name="en-US-JennyMultilingualNeural">{{.}}</voice></speak>'

Docker compose

version: '3'
services:
  blivechat:
    image: cdfmlr/muvtuber-blivechat:v1.6.1-muvtb.4
    build: ./blivechat/
    ports:
      - "51060:12450"
    dns:
      - 1.1.1.1
      - 8.8.8.8
    restart: unless-stopped
  emotext:
    image: cdfmlr/muvtuber-emotext:v0.0.1
    build: ./emotext/
    ports:
      - "51061:9003"
    restart: unless-stopped
  chatgpt_chatbot:
    image: cdfmlr/muvtuber-chatgpt_chatbot:v0.0.2
    build: ./chatgpt_chatbot/
    ports:
      - "51052:50052"
    dns:
      - 1.1.1.1
      - 8.8.8.8
   # environment:
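      # Local proxy address: set this according to your own environment.
      # host.docker.internal is the hostname Docker Desktop provides by default for reaching the host,
      # but it does not always work; for example, my docker VM has to use the IP 192.168.5.2 to reach the host.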
     # - HTTP_PROXY=http://host.docker.internal:10809
     # - HTTPS_PROXY=http://host.docker.internal:10809
    restart: unless-stopped
  musharing_chatbot:
    image: cdfmlr/muvtuber-musharing_chatbot:v0.0.2
    build: ./musharing_chatbot/
    ports:
      - "51051:50051"
    restart: unless-stopped
  live2ddriver:
    image: cdfmlr/muvtuber-live2ddriver:v0.0.3
    build: ./live2ddriver/
    ports:
      - "51071:9001"
      - "51072:9002"
      - "51074:9004"
    environment:
      - EMOTEXT_SERVER=http://emotext:9003
    depends_on:
      - emotext
    restart: unless-stopped
  live2dview:
    image: cdfmlr/muvtuber-live2dview:v0.0.3
    build: ./live2dview/
    ports:
      - "51070:80"
    restart: unless-stopped
  externalsayer:
    image: cdfmlr/muvtuber-externalsayer:v0.0.2
    build: ./externalsayer/
    ports:
      - "51065:50010"
    volumes:
      - ./configs/externalsayer:/app/config
    restart: unless-stopped
  audioview:
    image: cdfmlr/muvtuber-audioview:v0.0.1
    build: ./audioview/
    ports:
      - "51082:80"
    restart: unless-stopped
  muvtuberdriver:
    image: cdfmlr/muvtuber-muvtuberdriver:v0.0.8
    build: ./muvtuberdriver/
    ports:
      - "51080:51080"
      - "51081:51081"
    volumes:
      - ./configs/muvtuberdriver:/app/config
    depends_on:
      - blivechat
      - live2ddriver
      - musharing_chatbot
      - chatgpt_chatbot
      - externalsayer
    restart: unless-stopped

cdfmlr commented 1 year ago

sayer:
    server: 0.0.0.0:51065

You cannot dial the IP 0.0.0.0. It is shorthand for listening on all network interfaces. Try changing it to localhost or 127.0.0.1.

If you are using docker compose, it should be left at the default, server: externalsayer:50010. There is no need to change it.
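
A misconfiguration like this could be caught early with a small check on client-side addresses. The sketch below is not part of the project; `checkDialAddr` is a hypothetical helper that rejects wildcard hosts such as 0.0.0.0. In the compose file above, 51065 is the host-side port of the mapping "51065:50010", so other containers reach the sayer through the service name and the container port, externalsayer:50010.

```go
package main

import (
	"fmt"
	"net"
)

// checkDialAddr returns an error if addr is unsuitable as a client-side
// dial target because its host is the unspecified address (0.0.0.0 or ::).
func checkDialAddr(addr string) error {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("invalid address %q: %w", addr, err)
	}
	if host == "" {
		return fmt.Errorf("address %q has no host", addr)
	}
	if ip := net.ParseIP(host); ip != nil && ip.IsUnspecified() {
		return fmt.Errorf("address %q is a listen-only wildcard; use a hostname or a concrete IP", addr)
	}
	return nil
}

func main() {
	addrs := []string{"0.0.0.0:51065", "externalsayer:50010", "127.0.0.1:51065"}
	for _, addr := range addrs {
		if err := checkDialAddr(addr); err != nil {
			fmt.Println("bad:", err)
		} else {
			fmt.Println("ok: ", addr)
		}
	}
}
```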

ilNikk commented 1 year ago

If you are using docker compose, it should be left at the default, server: externalsayer:50010. There is no need to change it.

OK, it finally works. I made a stupid mistake 🥲 Now I can focus on Twitch integration.

Thanks so much bro

cdfmlr commented 1 year ago

Review: #44, #47 & #49 are the same issue. It is finally fixed by 269b6b9a2c910fbf49978a6e8223ab48eb8eb3e7 (v0.3.6).