Supergiovane / node-red-contrib-tts-ultimate

This node transforms a text into a speech audio. You can hear the voice natively through Sonos or external players.
MIT License

feature request: file naming includes voiceId to differentiate cached files #86

Closed bs-eng closed 4 months ago

bs-eng commented 4 months ago

Hi Giovane!

I believe this is a feature request rather than a bug.

State of things now: After sending many texts to the tts-node, I noticed that the output file name seems to depend only on the text that was converted to speech. That means if I send a text with, e.g., a female voice and then a male voice, the file in the directory .node-red/sonospollyttsstorage/ttsfiles gets overwritten. Combined with the fact that the tts-node outputs the results as an array (when many inputs arrive in quick succession), there is no chance to grab the file before it gets overwritten with the other voice.

To Reproduce: Here is a sample flow to reproduce the behaviour. It takes 4 text lines and sends each of them to the tts-node with a female and a male voiceId. Thus 8 files are expected in .node-red/sonospollyttsstorage/ttsfiles. However, there are only 4 files, and on checking them they are all male (the second voice that was used). So the female voice files were overwritten by the male voice files.

[
    {
        "id": "7b5c7adf0a675b5a",
        "type": "ttsultimate",
        "z": "138c6cb20fe4c960",
        "name": "DK",
        "voice": "da-DK-Wavenet-A#da-DK#FEMALE",
        "ssml": true,
        "sonosipaddress": "192.168.1.109",
        "sonosvolume": "30",
        "sonoshailing": "0",
        "config": "557d8082.eb5a8",
        "property": "payload",
        "propertyType": {},
        "rules": [],
        "playertype": "noplayer",
        "speakingrate": "1",
        "speakingpitch": "0",
        "unmuteIfMuted": false,
        "elevenlabsStability": "",
        "elevenlabsSimilarity_boost": "",
        "x": 530,
        "y": 1040,
        "wires": [
            [
                "54c89a246777d32e"
            ],
            []
        ]
    },
    {
        "id": "2519d346690ada0a",
        "type": "inject",
        "z": "138c6cb20fe4c960",
        "name": "",
        "props": [
            {
                "p": "payload"
            }
        ],
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "topic": "",
        "payload": "[{\"AudioID\":1,\"FileID\":\"001\",\"lang\":\"DK\",\"filename\":\"DK001.mp3\",\"text\":\"Enhver har ret til undervisning. \"},{\"AudioID\":2,\"FileID\":\"002\",\"lang\":\"DK\",\"filename\":\"DK002.mp3\",\"text\":\"Undervisningen skal være gratis, i det mindste på de elementære og grundlæggende trin.\"},{\"AudioID\":3,\"FileID\":\"003\",\"lang\":\"DK\",\"filename\":\"DK003.mp3\",\"text\":\"Elementær undervisning skal være obligatorisk.\"},{\"AudioID\":4,\"FileID\":\"004\",\"lang\":\"DK\",\"filename\":\"DK004.mp3\",\"text\":\"Teknisk og faglig uddannelse skal gøres almindelig tilgængelig for alle, og på grundlag af evner skal der være lige adgang for alle til højere undervisning.\"}]",
        "payloadType": "json",
        "x": 230,
        "y": 1040,
        "wires": [
            [
                "742bfe851fdf3885"
            ]
        ]
    },
    {
        "id": "742bfe851fdf3885",
        "type": "function",
        "z": "138c6cb20fe4c960",
        "name": "save & split",
        "func": "// the function makes a clone of the msg.payload (which contains the audiofile array)\n// doubles all files for female and male version if available for that language\n// assembles the file name including gender and language tag\n// sends out the message to tts with text and voiceID\n\nlet ttsF = msg.payload;\nlet ttsStack = [];\nlet voices = flow.get(\"voiceIDs\");\n\n// for now we assume that all languages support female and male\nlet count = ttsF.length;\nlet i = 0;\n// female voice\nfor ( ; i < count; i++) {\n    const element = ttsF[i];\n    // assemble stack item\n    ttsStack.push({\n        \"audio\":element,\n        \"parts\":{\n            \"id\":   msg._msgid,\n            \"type\": \"array\",\n            \"count\": count,\n            \"len\":  1,\n            \"index\": i\n        },\n        \"filename\":\"F\" + element.lang + element.FileID + \".mp3\",\n    });\n    // send msg to tts\n    node.send({\"payload\":element.text,\"voiceId\":voices[element.lang].F});\n}\n\n// male voice\nfor ( ; i < count*2; i++) {\n    const element = ttsF[i-count];\n    ttsStack.push({\n        \"audio\":element,\n        \"parts\":{\n            \"id\":   msg._msgid,\n            \"type\": \"array\",\n            \"count\": count,\n            \"len\":  1,\n            \"index\": i\n        },\n        \"filename\":\"M\" + element.lang + element.FileID + \".mp3\"\n    });\n    // send msg to tts\n    node.send({\"payload\":element.text,\"voiceId\":voices[element.lang].M});\n}\n\n// save source stack to flow\nflow.set(\"ttsStack\", ttsStack);\n\nreturn null;",
        "outputs": 1,
        "timeout": 0,
        "noerr": 0,
        "initialize": "",
        "finalize": "",
        "libs": [],
        "x": 370,
        "y": 1040,
        "wires": [
            [
                "7b5c7adf0a675b5a"
            ]
        ]
    },
    {
        "id": "54c89a246777d32e",
        "type": "function",
        "z": "138c6cb20fe4c960",
        "name": "merge from context",
        "func": "// the routine assumes exact match between input array to tts-node and output array of tts-node\n\nif (msg.payload === true) {\n    // Get the array from the flow context memory\n    let ttsStack = flow.get(\"ttsStack\");\n    let count = ttsStack.length;\n    for (let i = 0; i < count; i++) {\n        let element = ttsStack[i];\n        let ttsresult = msg.filesArray[i];\n        \n        // set file dst info\n        element.newPath     = \"/home/nodered/audio/\" + element.audio.lang.substring(0, 2) + \"/\";\n        element.newFilename = element.filename;\n\n        // set file src info\n        // index at which to split path from filename\n        var index = ttsresult.file.lastIndexOf(\"/\")+1;\n        element.oldPath     = ttsresult.file.substring(0, index);\n        element.oldFilename = ttsresult.file.substring(index);\n\n        // send audio element\n        node.send(element);\n    }\n    // all elements are done, clear the ttsStack\n    //flow.set(\"ttsStack\", null);\n}\n\nreturn null;\n\n",
        "outputs": 1,
        "timeout": 0,
        "noerr": 0,
        "initialize": "",
        "finalize": "",
        "libs": [],
        "x": 710,
        "y": 1020,
        "wires": [
            [
                "98f424a0e1aabc8c"
            ]
        ]
    },
    {
        "id": "98f424a0e1aabc8c",
        "type": "fs-ops-copy",
        "z": "138c6cb20fe4c960",
        "name": "",
        "sourcePath": "oldPath",
        "sourcePathType": "msg",
        "sourceFilename": "oldFilename",
        "sourceFilenameType": "msg",
        "destPath": "newPath",
        "destPathType": "msg",
        "destFilename": "newFilename",
        "destFilenameType": "msg",
        "link": false,
        "overwrite": true,
        "x": 880,
        "y": 1020,
        "wires": [
            [
                "2738c0c743bf2d26"
            ]
        ]
    },
    {
        "id": "2738c0c743bf2d26",
        "type": "join",
        "z": "138c6cb20fe4c960",
        "name": "",
        "mode": "auto",
        "build": "object",
        "property": "payload",
        "propertyType": "msg",
        "key": "topic",
        "joiner": "\\n",
        "joinerType": "str",
        "accumulate": "false",
        "timeout": "",
        "count": "",
        "reduceRight": false,
        "x": 1030,
        "y": 1020,
        "wires": [
            []
        ]
    },
    {
        "id": "3c1cac23b6478b3c",
        "type": "function",
        "z": "138c6cb20fe4c960",
        "name": "set voice IDs",
        "func": "\nreturn null;",
        "outputs": 1,
        "timeout": 0,
        "noerr": 0,
        "initialize": "// Code added here will be run once\n// whenever the node is started.\nvar voiceIDs = { \"DK\": { \"F\": \"da-DK-Standard-A#da-DK#FEMALE\", \"M\": \"da-DK-Standard-C#da-DK#MALE\" } };\nflow.set(\"voiceIDs\", voiceIDs);",
        "finalize": "",
        "libs": [],
        "x": 370,
        "y": 980,
        "wires": [
            []
        ]
    },
    {
        "id": "557d8082.eb5a8",
        "type": "ttsultimate-config",
        "name": "google cloud TTS",
        "noderedipaddress": "127.0.0.1",
        "noderedport": "1980",
        "purgediratrestart": "leave",
        "ttsservice": "googletts",
        "TTSRootFolderPath": ""
    }
]

Expected behavior: If the voice string could somehow be included in the file name to differentiate the files by voice, that would be great. Given the speed at which Google is adding more voices for different use cases in the same language, I believe this feature would also be helpful to others.

TTS-Ultimate Version

Are you running node-red behind homematic, docker or anything similar? Node-RED is running standalone on a small Linux server.

Thanks a lot for looking into this! Cheers JR

Supergiovane commented 4 months ago

Hi, for the other TTS engines I've taken it into consideration. For the googletts engine, I don't remember what I did and why; I'll take a look ASAP.

Supergiovane commented 4 months ago

Hi. The node isn't meant to be used as a batch file generator. You're changing the voiceID too quickly. The voiceID is a node variable. If you change the voiceID while the queuing system is still reading the queue, you get weird behaviour. You must send a msg to the TTS node, wait until the node has written the file, then send the next msg. In your flow, you must send the first 4 messages, wait until the TTS node has finished handling them (wait for the output msg.payload TRUE), then send the next 4 messages with a different voiceID.
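
The sequencing described in this comment could be sketched roughly as follows. This is a minimal illustration in plain JavaScript, not code from the node: the `BatchGate` class and its `next`/`done` methods are hypothetical names, and `done()` stands in for whatever logic in the flow reacts to the TTS node's `msg.payload === true` output.

```javascript
// Hypothetical helper (not part of tts-ultimate): releases one batch of
// messages at a time and holds the rest until completion is reported.
class BatchGate {
    constructor(batches, send) {
        this.batches = batches.slice(); // remaining batches, in order
        this.send = send;               // callback that forwards one msg
        this.busy = false;              // true while a batch is in flight
    }

    // Send the next batch, unless one is still being processed.
    next() {
        if (this.busy || this.batches.length === 0) return false;
        this.busy = true;
        for (const msg of this.batches.shift()) this.send(msg);
        return true;
    }

    // Call when the TTS node reports completion (msg.payload === true).
    done() {
        this.busy = false;
        return this.next();
    }
}

const sent = [];
const texts = ["text 1", "text 2"];
const gate = new BatchGate(
    [
        texts.map((t) => ({ payload: t, voiceId: "da-DK-Standard-A#da-DK#FEMALE" })),
        texts.map((t) => ({ payload: t, voiceId: "da-DK-Standard-C#da-DK#MALE" })),
    ],
    (msg) => sent.push(msg)
);

gate.next(); // female batch goes out
gate.next(); // ignored: the first batch has not completed yet
gate.done(); // completion received, male batch goes out
console.log(sent.map((m) => m.voiceId));
```

In a real flow the same idea would live in a function node, with the pending batches kept in flow context and `node.send` used as the `send` callback.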

bs-eng commented 4 months ago

Hi! OK, I understand the batch part. I can change that in my flow. How about using the offline files later? If the voice changes in between, do the files always get overwritten? I am asking because that would mean I cannot rely on the offline files, as I would never know which voiceId created them.

Supergiovane commented 4 months ago

Hi, the files remain in the folder forever. The node's goal is to never ask the cloud again after the file has been created. The filenames are created from a combination of the text to be spoken and the voiceID, which is then hashed with MD5.
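
A naming scheme of this kind could look roughly like the sketch below. Only "text plus voiceID, hashed with MD5" comes from the comment above; the `cacheFileName` function name, the `|` separator, the field order and the `.mp3` extension are assumptions made for illustration and may differ from what the node actually does.

```javascript
// Sketch only: derive a cache file name from the voice and the text.
// Separator, field order and ".mp3" extension are assumptions, not taken
// from the tts-ultimate source.
const crypto = require("crypto");

function cacheFileName(text, voiceId) {
    const hash = crypto
        .createHash("md5")
        .update(`${voiceId}|${text}`, "utf8")
        .digest("hex"); // 32 hexadecimal characters
    return `${hash}.mp3`;
}

const text = "Enhver har ret til undervisning.";
const female = cacheFileName(text, "da-DK-Standard-A#da-DK#FEMALE");
const male = cacheFileName(text, "da-DK-Standard-C#da-DK#MALE");

// Same text, different voice: the two names differ, so neither file
// overwrites the other.
console.log(female !== male);
```

Because the hash is deterministic, a repeated request with the same text and voice maps to the same file name, which is what lets a cache skip the cloud call.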

bs-eng commented 4 months ago

Hi Giovane! Thanks a lot for your quick replies! This morning I ran another test and can confirm that the voice is encoded into the file name. The condition is that the node has time to finish handling the audio files before it receives a new voice setting. Cheers JR