qryxip commented 8 months ago

Description

In preparation for #737, this PR makes it so that a VVM no longer has to contain a talk model set.

manifest.json will change as follows.

-  "decode_filename": "decode.onnx",
-  "predict_duration_filename": "predict_duration.onnx",
-  "predict_intonation_filename": "predict_intonation.onnx",
+  "talk_model_filenames": {
+    "predict_duration": "predict_duration.onnx",
+    "predict_intonation": "predict_intonation.onnx",
+    "decode": "decode.onnx"
+  },

(edit) In addition, the name "talk" is introduced in various places throughout the implementation.

Related issue

#581 (?)

Other

qryxip commented 8 months ago

I changed the PR title and added a note to the first comment. As preparation for #737, the talk models are now given the name "talk" in the implementation.

qryxip commented 8 months ago

I introduced a concept called ~~InferenceDomainSet~~ InferenceDomainGroup, so that a single Status handles multiple InferenceDomains.

qryxip commented 8 months ago

In metas.json, I added only type: "talk".

manifest.json now looks like this.

-  "decode_filename": "decode.onnx",
-  "predict_duration_filename": "predict_duration.onnx",
-  "predict_intonation_filename": "predict_intonation.onnx",
-  "style_id_to_model_inner_id": {
-    "302": 2,
-    "303": 3
+  "talk": {
+    "predict_duration_filename": "predict_duration.onnx",
+    "predict_intonation_filename": "predict_intonation.onnx",
+    "decode_filename": "decode.onnx",
+    "style_id_to_model_inner_id": {
+      "302": 2,
+      "303": 3
+    }
   }

I added MissingModelData as an error. If a VVM contains a style with type: "talk" but has no talk ONNX files (together with the model_inner_ids), this error should be raised.
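The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `Manifest`, `TalkManifest`, `StyleMeta`, `LoadError`, and `check_model_data` are hypothetical names and shapes chosen for this example, and only the `MissingModelData` name and the JSON keys come from this thread.

```rust
use std::collections::BTreeMap;

/// Talk section of the manifest, mirroring the keys under "talk" in the diff above.
struct TalkManifest {
    predict_duration_filename: String,
    predict_intonation_filename: String,
    decode_filename: String,
    style_id_to_model_inner_id: BTreeMap<u32, u32>,
}

/// Hypothetical in-memory manifest: the whole talk section may be absent.
struct Manifest {
    talk: Option<TalkManifest>,
}

/// Hypothetical, trimmed-down style entry from metas.json.
struct StyleMeta {
    id: u32,
    r#type: String,
}

#[derive(Debug, PartialEq)]
enum LoadError {
    /// A style declares a type whose model data is not present in the VVM.
    MissingModelData { style_id: u32, style_type: String },
}

/// Returns an error for the first "talk" style found when the manifest
/// has no talk section (assumed behavior, for illustration only).
fn check_model_data(manifest: &Manifest, styles: &[StyleMeta]) -> Result<(), LoadError> {
    for style in styles {
        if style.r#type == "talk" && manifest.talk.is_none() {
            return Err(LoadError::MissingModelData {
                style_id: style.id,
                style_type: style.r#type.clone(),
            });
        }
    }
    Ok(())
}
```

With a manifest whose `talk` is `None` and a style of type "talk", `check_model_data` returns `Err(LoadError::MissingModelData { .. })`; once the talk section is present, it returns `Ok(())`.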

qryxip commented 8 months ago

I added "singing_teacher" | "frame_decode" | "sing" to type only, and changed the PR title accordingly. 5df5a07 (#761)
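As a rough illustration of the four type values listed above, the sketch below models them as a Rust enum with string conversion. The enum name `StyleType`, its variant names, and the hand-written `FromStr` implementation are assumptions made for this example; how the actual crate deserializes the field is not shown in this thread.

```rust
use std::str::FromStr;

/// Hypothetical enum for the `type` field of a style in metas.json.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StyleType {
    Talk,
    SingingTeacher,
    FrameDecode,
    Sing,
}

impl StyleType {
    /// The string form as it appears in metas.json.
    fn as_str(self) -> &'static str {
        match self {
            Self::Talk => "talk",
            Self::SingingTeacher => "singing_teacher",
            Self::FrameDecode => "frame_decode",
            Self::Sing => "sing",
        }
    }
}

impl FromStr for StyleType {
    type Err = String;

    /// Accepts exactly the four values named in the comment above.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "talk" => Ok(Self::Talk),
            "singing_teacher" => Ok(Self::SingingTeacher),
            "frame_decode" => Ok(Self::FrameDecode),
            "sing" => Ok(Self::Sing),
            other => Err(format!("unknown style type: {other:?}")),
        }
    }
}
```

For example, `"frame_decode".parse::<StyleType>()` yields `Ok(StyleType::FrameDecode)`, while an unlisted string yields an `Err`.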

qryxip commented 8 months ago

Synthesizer: 1:1 (InferenceDomainGroupImpl)
Status: 1:1 (polymorphic)
VoiceModel: 1:1 (InferenceDomainGroupImpl)
InferenceRuntime: unrelated

That is how they relate. Synthesizer → Status also remains 1:1.

Hiroshiba commented 8 months ago

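@qryxip I see!!

What I did not expect was that synthesizer : status is 1:1. Is it right to read this as something like creating a Synthesizer per InferenceDomainGroupImpl? 👀 (Oh, I am not asking for it to be that way, just confirming!)

If my understanding is off, I may not be picturing what "polymorphic" means here. I do more or less understand that there is no talk object or sing object under (inside) Status, but...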

qryxip commented 8 months ago

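> Is it right to read this as something like creating a Synthesizer per InferenceDomainGroupImpl? 👀

Yes, that is right. Or rather, InferenceDomainGroupImpl is intended to refer to all the models that VOICEVOX handles. That is because InferenceDomain::Map can even express InferenceDomain².

Illustrated, it looks like this. (Everything from InferenceDomain downward remains as in the current design. This PR uses InferenceDomainGroup to bundle multiple InferenceDomains into one.)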

InferenceDomainGroupImpl: InferenceDomainGroup
├── TalkDomain: InferenceDomain
│   └── TalkOperation: InferenceOperation
│       ├── TalkOperation::PredictDuration
│       │   ├── PredictDurationInput: InferenceInputSignature
│       │   └── PredictDurationOutput: InferenceOutputSignature
│       ├── TalkOperation::PredictIntonation
│       │   ├── PredictIntonationInput: InferenceInputSignature
│       │   └── PredictIntonationOutput: InferenceOutputSignature
│       └── TalkOperation::Decode
│           ├── DecodeInput: InferenceInputSignature
│           └── DecodeOutput: InferenceOutputSignature
├── SingingTeacherDomain: InferenceDomain
│   └── SingingTeacherOperation: InferenceOperation
│       ├── SingingTeacherOperation::PredictSingConsonantLength
│       │   ├── PredictSingConsonantLengthInput: InferenceInputSignature
│       │   └── PredictSingConsonantLengthOutput: InferenceOutputSignature
│       ├── SingingTeacherOperation::PredictSingF0
│       │   ├── PredictSingF0Input: InferenceInputSignature
│       │   └── PredictSingF0Output: InferenceOutputSignature
│       └── SingingTeacherOperation::PredictSingVolume
│           ├── PredictSingVolumeInput: InferenceInputSignature
│           └── PredictSingVolumeOutput: InferenceOutputSignature
└── FrameDecodeDomain: InferenceDomain
    └── FrameDecodeOperation: InferenceOperation
        └── FrameDecodeOperation::SfDecode
            ├── SfDecodeInput: InferenceInputSignature
            └── SfDecodeOutput: InferenceOutputSignature

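> If my understanding is off, I may not be picturing what "polymorphic" means here. I do more or less understand that there is no talk object or sing object under (inside) Status, but...

Status takes G: InferenceDomainGroup as a type parameter and never looks directly at the talk | singing_teacher | frame_decode distinction. For that purpose there are map.any(p), map.try_ref_map(f), and D::visit(map) (where map: InferenceDomainMap and D: InferenceDomain).

(I actually wanted to be able to write map.get::<D>() instead of D::visit(map), but I concluded that this is impossible unless https://github.com/rust-lang/rust/issues/20041 lands.)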
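To make the D::visit(map) idea more concrete, here is a small sketch of one way such a per-domain map could be written with generic associated types. It is an illustration under assumptions, not the PR's code: `Projection`, `Session`, `OptionalSessions`, and `is_loaded` are hypothetical names, the signatures are guesses, and map.any(p) and map.try_ref_map(f) are not modeled at all.

```rust
use std::marker::PhantomData;

// Marker types for the three domains named in the tree above.
struct TalkDomain;
struct SingingTeacherDomain;
struct FrameDecodeDomain;

/// Hypothetical type-level function from a domain to the value stored for it.
trait Projection {
    type Target<D: InferenceDomain>;
}

/// One slot per domain; each slot's type depends on its domain through `P`.
struct InferenceDomainMap<P: Projection> {
    talk: P::Target<TalkDomain>,
    singing_teacher: P::Target<SingingTeacherDomain>,
    frame_decode: P::Target<FrameDecodeDomain>,
}

/// Each domain knows which slot of the map belongs to it.
trait InferenceDomain: Sized {
    fn visit<P: Projection>(map: &InferenceDomainMap<P>) -> &P::Target<Self>;
}

impl InferenceDomain for TalkDomain {
    fn visit<P: Projection>(map: &InferenceDomainMap<P>) -> &P::Target<Self> {
        &map.talk
    }
}

impl InferenceDomain for SingingTeacherDomain {
    fn visit<P: Projection>(map: &InferenceDomainMap<P>) -> &P::Target<Self> {
        &map.singing_teacher
    }
}

impl InferenceDomain for FrameDecodeDomain {
    fn visit<P: Projection>(map: &InferenceDomainMap<P>) -> &P::Target<Self> {
        &map.frame_decode
    }
}

/// Hypothetical per-domain payload, tagged with its domain at the type level.
struct Session<D: InferenceDomain> {
    model_file: &'static str,
    _domain: PhantomData<D>,
}

impl<D: InferenceDomain> Session<D> {
    fn new(model_file: &'static str) -> Self {
        Self { model_file, _domain: PhantomData }
    }
}

/// Projection under which every domain may or may not have a session.
struct OptionalSessions;

impl Projection for OptionalSessions {
    type Target<D: InferenceDomain> = Option<Session<D>>;
}

/// Generic code picks a slot by domain type and never names the fields itself.
fn is_loaded<D: InferenceDomain>(map: &InferenceDomainMap<OptionalSessions>) -> bool {
    D::visit(map).is_some()
}
```

In this sketch the caller writes `is_loaded::<TalkDomain>(&map)` and the domain type decides which field is read, which is the role D::visit(map) plays in the comment above.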

qryxip commented 8 months ago

22d53f9 (#761): Renamed "Association" to "MapValueProjection".
2a2f273 (#761): Where there was something like model_bytes_or_sessions: (Option<_>, Option<_>, Option<_>), it is now converted into existences: (bool, bool, bool).
2bb8b82 (#761): Minor changes.
5eeb975 (#761): Wrote the comments as docstrings.
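The conversion mentioned for 2a2f273 can be pictured with the sketch below. It only shows the general shape of turning a tuple of options into a tuple of flags and then matching on the flags; the function names, the `Presence` enum, and the all-or-nothing classification are assumptions for illustration, not what the commit necessarily does.

```rust
/// Hypothetical summary of which models of one domain are present.
#[derive(Debug, PartialEq)]
enum Presence {
    All,
    Absent,
    Partial,
}

/// Reduces a tuple of optional values to a tuple of "is it there?" flags.
fn existences<A, B, C>(models: &(Option<A>, Option<B>, Option<C>)) -> (bool, bool, bool) {
    (models.0.is_some(), models.1.is_some(), models.2.is_some())
}

/// Matching on plain booleans keeps the case analysis independent of the payload types.
fn classify(existences: (bool, bool, bool)) -> Presence {
    match existences {
        (true, true, true) => Presence::All,
        (false, false, false) => Presence::Absent,
        _ => Presence::Partial,
    }
}
```

One benefit of this shape is that the match arms do not need to mention or move the optional values themselves.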

qryxip commented 7 months ago

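Two weeks have passed, and I have started to feel that we can do without InferenceDomainMapValueProjection (formerly InferenceDomainAssociation) after all. The idea would be to pull status.rs out of infer/ and leave only SessionSet under infer/.

.any(…) and .try_ref_map(…) are hardly pretty to look at, there is probably not much point in using abstraction to hide the number of Domains, and status.rs also deals with metas rather than being purely "inference", so that direction has started to seem better to me.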

Hiroshiba commented 7 months ago

I cannot fully picture what will change and how, but I think there is no particular problem, at least for the near term!

qryxip commented 7 months ago

I dismantled InferenceDomainGroup and InferenceDomainMapValueProjection (formerly InferenceDomainAssociation). InferenceDomainMap remains, now as a concrete struct.

Something called InferenceDomainExt ended up being born inside status.rs, but I think that cannot be helped.

qryxip commented 6 months ago

I have addressed the review comments.

VOICEVOX / voicevox_core

Add `StyleMeta::r#type` and introduce the "talk" category into the implementation #761
