qryxip commented 10 months ago

Description

Moves everything that depends on ONNX Runtime and on the model signatures (?) into a separate module.

This is groundwork for the following overhaul:

https://github.com/VOICEVOX/voicevox_core/pull/553#issuecomment-1656366892

Related Issue

545

Other

Hiroshiba commented 10 months ago

This looks like it's going to get pretty huge...!! Looking forward to it!!

qryxip commented 10 months ago

I've mostly thought it through and put it together, so I'm taking this out of draft.

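The components of the infer module fall into three main groups, and the abstraction is built around them. (infer.rs is only 115 lines, but its contents are dense, so it may be worth splitting it into several files.)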

trait InferenceRuntime (e.g. Onnxruntime)
- trait Session
- trait RunBuilder
- trait InputScalar (e.g. i64, f32)
trait Signature
- trait Output (e.g. (Vec<f32>,))
- enum SignatureKind { PredictDuration, PredictIntonation, Decode }
  - struct PredictDuration: Signature<Kind = SignatureKind>
  - struct PredictIntonation: Signature<Kind = SignatureKind>
  - struct Decode: Signature<Kind = SignatureKind>
struct SessionSet<K (= SignatureKind), R: InferenceRuntime>
- struct SessionOptions
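To make the relationships in the list above concrete, here is a minimal sketch under simplifying assumptions: InferenceRuntime only creates sessions, Signature carries its Kind and Output as associated items, and SessionSet is a plain HashMap rather than the enum-map based type used in the PR. MockRuntime, new_session and session_for are hypothetical names for this illustration, not the crate's API.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// The three model kinds (mirrors `enum SignatureKind` in the list above).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum SignatureKind {
    PredictDuration,
    PredictIntonation,
    Decode,
}

/// A backend such as ONNX Runtime; only session creation is modelled here.
pub trait InferenceRuntime {
    type Session;
    fn new_session(model: &[u8]) -> Self::Session;
}

/// A model signature: which kind of model it targets and what it returns.
pub trait Signature {
    type Kind;
    type Output;
    const KIND: Self::Kind;
}

/// Input of the duration model (field names are illustrative).
pub struct PredictDuration {
    pub phoneme_list: Vec<i64>,
    pub speaker_id: i64,
}

impl Signature for PredictDuration {
    type Kind = SignatureKind;
    type Output = (Vec<f32>,);
    const KIND: SignatureKind = SignatureKind::PredictDuration;
}

/// One session per kind `K`, all created by the same runtime `R`.
pub struct SessionSet<K, R: InferenceRuntime> {
    sessions: HashMap<K, R::Session>,
}

impl<K: Eq + Hash, R: InferenceRuntime> SessionSet<K, R> {
    pub fn new(models: Vec<(K, &[u8])>) -> Self {
        let sessions = models
            .into_iter()
            .map(|(kind, model)| (kind, R::new_session(model)))
            .collect();
        SessionSet { sessions }
    }

    /// Looks up the session that serves signature `S`, via its associated `KIND`.
    pub fn session_for<S: Signature<Kind = K>>(&self) -> Option<&R::Session> {
        self.sessions.get(&S::KIND)
    }
}

/// Stand-in runtime (hypothetical): a "session" just records the model's byte length.
pub struct MockRuntime;

impl InferenceRuntime for MockRuntime {
    type Session = usize;
    fn new_session(model: &[u8]) -> usize {
        model.len()
    }
}
```

Tying the lookup to the signature type means callers ask for "the session for PredictDuration" rather than passing a loose key around.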

Some additional notes:

Everything that touches onnxruntime-rs is now isolated in crate::infer::runtimes::onnxruntime.
The model_file module (the decryption part), SessionOptions, SessionSet, InferenceModels, etc. have been moved into the infer module.
- status::SessionSet and voice_model::InferenceModels are represented with the enum-map crate, in the form EnumMap<SignatureKind, …>.
The <R: InferenceRuntime> type parameter stops at around SynthesisEngine. SignatureKind (= PredictDuration | …) also appears directly in the status module.
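EnumMap<SignatureKind, …> guarantees that every model kind always has exactly one entry. The sketch below imitates that property with a hand-rolled array wrapper instead of the enum-map crate; KindMap and its from_fn constructor are hypothetical stand-ins, not enum-map's actual API.

```rust
use std::ops::Index;

/// The three model kinds; `ALL` fixes the order used as array indices.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SignatureKind {
    PredictDuration,
    PredictIntonation,
    Decode,
}

impl SignatureKind {
    pub const ALL: [SignatureKind; 3] = [
        SignatureKind::PredictDuration,
        SignatureKind::PredictIntonation,
        SignatureKind::Decode,
    ];
}

/// Hand-rolled stand-in for `EnumMap<SignatureKind, V>`: exactly one `V` per variant.
pub struct KindMap<V>([V; 3]);

impl<V> KindMap<V> {
    /// Builds the map by calling `f` once for every kind, in declaration order.
    pub fn from_fn(mut f: impl FnMut(SignatureKind) -> V) -> Self {
        KindMap(SignatureKind::ALL.map(|kind| f(kind)))
    }
}

impl<V> Index<SignatureKind> for KindMap<V> {
    type Output = V;

    fn index(&self, kind: SignatureKind) -> &V {
        // Fieldless enum discriminants are 0, 1, 2 in declaration order,
        // matching the order of `SignatureKind::ALL`.
        &self.0[kind as usize]
    }
}
```

Unlike a HashMap, indexing cannot miss a kind, so code that fetches, say, the decode session needs no per-model "not loaded" branch.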
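As for robustness against future changes, my thinking is as follows:
- When introducing a runtime other than ONNX Runtime (e.g. TFLite), add it as a new runtime under crate::infer::runtimes.
  - Introduce a feature so that it can be swapped with ONNX Runtime at compile time.
  - That said, changes to the public API, including the VVM format, are probably unavoidable either way.
- When changing the signatures or the number of models, modify crate::infer::signature.
  - To support outputs other than ([f32],), extend the impl Output<Onnxruntime> for (Vec<f32>,) in infer/runtimes/onnxruntime.rs.
  - If a need arises later to load the predict models and the decode model separately, split enum SignatureKind into two and manage an EnumMap for each.
  - If we want to handle a 3-model set and a 4-model set at the same time, likewise define multiple SignatureKind-style enums and abstract over them.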
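The Output extension point described above could look roughly like the following. This is a sketch under assumptions: RawTensor stands in for whatever ONNX Runtime actually returns, Onnxruntime is only a marker type here, and the (Vec<f32>, Vec<i64>) impl is a hypothetical future output shape.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the raw tensors a runtime returns from a run.
#[derive(Debug)]
pub enum RawTensor {
    F32(Vec<f32>),
    I64(Vec<i64>),
}

/// Marker type only; no actual ONNX Runtime binding is involved.
pub struct Onnxruntime;

/// Converts a runtime's raw outputs into a typed tuple.
pub trait Output<R>: Sized {
    fn extract(raw: Vec<RawTensor>) -> Result<Self, String>;
}

/// The single-f32-tensor output shape the current models use.
impl Output<Onnxruntime> for (Vec<f32>,) {
    fn extract(raw: Vec<RawTensor>) -> Result<Self, String> {
        match <[RawTensor; 1]>::try_from(raw) {
            Ok([RawTensor::F32(a)]) => Ok((a,)),
            _ => Err("expected exactly one f32 tensor".to_owned()),
        }
    }
}

/// Extension: a hypothetical model with two outputs of different element types.
impl Output<Onnxruntime> for (Vec<f32>, Vec<i64>) {
    fn extract(raw: Vec<RawTensor>) -> Result<Self, String> {
        match <[RawTensor; 2]>::try_from(raw) {
            Ok([RawTensor::F32(a), RawTensor::I64(b)]) => Ok((a, b)),
            _ => Err("expected one f32 tensor followed by one i64 tensor".to_owned()),
        }
    }
}
```

Supporting a new output shape is then one more impl, and a mismatch between what the model returns and what the caller expects surfaces as an error at extraction time.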

qryxip commented 10 months ago

Main changes:

Introduced the following three traits, which extend trait InferenceRuntime, and moved responsibilities onto them:
- trait SupportsInferenceInputTensor<I> (example of I: Array1<i64>)
- trait SupportsInferenceInputSignature<I> (example of I: PredictDurationInput)
- trait SupportsInferenceOutput<O> (example of O: (Vec<f32>,))
Split trait Signature into InferenceSignature and InferenceInputSignature. trait InputScalar and trait Output have been absorbed into the InferenceRuntime subtraits above and no longer exist. (https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1384224966)
Since RunBuilder (basically) no longer has functions of its own, it has been renamed RunContext. (https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1384216221)
Added an Inference prefix to these names, except for RunContext. (https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1384216221)
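A minimal sketch of how the three Supports* subtraits might divide the work, assuming a toy runtime whose RunContext just collects inputs and whose "inference" concatenates them. ToyRuntime, ToyContext, the fields of PredictDurationInput and predict_duration are illustrative assumptions; the real traits in the PR deal with actual sessions, tensor shapes and errors.

```rust
/// Base trait: a runtime plus the per-call context that collects inputs.
pub trait InferenceRuntime {
    type RunContext;
}

/// `Self` can take a tensor of type `I` as one input.
pub trait SupportsInferenceInputTensor<I>: InferenceRuntime {
    fn input(ctx: &mut Self::RunContext, tensor: I);
}

/// `Self` can turn a whole input signature `I` into a ready-to-run context.
pub trait SupportsInferenceInputSignature<I>: InferenceRuntime {
    fn make_run_context(input: I) -> Self::RunContext;
}

/// `Self` can run a context and produce output of type `O`.
pub trait SupportsInferenceOutput<O>: InferenceRuntime {
    fn run(ctx: Self::RunContext) -> O;
}

/// Hypothetical input struct for the duration model.
pub struct PredictDurationInput {
    pub phoneme_list: Vec<i64>,
    pub speaker_id: Vec<i64>,
}

/// Toy runtime: the context stores inputs as f32 and "inference" concatenates them.
pub struct ToyRuntime;

#[derive(Default)]
pub struct ToyContext {
    inputs: Vec<Vec<f32>>,
}

impl InferenceRuntime for ToyRuntime {
    type RunContext = ToyContext;
}

impl SupportsInferenceInputTensor<Vec<i64>> for ToyRuntime {
    fn input(ctx: &mut ToyContext, tensor: Vec<i64>) {
        ctx.inputs.push(tensor.into_iter().map(|x| x as f32).collect());
    }
}

impl SupportsInferenceInputSignature<PredictDurationInput> for ToyRuntime {
    fn make_run_context(input: PredictDurationInput) -> ToyContext {
        let mut ctx = ToyContext::default();
        // The signature-level impl is built from the tensor-level one.
        <Self as SupportsInferenceInputTensor<Vec<i64>>>::input(&mut ctx, input.phoneme_list);
        <Self as SupportsInferenceInputTensor<Vec<i64>>>::input(&mut ctx, input.speaker_id);
        ctx
    }
}

impl SupportsInferenceOutput<(Vec<f32>,)> for ToyRuntime {
    fn run(ctx: ToyContext) -> (Vec<f32>,) {
        (ctx.inputs.concat(),)
    }
}

/// Generic caller: it names only the capabilities it needs from `R`.
pub fn predict_duration<R>(input: PredictDurationInput) -> Vec<f32>
where
    R: SupportsInferenceInputSignature<PredictDurationInput> + SupportsInferenceOutput<(Vec<f32>,)>,
{
    let ctx = R::make_run_context(input);
    let (output,) = R::run(ctx);
    output
}
```

Because the generic caller states only "accepts this input signature" and "produces this output type", a runtime lacking either capability is rejected at compile time rather than at run time.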

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/8b4f3b6f4f5be8a083eb22427c031ba53ae85cfd: The "kind" of a Signature is now called the "model kind". Since sessions are managed in a map keyed by it, calling it a "model" seemed closer to what it actually is. (Along with this, I added a concept called trait InferenceModelGroup.)
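One way to picture "model kind" together with InferenceModelGroup is sketched below, assuming a group is an enum that lists its kinds and each signature names its group plus its kind within it. TalkModelKind, ALL and model_index are hypothetical names chosen for this illustration.

```rust
use std::fmt::Debug;

/// A set of models that are loaded and managed together (hypothetical shape).
pub trait InferenceModelGroup: Copy + Eq + Debug + 'static {
    /// Every model kind in the group, in a fixed order.
    const ALL: &'static [Self];
}

/// The current three-model group; its variants are the "model kinds".
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum TalkModelKind {
    PredictDuration,
    PredictIntonation,
    Decode,
}

impl InferenceModelGroup for TalkModelKind {
    const ALL: &'static [Self] = &[
        TalkModelKind::PredictDuration,
        TalkModelKind::PredictIntonation,
        TalkModelKind::Decode,
    ];
}

/// A signature names its group and the model kind it belongs to.
pub trait InferenceSignature {
    type Group: InferenceModelGroup;
    const MODEL_KIND: Self::Group;
}

pub struct PredictDuration;
pub struct Decode;

impl InferenceSignature for PredictDuration {
    type Group = TalkModelKind;
    const MODEL_KIND: TalkModelKind = TalkModelKind::PredictDuration;
}

impl InferenceSignature for Decode {
    type Group = TalkModelKind;
    const MODEL_KIND: TalkModelKind = TalkModelKind::Decode;
}

/// Position of a signature's model within its group, e.g. to index a session table.
pub fn model_index<S: InferenceSignature>() -> usize {
    <S::Group as InferenceModelGroup>::ALL
        .iter()
        .position(|&kind| kind == S::MODEL_KIND)
        .expect("every model kind must be listed in ALL")
}
```

Under this shape, a separate 3-model or 4-model set would simply be another enum implementing InferenceModelGroup.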

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/1b1b7bfc32d7b7d21cc98cbf219046fc257d75bd: Moved the status module under the infer module. In addition, removed InferenceGroupImpl from Status, which is now used as Status<R: InferenceRuntime, G: InferenceGroup>. I think I once said the four-layer structure of Synthesizer could be compressed to two layers, but now that it turns out Status can play this role, I think three layers (Synthesizer → InferenceCore<R> → Status<R, G>) are enough for now.

(Addendum) If the decode workaround is folded into signatures.rs, it looks like InferenceCore can be removed too, leaving two layers.
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/c39f48cb5c560b6f4dde212767ba78beba941606: Removed trait RunContext (InferenceRuntime::RunContext stays as is).
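The three layers Synthesizer → InferenceCore<R> → Status<R, G> might be wired together as in the following sketch. It assumes Status keys sessions by (voice model id, model kind), uses a hypothetical DummyRuntime in place of ONNX Runtime, and leaves out actual inference and the I/O workaround; all method names are illustrative.

```rust
use std::collections::HashMap;
use std::hash::Hash;

trait InferenceRuntime {
    type Session;
    fn new_session(model: &[u8]) -> Self::Session;
}

trait InferenceGroup: Copy + Eq + Hash {}

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum ModelKind {
    PredictDuration,
    PredictIntonation,
    Decode,
}

impl InferenceGroup for ModelKind {}

/// Layer 3: owns the loaded sessions, keyed by (voice model id, model kind).
struct Status<R: InferenceRuntime, G: InferenceGroup> {
    sessions: HashMap<(String, G), R::Session>,
}

impl<R: InferenceRuntime, G: InferenceGroup> Status<R, G> {
    fn new() -> Self {
        Status { sessions: HashMap::new() }
    }

    fn load(&mut self, model_id: &str, kind: G, model: &[u8]) {
        self.sessions.insert((model_id.to_owned(), kind), R::new_session(model));
    }

    fn is_loaded(&self, model_id: &str, kind: G) -> bool {
        self.sessions.contains_key(&(model_id.to_owned(), kind))
    }
}

/// Layer 2: still generic over the runtime (the I/O workaround would live here).
struct InferenceCore<R: InferenceRuntime> {
    status: Status<R, ModelKind>,
}

/// Stand-in runtime: a "session" just records the model's byte length.
struct DummyRuntime;

impl InferenceRuntime for DummyRuntime {
    type Session = usize;
    fn new_session(model: &[u8]) -> usize {
        model.len()
    }
}

/// Layer 1: the public type; the runtime type parameter stops below it.
struct Synthesizer {
    core: InferenceCore<DummyRuntime>,
}

impl Synthesizer {
    fn new() -> Self {
        Synthesizer { core: InferenceCore { status: Status::new() } }
    }

    fn load_voice_model(&mut self, id: &str, models: [&[u8]; 3]) {
        let kinds = [ModelKind::PredictDuration, ModelKind::PredictIntonation, ModelKind::Decode];
        for (kind, model) in kinds.iter().copied().zip(models.iter().copied()) {
            self.core.status.load(id, kind, model);
        }
    }

    fn is_loaded_voice_model(&self, id: &str) -> bool {
        self.core.status.is_loaded(id, ModelKind::Decode)
    }
}
```

Synthesizer itself has no type parameter; the runtime is fixed at the InferenceCore field, which matches the idea of stopping <R: InferenceRuntime> below the public type.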

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/c316209419cc81f96d16739ac10aaba1c6a907d1: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1390230606

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/2274a34d1382a4c528228c5caa63c2f5cdad7b35: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1390316946

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/96a93e9f16c8b0b9e3ea01083bfe42790993dcd4: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1392739314
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/59d87797ca64e39375c48cb4413a6be3f8d1cbfa: Refactored voicevox_core_macros (split it into modules). This shifts the indentation of more than 200 lines of code, so I figured it was better to do it now...
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/d0dc56ff96cec6652540016343aa9c47ffc1c819: Merged InferenceDomain::{INPUT,OUTPUT}_PARAM_INFOS into a single EnumMap of tuples.
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/c654cd1f2d10acbc3f68aa4e6694525d4566beb2: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1392743606
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/868d3f61e76cf04147674c86d60142c7fec0ca6b: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1392744183
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/099879375df892544a95825832197b51a3863ba0: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1392746394
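Merging {INPUT,OUTPUT}_PARAM_INFOS into a single map of tuples keeps each model's inputs and outputs side by side. Below is a dependency-free sketch that uses a plain array indexed by ModelKind in place of EnumMap; the parameter names are illustrative and shortened, not the models' real parameter lists, and param_infos is a hypothetical helper.

```rust
/// Element type of a model parameter.
#[derive(Debug, PartialEq)]
pub enum DType {
    Int64,
    Float32,
}

/// Name and element type of one parameter.
#[derive(Debug, PartialEq)]
pub struct ParamInfo {
    pub name: &'static str,
    pub dt: DType,
}

#[derive(Clone, Copy)]
pub enum ModelKind {
    PredictDuration,
    PredictIntonation,
    Decode,
}

/// (inputs, outputs) for each model kind, indexed by `ModelKind as usize`.
/// Placeholder parameter lists, shortened for the illustration.
pub const PARAM_INFOS: [(&[ParamInfo], &[ParamInfo]); 3] = [
    // PredictDuration
    (
        &[
            ParamInfo { name: "phoneme_list", dt: DType::Int64 },
            ParamInfo { name: "speaker_id", dt: DType::Int64 },
        ],
        &[ParamInfo { name: "phoneme_length", dt: DType::Float32 }],
    ),
    // PredictIntonation
    (
        &[
            ParamInfo { name: "vowel_phoneme_list", dt: DType::Int64 },
            ParamInfo { name: "speaker_id", dt: DType::Int64 },
        ],
        &[ParamInfo { name: "f0_list", dt: DType::Float32 }],
    ),
    // Decode
    (
        &[
            ParamInfo { name: "f0", dt: DType::Float32 },
            ParamInfo { name: "phoneme", dt: DType::Float32 },
            ParamInfo { name: "speaker_id", dt: DType::Int64 },
        ],
        &[ParamInfo { name: "wave", dt: DType::Float32 }],
    ),
];

/// Returns the (inputs, outputs) pair for one model kind.
pub fn param_infos(kind: ModelKind) -> (&'static [ParamInfo], &'static [ParamInfo]) {
    PARAM_INFOS[kind as usize]
}
```

With one table, adding or removing a model touches a single entry, so its input and output descriptions cannot drift apart the way two parallel tables could.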

qryxip commented 10 months ago

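In that case, I think it would be best to merge this for now and then work on reducing the layering. New issues may well come up as we implement it.

How about stopping structural changes in this pull request, limiting it to renames and added documentation, and moving on...? (Each review pass has been taking three hours....)

Yes. I think that's a good idea.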

As for the layering: if the "I/O workaround" currently done in InferenceCore can be moved into domain.rs (formerly signatures.rs), I think it could look something like this:

synthesizer.rs --+--> infer/status.rs -----> infer/runtimes/
                 +--> infer/domain.rs
                 |
                 +--> open_jtalk.rs
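The two-layer layout in the diagram, which also keeps runtime sessions out of Synthesizer's reach, could be sketched with Rust module visibility as follows. Assumptions: the decode workaround is modeled as a hypothetical pad-then-trim step with a made-up PADDING value, the session performs identity "inference", and pub(super) is just one possible way to confine Session to infer.

```rust
mod infer {
    pub mod runtimes {
        /// Stand-in for a runtime session; `pub(super)` keeps it inside `infer`.
        pub(super) struct Session;

        impl Session {
            /// Identity "inference", just for the sketch.
            pub(super) fn run(&self, input: &[f32]) -> Vec<f32> {
                input.to_vec()
            }
        }
    }

    pub mod domain {
        /// Hypothetical workaround: pad the input before decoding, trim the output after.
        pub const PADDING: usize = 2;

        pub fn pad(input: &[f32]) -> Vec<f32> {
            let mut padded = vec![0.0; PADDING];
            padded.extend_from_slice(input);
            padded.extend(std::iter::repeat(0.0).take(PADDING));
            padded
        }

        pub fn trim(output: Vec<f32>) -> Vec<f32> {
            // Assumes `output` is at least 2 * PADDING long, as produced via `pad`.
            output[PADDING..output.len() - PADDING].to_vec()
        }
    }

    pub mod status {
        use super::{domain, runtimes::Session};

        /// Owns the sessions; the only layer that talks to `runtimes`.
        pub struct Status {
            session: Session,
        }

        impl Status {
            pub fn new() -> Self {
                Status { session: Session }
            }

            pub fn decode(&self, input: &[f32]) -> Vec<f32> {
                domain::trim(self.session.run(&domain::pad(input)))
            }
        }
    }
}

/// Top layer: can use `Status` and `domain`, but cannot name `infer::runtimes::Session`.
pub struct Synthesizer {
    status: infer::status::Status,
}

impl Synthesizer {
    pub fn new() -> Self {
        Synthesizer { status: infer::status::Status::new() }
    }

    pub fn decode(&self, input: &[f32]) -> Vec<f32> {
        self.status.decode(input)
    }
}
```

Writing infer::runtimes::Session inside Synthesizer would fail to compile, so the "Synthesizer should not see sessions" rule is enforced by the compiler rather than by convention.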

Hiroshiba commented 10 months ago

I think it could look something like this:

I see, that looks good!!

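Just a gut feeling, but it looks like status will end up taking on a lot of responsibilities. The name "status" is vague, so it may tend to become a catch-all. It would be good to rename it at some point! I also think the architecture will be easier to design if we work out what Synthesizer should be able to see. For example, it shouldn't be able to see Runtime or Session?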

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/75fd7acaa6a4d03f67552ca3d80e0253da88b8fd: https://github.com/VOICEVOX/voicevox_core/pull/675/files/81b5804037ea21fe0b3fe0ae934fe1d2a2548c74#r1392976600

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/ad222c98cd1e47a4dbb920d742bf2844bf3cd442: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1394423004

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/7005c96a74cb49789eaf07d7e2411163fca09bbc: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1394995277
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/94179921a7d34d324df61d3b047f3dfff410e66b: Merge branch 'main' into HEAD
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/1655719e178680e66b1ea6baa2fa3150f42949c5: Replaced the phrase "in voicevox_core" in the docs with "in the Rust API crate".
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/a73f22c3a4dc76ea93d3af6b9003d71fc25659ae, https://github.com/VOICEVOX/voicevox_core/pull/675/commits/f17919b694ac29cbea4a8e3caaad900d77feeebd: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1394995786

qryxip commented 10 months ago

https://github.com/VOICEVOX/voicevox_core/pull/675/commits/48bdb1b74470eca0fd2ce033459dedbb6700d9af: https://github.com/VOICEVOX/voicevox_core/pull/675#discussion_r1395041306
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/9d7d001b81482fd31a29304766c5c4274c82b46d: Removed a passage that duplicated an explanation given elsewhere
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/af828eb943b1b43542648ac8c7280d3001591c89: Removed unnecessary qualifications
https://github.com/VOICEVOX/voicevox_core/pull/675/commits/b6b7975278b396ad6bbc225ab98b6185830c3613: Moved ArrayExt, which is only used inside the macro, into the macro itself

Hiroshiba commented 10 months ago

Confirmed the fixes!! Merging!!!


Isolate ONNX Runtime and the model signatures #675
