Shiroizu opened 1 year ago
We do actually have one more "(Unknown)" voicebank style, and that's the "[engine] (Unknown)" type; for example, see "Pocket Singer (Unknown)". Even if we switched to character entries, we'd have to keep these engine banks around, especially for make-it-yourself engines such as UTAU or Pocket Singer.
Anyways, it looks like the first option would carry the most information. If I'm reading this right, we'd basically want Unknown entries for any multi-bank artist (presumably still ignoring UTAU split banks due to complexity...?). Would we want Unknown banks for ones where all specific bank usages are verified...? And with Unknown banks, how much info should we give them? (I personally think base Unknown entries should receive the original release date and at least the original avatar, as otherwise the entries look rather unfinished.) Still, this sounds like the most accurate option, and thus I'm down for it. We'll just have a few kinks to iron out is all. ;3
@CatgirlFrostmoon - Good to know about these. I added them and heavily refactored and restructured the issue for readability.
> If I'm reading this right, we'd basically want Unknown entries for any multi-bank artist (presumably still ignoring UTAU split banks due to complexity...?).
Yes. These will need at least the "Root unknown" entry (UE or UVB).
UTAU append entries were discouraged in the past for reasons I can't recall. I don't think we should treat any engine differently when it comes to guidelines.
> Would we want Unknown banks for ones where all specific bank usages are verified...?
Potentially inaccurate credits are best avoided by creating unknown entries before they are actually needed.
> And with Unknown banks, how much info should we give them?
Ideally, data duplication should be kept to a minimum.
The "root unknown" entries serve as the new main character pages. They should include all the character-related information, and the derived voicebanks should inherit the relevant information where needed.
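A minimal sketch of the inheritance idea, assuming a hypothetical `Entry` record with a `parent` link and a dict of fields it sets itself; these names are illustrative and not VocaDB's actual data model. A derived voicebank returns its own value when it has one and otherwise walks up to the root unknown entry, so character data only needs to be stored once.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Entry:
    """Hypothetical voicebank entry; not VocaDB's real schema."""
    name: str
    parent: Optional[Entry] = None
    # Only the fields this entry sets itself; everything else is inherited.
    own_fields: dict[str, Any] = field(default_factory=dict)

    def get(self, key: str) -> Any:
        """Return this entry's value for key, falling back to its ancestors."""
        node: Optional[Entry] = self
        while node is not None:
            if key in node.own_fields:
                return node.own_fields[key]
            node = node.parent
        return None  # no entry in the chain defines this field


# Character data lives once, on the root "unknown" entry.
root = Entry("Hatsune Miku (Unknown)", own_fields={
    "illustrator": "KEI",
    "description": "Character-related information goes here.",
})
# The derived voicebank stores only what differs from its parent.
v2 = Entry("Hatsune Miku V2", parent=root, own_fields={"engine": "VOCALOID2"})

print(v2.get("illustrator"))  # KEI (inherited from the root entry)
print(v2.get("engine"))       # VOCALOID2 (set on the derived entry)
```

Editing the description on the root entry would then be reflected on every derived voicebank without copying it.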
The following graph shows how Hatsune Miku's entries would be handled. Keep in mind that Miku requires far more new entries than other characters.
[Graph: Current vs. New (separated by engines)]
It might make the most sense to rename the current root voicebanks by adding "(Unknown)" to their names and then create the specific voicebank entries. In Miku's case, the current main page (Ar/1) would be renamed to "Hatsune Miku (Unknown)" and a new entry for "Hatsune Miku V2" would be created. The songs that definitely use the V2 voicebank can be detected and fixed automatically based on their publish date.
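The publish-date check could look roughly like the sketch below. It assumes the rule is "a song published before any other voicebank in the tree was released can only have used V2"; the `Song` type, the function name, the song titles, and the cutoff date are all hypothetical placeholders, and real migration code would read the actual release dates from the database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Song:
    title: str
    published: date
    vocalist: str  # name of the credited voicebank entry


def reassign_early_songs(songs, unknown_name, specific_name, cutoff):
    """Move credits that can only be `specific_name` off the renamed root entry.

    Assumption: no other voicebank in the tree existed before `cutoff`, so a
    song published earlier than that must use `specific_name`.
    Returns the titles of the songs that were re-credited.
    """
    moved = []
    for song in songs:
        if song.vocalist == unknown_name and song.published < cutoff:
            song.vocalist = specific_name
            moved.append(song.title)
    return moved


# Placeholder cutoff: the real value would be the earliest release date of
# any non-V2 voicebank in Miku's tree.
CUTOFF = date(2010, 1, 1)
songs = [
    Song("early song", date(2008, 3, 1), "Hatsune Miku (Unknown)"),
    Song("later song", date(2015, 6, 1), "Hatsune Miku (Unknown)"),
    Song("already specific", date(2009, 1, 1), "Hatsune Miku V2"),
]
print(reassign_early_songs(songs, "Hatsune Miku (Unknown)", "Hatsune Miku V2", CUTOFF))
# ['early song']
```

Songs published after the cutoff stay on the "(Unknown)" entry, since the date alone cannot rule out the later voicebanks.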
I remember thinking about this before, although it was more about managing all those variations rather than accuracy. This post might be useful: https://github.com/VocaDB/community/issues/54
> UTAU append entries were discouraged in the past for reasons I can't recall. I don't think we should treat any engine differently when it comes to guidelines.
The reasoning back then (which might be outdated by now):
UTAU voicebanks tend to have more appends/variations than Vocaloid banks, and the differences are often relatively minor. Creating artist entries for all those appends would be too complex to manage.
With the explosion of appends/variations for Vocaloid voicebanks, this is no longer limited to UTAU. Hierarchical management of Vocaloid voicebanks would be useful as well.
Personally, I think the best option would be code-level, granular support for variations/appends. I don't have a concrete solution, because I never figured out a good one, but one possibility would be "sub-entries" that only appear under the parent entry. The hierarchy could be as you suggested. It would of course require development work, but given the growing number of variations, I think some way to help with managing and grouping them is a highly important feature.
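One way to picture the "sub-entries" idea is a parent-to-children mapping in which only top-level entries are listed directly and their variations are collected on demand. This is a hypothetical sketch, not an existing VocaDB feature: the `CHILDREN` mapping, the helper names, and the "Hatsune Miku V3 (Unknown)" grouping entry are illustrative assumptions.

```python
from __future__ import annotations

from collections import deque

# Hypothetical parent -> children mapping for one voicebank tree.
CHILDREN = {
    "Hatsune Miku (Unknown)": ["Hatsune Miku V2", "Hatsune Miku V3 (Unknown)"],
    "Hatsune Miku V3 (Unknown)": ["Hatsune Miku V3 (Sweet)", "Hatsune Miku V3 (Dark)"],
}


def derived_voicebanks(root: str, children: dict[str, list[str]]) -> list[str]:
    """Return every entry below `root`, breadth-first, excluding `root` itself."""
    result = []
    queue = deque(children.get(root, []))
    while queue:
        entry = queue.popleft()
        result.append(entry)
        queue.extend(children.get(entry, []))
    return result


def top_level(children: dict[str, list[str]]) -> list[str]:
    """Parent entries that are not themselves a child of another entry."""
    all_children = {c for kids in children.values() for c in kids}
    return [parent for parent in children if parent not in all_children]


print(top_level(CHILDREN))  # ['Hatsune Miku (Unknown)']
print(derived_voicebanks("Hatsune Miku (Unknown)", CHILDREN))
```

A search page could show only `top_level` results and expand a tree with `derived_voicebanks` when needed, which keeps the variations out of the way without deleting them.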
Updated the issue with "Required & extra features for the new system" and a "Concrete migration plan".
For more context, see the current guidelines for choosing the correct vocalist credit: https://wiki.vocadb.net/wiki/79
Because the base voicebank is the default credit choice when information is incomplete, many song entries have potentially inaccurate vocalist credits: a song marked with "Hatsune Miku" (V2) might actually use "Hatsune Miku V3 (Sweet)". It is difficult to tell whether a song is actually sung by Miku V2 or simply has incomplete credit information.
Issues with relying on a limited set of "unknown" entries (current approach):
A) Visual clutter, especially with album credits
B) Incorrectly included when counting voicebanks
C) Unnecessary data duplication
D) Skews the song graphs/stats
E) Unclear voicebank hierarchy
F) No programmatic way of mapping credit text to an artist entry
G) [main issue] Potentially inaccurate credits (credited to Ar/1 but actually Miku V3 (Sweet))
There are three possible approaches for addressing these issues and making the vocalist credit process more consistent:
Approach 1) Create unknown-entries for every possible case of ambiguity [best choice (* even better with new features)]
* Currently a tag, will be converted to an artist entry.
There are at least five types of "Unknown" voicebanks, depending on which information is unspecified:
Required features for the new system: (updated 2024/06/30)
Voicebank entry pages and search results should always include the full name (including the "Unknown" part) for clarity.
Extra QOL features:
Automatically include derived voicebanks on:
Concrete migration plan: (created 2024/06/30)
https://vocadb.net/T/4862/unspecified-voicebank-version
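To illustrate how approach 1 could address issue F, here is a sketch that maps partially specified credit information to the most specific entry it supports, falling back to an "(Unknown)" entry at whichever level is ambiguous. The `ENTRIES` set, the `resolve_credit` function, and the "<character> <version> (<variant>)" naming pattern are assumptions for illustration; the five unknown types listed above are not modeled individually.

```python
from typing import Optional

# Hypothetical set of existing entry names for one character.
ENTRIES = {
    "Hatsune Miku (Unknown)",
    "Hatsune Miku V2",
    "Hatsune Miku V3 (Unknown)",
    "Hatsune Miku V3 (Sweet)",
}


def resolve_credit(character: str, version: Optional[str] = None,
                   variant: Optional[str] = None) -> str:
    """Pick the most specific entry the credit information supports.

    Assumes the root "<character> (Unknown)" entry always exists and that a
    version with several variants has a "<character> <version> (Unknown)" entry.
    """
    root_unknown = f"{character} (Unknown)"
    if version is None:
        return root_unknown  # nothing more specific is known
    version_unknown = f"{character} {version} (Unknown)"
    if variant is not None:
        exact = f"{character} {version} ({variant})"
        if exact in ENTRIES:
            return exact
    elif f"{character} {version}" in ENTRIES and version_unknown not in ENTRIES:
        # Single-bank version without variants: the credit is unambiguous.
        return f"{character} {version}"
    # Ambiguous or unrecognized within this version: never guess a specific bank.
    return version_unknown if version_unknown in ENTRIES else root_unknown


print(resolve_credit("Hatsune Miku", "V3"))           # Hatsune Miku V3 (Unknown)
print(resolve_credit("Hatsune Miku", "V3", "Sweet"))  # Hatsune Miku V3 (Sweet)
```

The key design choice is that missing or unrecognized information always resolves to an "(Unknown)" entry rather than to a base voicebank, which is what avoids the inaccurate credits described in issue G.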
Approach 2) Only one unknown-entry per voicebank tree
This would lead to a large loss of information in exchange for simplicity.
Approach 3) Replace all the vocalist types (Vocaloid/UTAU/CeVIO/etc.) with a single "Character" type. The voicebank tree would have a depth of 1 (no entries for Miku V3 (Sweet), etc.).