anansi-project / comicinfo

ComicInfo.xml's new home
https://anansi-project.github.io/docs/category/comicinfo
MIT License

New Element: LocalizedSeries #6

Status: Open. ocgineer opened this issue 2 years ago

ocgineer commented 2 years ago

Where does this come from?

Myself and Discord

What is the rationale for adding support for this element?

LocalizedTitle: to have a proper field for the title when the comic/manga originates in a country whose language differs from the one the book is translated and published in.

LocalizedTitleScript could also be added alongside LocalizedTitle, to hold the title in its actual native script (Japanese or Korean script).

gotson commented 2 years ago

Can you provide an example of what this looks like?

lordwelch commented 2 years ago

What if, instead, a language attribute were added to the title tag, and multiple title tags were allowed as long as each specifies a different language?

<Series lang="eng" sort="Batman">The Batman</Series>
<Series lang="spa" sort="Batman">El Batman</Series>
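A consuming application could read lordwelch's proposed markup along these lines. This is a minimal Python sketch, assuming the hypothetical `lang` and `sort` attributes shown above (they are not part of the current ComicInfo schema); the function name `series_by_language` is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the `lang` and `sort` attributes on <Series> are the
# proposal above, not part of the published ComicInfo schema.
SAMPLE = """<ComicInfo>
  <Series lang="eng" sort="Batman">The Batman</Series>
  <Series lang="spa" sort="Batman">El Batman</Series>
</ComicInfo>"""


def series_by_language(xml_text: str) -> dict:
    """Map each `lang` value to a (title, sort title) pair."""
    root = ET.fromstring(xml_text)
    result = {}
    for el in root.findall("Series"):
        lang = el.get("lang")
        if lang is None:
            continue  # this sketch only collects language-tagged titles
        title = (el.text or "").strip()
        # Fall back to the display title when no sort attribute is present.
        result[lang] = (title, el.get("sort", title))
    return result


print(series_by_language(SAMPLE))
# {'eng': ('The Batman', 'Batman'), 'spa': ('El Batman', 'Batman')}
```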
gotson commented 2 years ago

I think the language is not relevant, at least not for the published title: we already have the LanguageISO field, which should provide the language of the publication, so the attribute would be redundant.

So for a Japanese publication translated into English, the English publication would have the following, if I understand correctly:

But that wouldn't tell us the language of origin of the publication.

ocgineer commented 2 years ago

As far as I know, it has two uses, mostly for manga/manhwa/manhua and webtoons.

The manga is a scanlation

In the case of a scanlation, there is no 'official' English title, so the main title is in the language it was written in. The LocalizedTitle would therefore be used for the 'common' fan title in English, or in whatever language it was translated into.

The manga is an official translation publication

In this case, there is an official English title, which can be used as the main title. Many manga users still prefer to also have the Japanese title (for manga), because they are used to it and can search by that name if it is in the metadata.

Kavita has a localized title field that maps to AlternateSeries, but a discussion on Discord then brought up this specialized LocalizedTitle or LocalizedName element for this specific use case.

AlternateSeries could then be used for manga when there is another 'English title' in the case of a scanlation, or for manga with long names that have a common shorter name among fans, who would also use those names in search. E.g.:

majora2007 commented 2 years ago

I would just add that, based on another issue, comic users apparently use AlternateSeries for story-arc-type tags. I think LocalizedSeries would be the best way to ensure we can store these mappings, like DanMachi (which is how most people would look up the series), without creating confusion between manga and comics about what a tag is used for.

gotson commented 2 years ago

I would like to clarify a few things:

@ocgineer can you provide detailed examples of real life cases for both scanlation and official translation, of what the different series titles are?

What I'd also like to say is this: please try to explain the problem at stake (what kind of data is missing, or what you cannot express with the current data model), rather than jumping to the solution (which fields need to be added).

ajslater commented 2 years ago

Not to derail this too much, but as a practical matter, I suspect most tagging is done from ComicVine data (with ComicTagger, I'd guess). I can't recall ever seeing any of these tags used in the wild: AlternateSeries, AlternateNumber, AlternateCount, StoryArc, StoryArcNumber, SeriesGroup, AgeRating.

Unless a new popular, available source of metadata appears paired with new popular, convenient tagging software, whatever tags you might dream up for a spec, however logical and appropriate, will end up being a mostly academic exercise.

majora2007 commented 2 years ago

@ajslater A lot of manga users also use MyAnimeList for tagging, through programs like MangaTagger. There are other efforts to bring manga metadata into ComicInfo, which is why I think a single additional LocalizedSeries tag would be nice.

I know for my application, Kavita, that LocalizedSeries would be used and I'm working on external metadata support as well.

ajslater commented 2 years ago

Does manga have a popular metadata file format that's stored with the archive analogous to comicinfo.xml?

majora2007 commented 2 years ago

It does not. The best we have is ComicInfo.xml and we have to use workarounds for how it works.

For example, the issue count is used for the number of volumes, not actual chapters.

ocgineer commented 2 years ago

As majora2007 said, there is no manga-specific metadata format, so we are trying to use what is available in ComicInfo.xml, sometimes bending the intent of fields defined for comics to fit manga, or trying to get some new fields added for that use. The manga we work with are (always) translated, so they have a title in the original language and a translated title, and the community uses both equally. Chapter # and Volume # (a collection of chapters), as used by manga, are another matter.

majora2007 commented 2 years ago

I will say (and I think this is starting to veer into a more philosophical discussion about the purpose of ComicInfo) that it would be really nice if ComicInfo could accommodate some manga-specific data. Currently, manga cannot fully use ComicInfo, as it is designed with one metadata source in mind and a small set of producers. If the standard cannot be open enough to store some information even when there isn't a major player in the field yet, then this effort is pretty moot.

Kavita, for example, is looking to synchronize metadata from its database into the metadata in the file, so users have the choice to move to another consuming application in the future and are not tied to Kavita. But without a way to write some essential information, another standard will have to be created, and we will continue to be stuck with fragmented software and lock-in.

We don't need to bloat the standard, but we should be able to add a few tags where there is potential for automation, like LocalizedSeries: the information is available from MAL, AL and MangaDex, all with APIs, but no program writes these fields because there is no standard for them. We don't need to be overly ambitious, though; the proposed Mangaka tag, for instance, is overly verbose for something that already exists (Writer or Artist).

lordwelch commented 2 years ago

Just to be pedantic: ComicInfo wasn't designed with a data source in mind. It was created by ComicRack to store the ComicRack internal metadata locally; the scrapers for different services came later as Python plugins. This post on reddit lists some of them and includes the ComicRack manual.

This project is just a group of people who have comics of any kind, including manga, trying to create new elements for a format that has been abandoned by its creator. I think @ajslater and @gotson are just trying to determine the proper usage of elements before they are added, and are being extra cautious about not including bloat.

Honestly, the format ComicRack created doesn't even have all the tags needed for either ComicVine or GCD, and those two don't agree on how to represent comic series.

gotson commented 2 years ago

@ocgineer for the third time, can you provide detailed examples of real life cases for both scanlation and official translation, of what the different series titles are?

ajslater commented 2 years ago

I can't say I have feelings one way or the other about this tag, but as ComicVine doesn't supply it, it won't be filled in by automated processes for western comics. If it did, I'd be more inclined to add it, as I'd be pretty sure it would get used en masse.

Meta: I think if the purpose of ComicInfo 2+ were clearer, it would be easier to decide which elements to include.

Manga having no common embedded metadata format and being forced to abuse ComicInfo seems less than ideal for manga readers. With that in mind, I do think a mission of some future version of ComicInfo could be to support manga extensions where no existing ComicInfo tag can do the job. But should that be v2.1?

Every Bruce, Dick and Barbara is going to be requesting their own special tag on this project. A rubric or a statement of purpose for this format specification seems like a good idea, not for the entire future of the project, but perhaps for the next version. Maybe supporting manga isn't part of v2.1, idk. I think it would be nice if some future version did, but you might want some hard cutoffs and reasons to say no in order to ship this thing.

majora2007 commented 2 years ago

I can provide some examples:

English title: Brynhildr in the Darkness; localized title: Gokukoku no Brynhildr

English title: We Never Learn; localized title: Bokutachi wa Benkyou ga Dekinai!

This is why he is requesting it (and I am 100% advocating for it). Imagine you are searching for We Never Learn, but your series is tagged as Bokutachi wa Benkyou ga Dekinai! because that is what the metadata service provides.

I agree with ajslater: there seems to be no defined goal for this spec and no rules around it. If we can't add new things because they have to be supported by ComicVine and suit only comic users, then in my eyes that defeats the purpose of pushing the spec forward. Yes, we need to agree on how much we let through, but for now it would be nice to have some basic support for manga without hard requirements that a program already writes the tags (or that they already exist).

gotson commented 2 years ago

To answer the Meta part:

ComicInfo is not specific to Manga, Comics, or any other type of publication. The problem with Manga is that people started using the format without any guidance or documentation, and decided to use some fields that are actually intended for something else.

v2.1 should be an intermediate step between v2, and the target model we are discussing in https://github.com/anansi-project/rfcs

Not everything should be added to 2.1, only things that make sense; that's why lots of questions are being asked, to make sure we get it right. It's not an easy process. Then you may ask, "what does make sense to be added?". I don't have an absolute answer. That's why we need the discussions, the arguing, and the constructive disagreement.

But to state it again, ComicInfo is not just for Comics. ComicInfo is not just for ComicVine.

gotson commented 2 years ago

> I can provide some examples: English Title: Brynhildr in the Darkness Localized Title: Gokukoku no Brynhildr
>
> English Title: We Never Learn Localized Title: Bokutachi wa Benkyou ga Dekinai!

Thanks, but that doesn't really explain the script part that @ocgineer mentioned above.

The naming also seems weird to me, localized would mean translated to the language of the publication, but here it's the original title.

majora2007 commented 2 years ago

Let's leave the script for another PR; it's getting muddled with the discussion of this tag itself. @ocgineer, please create a separate issue with concrete examples if you want the script included.

The example is just that, an example; you can swap the titles around based on your preferences. If writing from external sources (which exist), you'll be forced into that convention.

gotson commented 2 years ago

But that's semantically wrong.

If you look at that book published in english, it should have:

But if you look at the original publication, in Japanese, it should only have the original title in the Title field.

gotson commented 2 years ago

I think all of those are related. From the discussions above, what I get is:

We could imagine having multiple synonyms/aliases to hold all those notions; however, that would make them difficult for consuming applications to use if there is no intent or hint as to what each title is.

@majora2007, setting aside what's in ComicInfo.xml, how does Kavita handle those things in its internal metadata model?

majora2007 commented 2 years ago

We provide the user 2 titles to work with: Title and the LocalizedTitle. We don't care what is in which. The title is what renders to the screen, but the user can lookup with either.

Usually, in manga, users will look up a series in English or Japanese. Different sites show different titles, but it's usually either the English or the romaji (unless you are using pure Japanese, which is an edge case).

For example, I use a mix for the Title, depending on what I know the series as. Sometimes the Japanese is hard to remember, so I use the English or some shorthand as the title to best suit my needs for finding and remembering something. But having the localized title available is really nice when I search for a series I've read about, so I don't have to google the translation and then check it.

My opinion is to not take care of every edge case, but the most common. Then the user can decide what system they want to represent their files. The consuming applications just need to respect their tagging choices for display.

gotson commented 2 years ago

> We provide the user 2 titles to work with: Title and the LocalizedTitle. We don't care what is in which. The title is what renders to the screen, but the user can lookup with either.

So IIUC the LocalizedTitle is for search only ?

FYI Komga has a series title, used for display and search, and a series sort title, used for sorting and search.


If we don't need any specific meaning/hint/intent on the additional series title, we could use something like aliases or synonyms.

We could imagine something like this:

<xs:element minOccurs="0" maxOccurs="unbounded" default="" name="SeriesAlias" type="xs:string" />

It would be used like this, for example for this series:

<Series>Is It Wrong to Try to Pick Up Girls in a Dungeon?</Series>
<SeriesAlias>Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka</SeriesAlias>
<SeriesAlias>DanMachi</SeriesAlias>
<SeriesAlias>ダンジョンに出会いを求めるのは間違っているだろうか</SeriesAlias>
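For search, a consumer could simply treat every SeriesAlias as an extra search key. A minimal Python sketch, assuming the proposed (not adopted) SeriesAlias element; the helper names `searchable_titles` and `matches` are hypothetical, and the matching is a plain case-insensitive substring test rather than anything a specific application does.

```python
import xml.etree.ElementTree as ET

# Example from the comment above; <SeriesAlias> is a proposed element.
SAMPLE = """<ComicInfo>
  <Series>Is It Wrong to Try to Pick Up Girls in a Dungeon?</Series>
  <SeriesAlias>Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka</SeriesAlias>
  <SeriesAlias>DanMachi</SeriesAlias>
  <SeriesAlias>ダンジョンに出会いを求めるのは間違っているだろうか</SeriesAlias>
</ComicInfo>"""


def searchable_titles(xml_text: str) -> list:
    """Return the Series title followed by every SeriesAlias, skipping blanks."""
    root = ET.fromstring(xml_text)
    titles = [root.findtext("Series", default="")]
    titles += [el.text or "" for el in root.findall("SeriesAlias")]
    return [t.strip() for t in titles if t.strip()]


def matches(xml_text: str, query: str) -> bool:
    """Case-insensitive substring match against any of the titles."""
    needle = query.casefold()
    return any(needle in t.casefold() for t in searchable_titles(xml_text))


print(matches(SAMPLE, "danmachi"))    # True
print(matches(SAMPLE, "ダンジョン"))   # True
print(matches(SAMPLE, "Batman"))      # False
```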

We could also use hints with XML attributes, for example:

<Series>Is It Wrong to Try to Pick Up Girls in a Dungeon?</Series>
<SeriesAlias hint="romaji">Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka</SeriesAlias>
<SeriesAlias hint="short">DanMachi</SeriesAlias>
<SeriesAlias hint="original">ダンジョンに出会いを求めるのは間違っているだろうか</SeriesAlias>

The hint would not be typed, and would be free text. Not sure if that would bring a lot of value for consuming applications though.

Maybe hint is not a very good name; we can discuss it if we want to go that route.
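Because the hint would be free text, a consumer could at most group aliases by whatever string it finds. A minimal Python sketch, assuming the hypothetical `hint` attribute from the example above; normalizing case and surrounding whitespace is this sketch's own choice, not part of the proposal.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<ComicInfo>
  <Series>Is It Wrong to Try to Pick Up Girls in a Dungeon?</Series>
  <SeriesAlias hint="romaji">Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka</SeriesAlias>
  <SeriesAlias hint="short">DanMachi</SeriesAlias>
  <SeriesAlias hint="original">ダンジョンに出会いを求めるのは間違っているだろうか</SeriesAlias>
</ComicInfo>"""


def aliases_by_hint(xml_text: str) -> dict:
    """Group SeriesAlias values by their free-text hint (None when absent)."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for el in root.findall("SeriesAlias"):
        hint = el.get("hint")
        # Free text: normalize so "Romaji" and " romaji " share one bucket.
        key = hint.strip().casefold() if hint else None
        grouped[key].append((el.text or "").strip())
    return dict(grouped)


print(aliases_by_hint(SAMPLE)["short"])  # ['DanMachi']
```

Anything beyond grouping (for example, deciding that "short" means a display abbreviation) would rely on conventions the proposal explicitly does not define.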

gotson commented 2 years ago

Note that the proposed <xs:element minOccurs="0" maxOccurs="unbounded" default="" name="SeriesAlias" type="xs:string" /> would be incompatible with #10, as xs:all cannot be unbounded.

ocgineer commented 2 years ago

Just want to point out, to keep in mind, that it should use romanized instead of romaji to keep it global if you want to go this route. I like original as well; it can then contain the original title in whatever language the work originates from.

The examples given were Japanese manga (as that is the most prominent use) thus romaji would be correct but there are also Korean and Chinese 'manga' and webtoons that are starting to get officially translated.

gotson commented 2 years ago

> Just want to point out to keep in mind, that it should use romanized instead of romaji to keep it global if you want to go this route. :)
>
> The examples given were Japanese manga (as that is the most prominent use) thus romaji would be correct but there are also Korean and Chinese 'manga' and webtoons that are starting to get officially translated.

As I said, the hint would be free text, there wouldn't be any convention.

majora2007 commented 2 years ago

I know we had a bunch of back and forth on here, I decided to implement LocalizedSeries as a tag within Kavita (no one currently writes this) so I can get some basic functionality for Manga. I think we have to be cognizant of the medium we have and that it cannot solve all possible scenarios. Giving some functionality is better than holding back because we can't cover all use cases.

gotson commented 2 years ago

> I know we had a bunch of back and forth on here, I decided to implement LocalizedSeries as a tag within Kavita (no one currently writes this) so I can get some basic functionality for Manga. I think we have to be cognizant of the medium we have and that it cannot solve all possible scenarios. Giving some functionality is better than holding back because we can't cover all use cases.

It seems like you are tying Kavita's metadata model to ComicInfo?

majora2007 commented 2 years ago

Not exactly tying it to ComicInfo, but ComicInfo and EPUB are the only ways to have metadata in a self-contained system. When possible, I'd like to import data from ComicInfo, while Kavita offers metadata beyond what ComicInfo can provide. LocalizedSeries is a field I already had available, but I wanted to let users and myself set it in the ComicInfo and have it work across Kavita installs, without having to find the series and update it in both installs.

ajslater commented 2 years ago

FWIW, I prefer lordwelch's suggestion of attributes, which allows multiple languages and sort schemes. Although I voiced support in the other thread for consistently relying on more tags, both of these cases look like they should be tightly tied to the Series tag, and lordwelch's example lets you specify as many language variations as you like for a variety of consumers. With that schema, the LanguageISO tag still represents the printed language found inside the comic.

<LanguageISO>eng</LanguageISO>
<Series lang="eng" sort="Batman">The Batman</Series>
<Series lang="spa" sort="Batman">El Batman</Series>
<Series lang="fra" sort="Batman">Le Batman</Series>
<Series>The Batman</Series>

If you wanted, you could match the language tag to the Series lang attribute and find the 'original language' series, or use a tag without the lang attribute to represent the original. This schema has the benefit of being extensible to other tags like Title, AlternateSeries, Imprint, Publisher, Summary, Notes, and possibly other string tags.
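One possible reading of that rule, as a minimal Python sketch: the Series whose `lang` matches LanguageISO is treated as the publication title, and a Series without `lang` as the original. The attributes are the proposal above, not the current schema, and the first-match-wins behavior and the name `resolve_series` are this sketch's own assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ajslater's example; `lang`/`sort` on <Series> are proposed, not adopted.
SAMPLE = """<ComicInfo>
  <LanguageISO>eng</LanguageISO>
  <Series lang="eng" sort="Batman">The Batman</Series>
  <Series lang="spa" sort="Batman">El Batman</Series>
  <Series lang="fra" sort="Batman">Le Batman</Series>
  <Series>The Batman</Series>
</ComicInfo>"""


def resolve_series(xml_text: str) -> dict:
    """Return the publication-language title and the untagged (original) title."""
    root = ET.fromstring(xml_text)
    language = (root.findtext("LanguageISO") or "").strip()
    publication: Optional[str] = None
    original: Optional[str] = None
    for el in root.findall("Series"):
        title = (el.text or "").strip()
        lang = el.get("lang")
        if lang is None:
            if original is None:
                original = title  # no lang attribute: treated as the original
        elif lang == language and publication is None:
            publication = title  # same language as the printed book
    return {"publication": publication, "original": original}


print(resolve_series(SAMPLE))
# {'publication': 'The Batman', 'original': 'The Batman'}
print(resolve_series(SAMPLE.replace(">eng<", ">spa<"))["publication"])
# El Batman
```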

The schema shown above is focused on language and localization solutions and does not take into account gotson's suggestion of a "short" hint.

<SeriesAlias hint="short">DanMachi</SeriesAlias>

which is interesting, but I'm guessing such abbreviations aren't commonly wanted for sorting or display?

ThePromidius commented 2 years ago

I have implemented the tag on my metadata editor and my own fork of Manga-Tagger.

Also Manga-Tagger currently maps the English name to AlternateSeries.

I plan to add a setting that lets the user use English by default, in which case English would go to Series and romaji to LocalizedSeries.

Since you wanted more use cases, this is what AniList provides through its API:
Romaji: Kage no Jitsuryokusha ni Naritakute!
English: The Eminence in Shadow
Native: 陰の実力者になりたくて!

As for the implementation of the tag, I'd keep it simple and assume that the LocalizedTitle language is the same as LanguageISO.
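The mapping described above (romaji or English into Series, the other into LocalizedSeries, depending on a user setting) could look roughly like this. A minimal Python sketch using the AniList titles quoted above; `to_comicinfo` and its `prefer_english` flag are hypothetical and do not reflect Manga-Tagger's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def to_comicinfo(romaji: str, english: Optional[str], prefer_english: bool = False) -> str:
    """Write Series/LocalizedSeries from AniList-style titles (hypothetical helper)."""
    # The user preference alone decides which title becomes the main Series.
    if prefer_english and english:
        primary, secondary = english, romaji
    else:
        primary, secondary = romaji, english
    root = ET.Element("ComicInfo")
    ET.SubElement(root, "Series").text = primary
    if secondary:
        ET.SubElement(root, "LocalizedSeries").text = secondary
    return ET.tostring(root, encoding="unicode")


print(to_comicinfo("Kage no Jitsuryokusha ni Naritakute!", "The Eminence in Shadow"))
# <ComicInfo><Series>Kage no Jitsuryokusha ni Naritakute!</Series><LocalizedSeries>The Eminence in Shadow</LocalizedSeries></ComicInfo>
```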

Personally, I don't see myself adding how the series is named in French if the content is in Spanish, for example.

I think this issue has been open too long already for something that is really simple. I understand that there has to be agreement on which tags should be added, and that reaching it takes some discussion. The problem is that a few comments every few months will get us nowhere.

Edit: I just realized I've been typing LocalizedTitle because it's the name of the issue, but I'm referring to LocalizedSeries. I suggest renaming the issue.

gotson commented 1 year ago

Trying to pick this up again, as someone mentioned it today on a Komga issue.

From what I can see, there are a few requirements:

  1. ability to specify alternative titles in different languages/scripts. Script is important because some languages have multiple scripts that would need differentiating (mostly Japanese in Kanji/Hiragana/Katakana/Romaji)
  2. ability to specify alternative titles that are just aliases. It could be an accepted shortening (like SAO for Sword Art Online), or an altogether different title (example: Valérian, agent spatio-temporel was renamed to Valérian et Laureline)
  3. it also ties in to #4, but given that a series should only have a single sort, I don't see this working well together

As for the suggestions:

I would be in favor of using both label and lang.
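Requirements 1 and 2 could both be carried by those two attributes. A minimal Python sketch, assuming (hypothetically) that `lang` holds a BCP 47 language tag with an optional ISO 15924 script subtag such as `Latn` or `Jpan`, and that `label` is free text; the SeriesAlias element comes from the earlier proposal in this thread, the titles from the AniList example above, and the tag parsing is deliberately simplified.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

# Hypothetical markup: `lang` holds a BCP 47 tag whose optional second subtag
# is an ISO 15924 script code (Latn, Jpan, Hira, Kana...); `label` is free text.
SAMPLE = """<ComicInfo>
  <Series>The Eminence in Shadow</Series>
  <SeriesAlias lang="ja-Latn">Kage no Jitsuryokusha ni Naritakute!</SeriesAlias>
  <SeriesAlias lang="ja-Jpan" label="native">陰の実力者になりたくて!</SeriesAlias>
</ComicInfo>"""


class Alias(NamedTuple):
    title: str
    language: Optional[str]
    script: Optional[str]
    label: Optional[str]


def split_lang(tag: Optional[str]) -> tuple:
    """Split 'ja-Latn' into ('ja', 'Latn').

    Simplified: the second subtag counts as a script only if it is four
    letters long; extlang, region and other subtags are not interpreted.
    """
    if not tag:
        return None, None
    parts = tag.split("-")
    script = None
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1].title()  # script subtags are conventionally title-case
    return parts[0].lower(), script


def parse_aliases(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    aliases = []
    for el in root.findall("SeriesAlias"):
        language, script = split_lang(el.get("lang"))
        aliases.append(Alias((el.text or "").strip(), language, script, el.get("label")))
    return aliases


latin = [a.title for a in parse_aliases(SAMPLE) if a.script == "Latn"]
print(latin)  # ['Kage no Jitsuryokusha ni Naritakute!']
```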

awh-tokyo commented 1 year ago

I can't envision a use case for either "Japanese in Hiragana" or "Japanese in Katakana" except for sort ordering purposes.

majora2007 commented 1 year ago

Kavita has supported this for a while now. As mentioned in my post here, I do not believe it is ComicInfo's job to cater to every potential way of representing series data. Since this is mainly used for manga, I believe an additional LocalizedSeries field is sufficient, and users' ingestion software of choice can implement alternative fields.

I'm not personally interested in adjusting the implementation in Kavita to support multiple languages as there is no added benefit to the user other than having it in ComicInfo.

ThePromidius commented 1 year ago

I stand by my previous comment and agree with @majora2007.

gotson commented 1 year ago

LocalizedSeries is too limited in my opinion. It seems it was added in Manga Manager and Kavita because no consensus was reached, but that doesn't mean we should go for that option now just because it is already implemented somewhere.

> I do not believe it is ComicInfo's job to cater to all potential ways to represent Series data. Since this is mainly used for Manga, I believe offering an additional field LocalizedSeries is sufficient and their ingestion software of choice can implement alternative fields.

It's a bit contradictory. My reading of this is "we don't need something so complex, but we already implemented something simpler that works for us, so we should use that instead".

> Since this is mainly used for Manga,

Mainly but not only. The model, even though it was initially built for comics by ComicRack, should aim to be as agnostic as possible.

> I'm not personally interested in adjusting the implementation in Kavita to support multiple languages

The way consuming applications handle the metadata is up to them. You could always decide to ignore that new field, or use the first one found.

> there is no added benefit to the user other than having it in ComicInfo.

No added benefit to Kavita users, maybe. But there is one for users of other applications, or for users who ever want to migrate from Kavita to something else.

gotson commented 1 year ago

> I can't envision a use case for either "Japanese in Hiragana" or "Japanese in Katakana" except for sort ordering purposes.

Search is a very good use case. You could search using whichever script you like. Some people may like the Hiragana/Kanji display but be unable to type it, so they could search using romaji.

Sort ordering would most likely be done using the romaji titles; otherwise, titles in Japanese characters would always end up at the end.
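The ordering effect described above can be seen in a minimal Python sketch. The `sort_key` precedence (explicit sort title, then romanized title, then Series) and the dictionary layout are assumptions for illustration only, not part of any proposal in this thread.

```python
from typing import Optional


def sort_key(series: str, romanized: Optional[str] = None, sort: Optional[str] = None) -> str:
    """Hypothetical precedence: explicit sort title, then romanized title, then Series."""
    return (sort or romanized or series).casefold()


library = [
    {"series": "ダンジョンに出会いを求めるのは間違っているだろうか",
     "romanized": "Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka"},
    {"series": "We Never Learn"},
    {"series": "陰の実力者になりたくて!",
     "romanized": "Kage no Jitsuryokusha ni Naritakute!"},
]

# Plain code-point order: Latin titles sort before any Japanese characters.
naive = sorted(b["series"] for b in library)
# Romanized keys interleave the Japanese titles with the Latin ones.
by_key = sorted(library, key=lambda b: sort_key(b["series"], b.get("romanized"), b.get("sort")))

print(naive[0])                           # We Never Learn
print([b["series"] for b in by_key][-1])  # We Never Learn
```

This only illustrates the romaji approach for mixed-language libraries; it does not produce a correct Japanese ordering, which would need kana readings as the sort value.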

awh-tokyo commented 1 year ago

> Sort ordering would most likely be done using the romaji titles, else the japanese characters title would always end up at the end.

If you're talking about foreign-language libraries, that's probably correct. If you're talking about Japanese monolingual libraries, this will result in an incorrect sort order; the only feasible way to sort Japanese is by kana readings.

gotson commented 1 year ago

> > Sort ordering would most likely be done using the romaji titles, else the japanese characters title would always end up at the end.
>
> If you're talking about foreign-language libraries, that's probably correct. If you're talking about Japanese monolingual libraries, this will result in an incorrect sort order; the only feasible way to sort Japanese is by kana readings.

that's why we also have a proposal for #4

rosystain commented 1 year ago

In my opinion, a non-English manga volume usually has its own official title in each language or region, tied to its publisher and ISBN. I prefer to use its localized title as the main title. In that case, a Native Title field would be more useful than a localized one, especially if you have multiple versions (Japanese, Chinese and others) in one library.

parasiteoflife commented 11 months ago

It's going to be almost 2 years since opened and 10 months since re-discussed and to this day there is no consensus? You are like Ents.

On a serious note, please consider adding this to the spec. It is extremely useful, even essential, for foreign users. I might know a series by its original name, but that's because I am involved in the world of metadata and I'm a nerd; a normal user is not going to recognize a series by its original name, because the translated title often differs completely from the original. And that is for English titles, a language most people know; for titles in Japanese, 90% of people (at least in my country, and I would even dare say my continent) are not going to recognize them at all. You can't even say this is an exotic request: it is supported in media centers like Kodi (https://kodi.wiki/view/NFO_files/Templates) and Jellyfin (https://jellyfin.org/docs/general/server/metadata/nfo/).

majora2007 commented 11 months ago


Not sure about the official status of this, but Kavita and Manga Manager have added support for it because of the immense need among manga users that you mentioned.

I can't speak to the proposal getting merged, as it is up to gotson to give the final approval.

gotson commented 11 months ago

No consensus could be reached, so there's no decision and no merge.