Metron-Project / metroninfo

Digital Comic Book Metadata XML Schema
MIT License

Additional Series Information #38

Closed: bpepple closed this 2 weeks ago

bpepple commented 2 months ago

Some additional Series information might be useful to include like:

  1. Series Status: Using values like "Ongoing", "Cancelled", "Complete", "Hiatus".
  2. Count: Issue count for a series.

The big downside is that these values change over time. Because they are series-level values, comics within the same series could end up carrying different values.

majora2007 commented 1 month ago

We need something like Count or Series Status. Count is currently in the ComicInfo spec, and while it's not great (it isn't clear how to use it for anything beyond loose issues), it offers the ability to tell whether a Series has completed or ended publishing.

The current spec does not offer any of that ability and that will be a major pain point in the software. Users generally tag once when they start a series and once when the series concludes. This is not something that needs to be tagged per file. To my knowledge, Komga and Kavita both take the information for series from the first issue.

The lack of the field means in order to keep the same functionality (which to me is critical), these servers will have to integrate with upstream APIs and have this new dependency. This is unlikely to happen in Kavita and Komga.

A Series Status that can express richer states would be beneficial. Instead of just Ongoing/Complete, values like Cancelled and Hiatus offer information that ComicInfo is missing. Pair it with some sort of Count and server software has a really rich way of informing users of the state of the Series.

Buried-In-Code commented 1 month ago

The problem that I have with including Status and Count is that every time a new issue in a series is released, I need to retag the whole series to update the Count element. Status is even worse, as it can change at any point in time, so I'd need to be constantly checking with a service (e.g. Metron) to see if the status has changed. If I'm constantly doing this check, why do I need to include it in a static file?

majora2007 commented 1 month ago

@Buried-In-Code why would you need to retag the whole series? But even then, Count would only be set once a Count is known. Count is the number of issues released. When a series is releasing, that number is not known. It's only known once the series is completed.
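A minimal sketch of how a server might read Count under the convention described here, assuming Count is written only once a series has finished and is otherwise absent. `SeriesInfo`, `publication_ended`, and `missing_issue_count` are hypothetical names for illustration, not part of Kavita, Komga, or any spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeriesInfo:
    # Hypothetical view of the series-level fields a server reads from the
    # first issue's metadata file.
    name: str
    count: Optional[int] = None  # None means "still publishing / unknown"


def publication_ended(series: SeriesInfo) -> bool:
    """Count is only written once the series has finished, so its presence
    signals that publishing has ended (assumed convention)."""
    return series.count is not None


def missing_issue_count(series: SeriesInfo, issues_in_library: int) -> Optional[int]:
    """How many issues the user still lacks, or None if the total is unknown."""
    if series.count is None:
        return None
    return max(series.count - issues_in_library, 0)


if __name__ == "__main__":
    finished = SeriesInfo("Black Lightning", count=11)
    ongoing = SeriesInfo("Solarpunk Adventures")
    print(publication_ended(finished), missing_issue_count(finished, 9))
    print(publication_ended(ongoing), missing_issue_count(ongoing, 2))
```

Because only the first file is consulted, this check needs just that one file to be tagged.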

Self-hosted servers only check the first file, so you'd only need to tag that one file. I'm bringing the perspective of self-hosted servers and the users who use them. It doesn't sound like you use software like Komga/Kavita/Codex/Stump.

bpepple commented 1 month ago

My gut instinct is not to include this, since it seems to be very specific to only software like Kavita, Komga, etc, but if it helps to make support for the new schema more palatable to those potential users I guess I'd be open to adding the additional fields.

I do see @Buried-In-Code's point, though, since I think folks who don't use software like Kavita will likely be retagging their comics for these changes (comic collectors tend to be fairly obsessive).

Maybe we can check with the ComicTagger, Codex, and Komga devs to see if they have an opinion?

ajslater commented 1 month ago

I like how Metron has a seriesType and publisherType to encapsulate stuff like this more logically.

ComicInfo.xml has the poorly named Count field. ComicBookInfo.json has both numberOfVolumes and numberOfIssues fields.

In comicbox/Codex I have issueCount and volumeCount to support these fields and they get attached to Series and Volume database columns.

I notice that metron does not have imprintType or volumeType. I personally would place issueCount in a volumeType, and volumeCount in the seriesType because it's similar to what I've done in my own metadata code.

I don't think you do need to retag all books in a volume or series to get accurate data if the reader always uses the maximum value for a volume or series found, which is what I do in Codex. When Solarpunk Adventures #001 comes out I update the series count to whatever is in the metadata. When Solarpunk Adventures #002 is imported later, if it's tagged with series count, I update the series in the database with max(series.volume_count, new_imported_volume_count) and similarly max(volume.issue_count, new_imported_issue_count).
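The max-merge approach described above can be sketched as follows. This is an illustration of the idea, not Codex's actual code; `SeriesRow`, `VolumeRow`, `merge_count`, and `import_issue` are hypothetical stand-ins for database rows and import logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeriesRow:
    # Hypothetical stand-in for a Series database row.
    name: str
    volume_count: Optional[int] = None


@dataclass
class VolumeRow:
    # Hypothetical stand-in for a Volume database row.
    number: int
    issue_count: Optional[int] = None


def merge_count(current: Optional[int], imported: Optional[int]) -> Optional[int]:
    """Keep the larger of the stored and newly imported counts.

    An untagged import (None) never erases a stored value, and an older file
    tagged with a smaller count never lowers it.
    """
    if imported is None:
        return current
    if current is None:
        return imported
    return max(current, imported)


def import_issue(series: SeriesRow, volume: VolumeRow,
                 tagged_volume_count: Optional[int],
                 tagged_issue_count: Optional[int]) -> None:
    """Update both rows from the counts found in one imported issue's metadata."""
    series.volume_count = merge_count(series.volume_count, tagged_volume_count)
    volume.issue_count = merge_count(volume.issue_count, tagged_issue_count)
```

With this rule, importing #001 tagged with 6 issues and later #002 tagged with 12 leaves the volume at 12, and an untagged #003 leaves it unchanged, so no file ever has to be retagged.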

Philosophy and model design aside, it's a nice-to-have bit of data but not at all crucial, certainly not to readers. I'd like to see an issueCount attribute, but not strongly. I'd also encourage what ComicBookInfo does with both volumeCount and issueCount. If both of those went away forever, my users would barely notice.

Series Status seems primarily useful for Mylar-like programs, which would be querying an API anyway. So I'd vote against it.

majora2007 commented 1 month ago

Kavita also counts the max then tries to match it to the highest volume or issue (since it supports both). I would also welcome a volume Count being added in addition to an issue count.

Can we not have these as optional fields that, while not used by Metron, can be used by people that want to use this metadata for Manga or non-American Comics? Or does every field in the spec need to match with Metron.cloud's metadata?

bpepple commented 1 month ago

Well, I haven't seen any other replies, so it seems having an IssuesCount element would be useful for comic servers, and I'm willing to add it if it helps with adoption (even though I'm not a big fan of adding non-static information).

It seems like it should be a sub-element of the Series element, but should the new element be called IssuesCount or something like TotalIssues? Or something else?

majora2007 commented 1 month ago

Sub-element of Series I agree on. I'm indifferent to the name, they both work well to me.

ajslater commented 1 month ago

So when tagging from comicvine the tagger should query the issue counts for all volumes under the series and sum them to place in this series issue count field?

bpepple commented 1 month ago

> So when tagging from comicvine the tagger should query the issue counts for all volumes under the series and sum them to place in this series issue count field?

How is it done currently with Comic-Tagger? Or does it not provide an issue count? I haven't run CT in ages, so I've zero clue how they do it with the CV API.

ajslater commented 1 month ago

In comictagger the issue_count is volume based. Probably because that's how ComicVine does it. So with some example comicvine data:

Series:
  volume_count: 2
  Volume 1:
    issue_count: 12
  Volume 2:
    issue_count: 10
    Issue #004

In comictagger, the issue_count for Volume 2, Issue #004 would be 10. Comictagger also does a volume_count for ComicLover's ComicBookInfo format, so that would be 2. But I've been told that format is being deprecated.

It sounds like with the schema you're proposing the MetronInfo Series.issueCount would properly be 22 in this example.
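The difference between the two readings can be shown with the example data above. The dictionary layout and function names are hypothetical; the first function mirrors the volume-scoped count described for ComicTagger, the second the series-wide sum.

```python
# ComicVine-style example data from the comment above:
# a series with two volumes of 12 and 10 issues.
example_series = {
    "volume_count": 2,
    "volumes": {1: {"issue_count": 12}, 2: {"issue_count": 10}},
}


def volume_scoped_issue_count(series: dict, volume: int) -> int:
    """Volume-scoped count: what an issue in `volume` would carry."""
    return series["volumes"][volume]["issue_count"]


def series_wide_issue_count(series: dict) -> int:
    """Series-scoped count: the sum over every volume in the series."""
    return sum(v["issue_count"] for v in series["volumes"].values())


print(volume_scoped_issue_count(example_series, 2))  # 10
print(series_wide_issue_count(example_series))  # 22
```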

bpepple commented 1 month ago

Ok, just to make sure I understand you using some live data:

>>> from comicsdb.models import *
>>> series = Series.objects.filter(name__iexact="black lightning")
>>> series.count()
3
>>> for item in series:
...     print(f"{item.name} v{item.volume}: {item.issue_count} issues")
... 
Black Lightning v1: 11 issues
Black Lightning v2: 13 issues
Black Lightning v3: 1 issues

The way I see IssueCount working: if you were to tag Black Lightning v1 #1, the Series element would look something like this:

<Series id="1897" lang="en">
    <Name>Black Lightning</Name>
    <SortName>Black Lightning</SortName>
    <Volume>1</Volume>
    <Format>Single Issue</Format>
    <StartYear>1977</StartYear>
    <IssueCount>11</IssueCount>
</Series>

Here we are given the total number of issues for this particular series, i.e. Black Lightning v1. It's possible I'm misunderstanding what you guys would need.
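For a reader, the element proposed above is straightforward to consume. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming IssueCount is optional and holds a plain integer; `read_optional_int` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SERIES_XML = """\
<Series id="1897" lang="en">
    <Name>Black Lightning</Name>
    <SortName>Black Lightning</SortName>
    <Volume>1</Volume>
    <Format>Single Issue</Format>
    <StartYear>1977</StartYear>
    <IssueCount>11</IssueCount>
</Series>
"""


def read_optional_int(parent: ET.Element, tag: str) -> Optional[int]:
    """Return a child element's text as an int, or None if absent or empty."""
    text = parent.findtext(tag)
    if text is None or not text.strip():
        return None
    return int(text.strip())


series = ET.fromstring(SERIES_XML)
print(series.get("id"), series.findtext("Name"),
      read_optional_int(series, "Volume"), read_optional_int(series, "IssueCount"))
# prints: 1897 Black Lightning 1 11
```

Treating a missing element as None keeps files tagged before the field existed readable.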

ajslater commented 1 month ago

To me it seems like if the element is part of the Series element it would logically relate to Series. So I might prefer something more like:

<Volume issueCount="11">1</Volume>

If this representation of Series is not meant to be a standalone representation of a Series but only occur as a subtag in an Issue's metadata, then the Series.IssueCount tag would be unambiguous because you won't ever have multiple Volume tags.

However, if that schema is meant to represent a standalone series, then we might need either a Volume schema as well to represent multiple volumes within a series, or the issueCount Volume attribute suggested above.

Also, since you represented it above, <StartYear> is meant to represent the start of the Series? And the Volume start year is unrepresented, yeah? Sometimes volumes are 1-based and sometimes they're years, which encodes that information.

bpepple commented 1 month ago

> If this representation of Series is not meant to be a standalone representation of a Series but only occur as a subtag in an Issue's metadata, then the Series.IssueCount tag would be unambiguous because you won't ever have multiple Volume tags.
>
> However, if that schema is meant to represent a standalone series, then we might need either a Volume schema as well to represent multiple volumes within a series, or the issueCount Volume attribute suggested above.

Ok, I think I understand what you're saying.

Yes, the Series element represents the series of the individual issue the XML file describes, not the Series object as a whole (including its various volumes).

> Also, since you represented it above, <StartYear> is meant to represent the start of the Series? And the Volume start year is unrepresented, yeah? Sometimes volumes are 1-based and sometimes they're years, which encodes that information.

Isn't the volume number as the Start Year a hack that was used as a workaround for Comic Vine? To the best of my knowledge I'm not aware of any series that uses a year in their indicia (not to say it's not possible, since we are talking about the comic industry).

ajslater commented 1 month ago

> Yes, the Series element represents the series for the individual issue the xml file is providing information

Got it. Good. Thanks.

> Isn't the volume number as the Start Year a hack that was used as a workaround for Comic Vine?

I don't know. Like you said, I wouldn't expect consistency. I just looked at a title i'm familiar with: Wolverine's first limited series by Frank Miller in 1982 is tagged as Volume 1 on ComicVine. Wolverine's first ongoing series in 1988 used to be tagged as Volume 2, but is now also Volume 1 on ComicVine. The Wolverine volume that started in 2020 is Volume 6 on Comicvine.

IIRC, Marvel has inconsistently referred to serial volume numbers during the run of this and all of its comics. But today they seem to have retroactively replaced volume numbers with the year of the first issue: Wolverine 1982, Wolverine 1988, Wolverine 2020, Wolverine 2024, and so on. Despite often referring to "Vol" in the past, the Marvel website's current term for each volume is "Series".

I tend to see Series and Volumes as both having a bit of extra metadata associated with them. In XML this could be attributes or subtags depending on the importance of the metadata and personal preference. Publishers much less so and Imprints are rarely tagged at all. I prefer making potentially rich schemas for all those layers, but there's not really a practical case for Publishers or Imprints to be more than simple text elements.

Anyway, this is a digression. Volumes are years sometimes and not other times. I've never seen a Volume as a non-numeric string ever, but it wouldn't surprise me if one appeared.

The layout you have above looks to me like it implies are referring to so if they referred to Volume I'd prefer they'd either live as Volume attributes or subtags or be called . But if there was consistent convention that they always referred to Volume that could be documented and we could all work with it.

bpepple commented 1 month ago

> The layout you have above looks to me like it implies are referring to so if they referred to Volume I'd prefer they'd either live as Volume attributes or subtags or be called . But if there was consistent convention that they always referred to Volume that could be documented and we could all work with it.

I don't have a strong opinion either way. What would you like for the Series element and sub-elements to look like?

ajslater commented 1 month ago

What springs to mind immediately is:

<Series id="1897" lang="en">
    <Name>Black Lightning</Name>
    <SortName>Black Lightning</SortName>
    <Volume>
        <Name>1</Name>
        <IssueCount>11</IssueCount>
        <StartYear>1977</StartYear>
    </Volume>
    <Format>Single Issue</Format>
    <StartYear>1977</StartYear>
</Series>

or

<Series id="1897" lang="en">
    <Name>Black Lightning</Name>
    <SortName>Black Lightning</SortName>
    <Volume issueCount="11" startYear="1977">1</Volume>
    <Format>Single Issue</Format>
</Series>

Tags vs. attributes is a judgement call, but I'd be inclined to use <Volume><Name>1</Name></Volume> for consistency with Series.
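Whichever layout is chosen, a reader can normalize both to the same data, which suggests the choice is mostly about consistency and documentation. A sketch with the standard library's `xml.etree.ElementTree`; `read_volume` is hypothetical, and it accepts either `<Number>` or `<Name>` for the sub-element form since both names were discussed.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SUBELEMENT_FORM = """<Series id="1897" lang="en">
  <Name>Black Lightning</Name>
  <Volume>
    <Name>1</Name>
    <IssueCount>11</IssueCount>
    <StartYear>1977</StartYear>
  </Volume>
</Series>"""

ATTRIBUTE_FORM = """<Series id="1897" lang="en">
  <Name>Black Lightning</Name>
  <Volume issueCount="11" startYear="1977">1</Volume>
</Series>"""


def _int_or_none(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None and value.strip() else None


def read_volume(series: ET.Element) -> dict:
    """Normalize either proposed Volume layout into the same dictionary."""
    volume = series.find("Volume")
    if volume is None:
        return {}
    if len(volume):  # has child elements: sub-element form
        number_text = volume.findtext("Number") or volume.findtext("Name")
        return {
            "number": _int_or_none(number_text),
            "issue_count": _int_or_none(volume.findtext("IssueCount")),
            "start_year": _int_or_none(volume.findtext("StartYear")),
        }
    # Attribute form: the element text is the volume number.
    return {
        "number": _int_or_none(volume.text),
        "issue_count": _int_or_none(volume.get("issueCount")),
        "start_year": _int_or_none(volume.get("startYear")),
    }
```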

bpepple commented 1 month ago

Truthfully, I like using the attributes more than your first example, but I'm fine with either.

One thing about your first example is that I think it would be more clear to the user if we used Number instead of Name, since I'm not aware of any volume containing alphanumeric characters (though it could be addressed by the documentation).

@majora2007: Do you have an opinion of @ajslater's series suggestions?

ajslater commented 1 month ago

> Truthfully, I like using the attributes more than your first example, but I'm fine with either.

Sounds good.

> One thing about your first example is that I think it would be more clear to the user if we used Number instead of Name, since I'm not aware of any volume containing alphanumeric characters (though it could be addressed by the documentation).

Yeah, I hear that. I use Name because I have some code that treats series, volume, imprint, and publisher abstractly. But in my comicbox tool I also coerce the Volume name into an int, so Number is fine. I keep waiting for that int coercion decision to bite me, but it hasn't yet. It's probably fine.
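One way to make the int coercion less likely to bite is to coerce only labels made entirely of decimal digits and keep anything else as the original string. This is a sketch under that assumption, not comicbox's actual behavior; `coerce_volume` and the "2B" label are hypothetical.

```python
from typing import Union


def coerce_volume(raw: str) -> Union[int, str]:
    """Coerce a volume label to int when it consists only of decimal digits.

    Anything else (e.g. a hypothetical "2B") is returned stripped but
    otherwise unchanged instead of raising, so odd data survives an import.
    """
    label = raw.strip()
    if label.isdecimal():
        return int(label)
    return label
```

Checking `isdecimal()` first, rather than wrapping `int()` in a try block, also avoids silently accepting inputs such as "1_000" that `int()` parses.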

bpepple commented 1 month ago

Alright. I'll write up a PR for this change (Option 1, unless someone expresses an interest in Option 2 before this afternoon when I create it).

majora2007 commented 1 month ago

There has been a lot of discussion. I'm about to head on a small holiday, so next week I'll give full comments.

bpepple commented 1 month ago

> There has been a lot of discussion. I'm about to head on a small holiday, so next week I'll give full comments.

Ok, I'll hold off on creating a PR until you get a chance to comment. Have a nice weekend!

bpepple commented 1 month ago

@majora2007 Do you still want to comment on this?

majora2007 commented 1 month ago

Yes, so let me try to recap and understand the proposition.

The original ask was to have some way to give the total number of issues or volumes within a Series so rich metadata servers can derive publication status (aka series has concluded publishing).

The initial proposal was to capture the total issue count (leaving volume count, the same pain point, for the server to derive), or we could have totalIssues and totalVolumes, both optional fields.

AJ is proposing the following: <Volume issueCount="11" startYear="1977">1</Volume>

This is telling me that this series is 1 total volume or 11 issues within and that the start year of the volume is 1977? Which is mainly for comics.

Am I understanding the proposal correctly?

bpepple commented 1 month ago

Yes, I believe you are, and you're right: this changes what we planned to use those elements for (I had forgotten about the difference in usage between US comics and manga). So in retrospect, it might be better to leave the elements as they are and add the IssueCount sub-element.

@ajslater Thoughts?

ajslater commented 4 weeks ago

> This is telling me that this series is 1 total volume or 11 issues within and that the start year of the volume is 1977? Which is mainly for comics.

This is telling you that the volume named "1" contains 11 issues and the first issue starts in the year 1977. The start year information is somewhat redundant in that it could also be computed from the publication date of the first issue in the volume.
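As noted, the start year is derivable. A minimal sketch, assuming each issue in a volume has an optional cover or publication date; `volume_start_year` is a hypothetical helper.

```python
from datetime import date
from typing import Iterable, Optional


def volume_start_year(issue_dates: Iterable[Optional[date]]) -> Optional[int]:
    """Derive a volume's start year from its earliest known issue date.

    Issues without a date (None) are ignored; a volume with no dated
    issues yields None.
    """
    known = [d for d in issue_dates if d is not None]
    return min(known).year if known else None
```

The catch is that this only works when the reader has the first issue of the volume, which is one argument for storing the value explicitly.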

The number of total volumes in a series is not contained within each volume because it concerns Series. So

<Series volumeCount="5"><Name>Trilobyte Wars</Name>

Would tell you that the series named "Trilobyte Wars" contains 5 volumes. This scheme mirrors both how ComicVine serves data and how the ComicBookInfo metadata format (ComicLover) stores data.

If manga benefits from storing issue count at the series level, then that might be a good option to allow in the Series tag as well.

<IssueCount> as a sub-element would accomplish the same thing as an attribute; it's entirely a stylistic preference. An <IssueCount> tag's context is given by its parent tag, e.g.:

<Series>
  <Name>Trilobyte Wars</Name>
  <IssueCount>22</IssueCount>
  <!-- IssueCount in Series is redundant if Volumes are well tagged with IssueCount.
       If Manga does not make much use of volumes then this would be the primary issue count tag.
       This implies the existence of other volumes which contain the remainder of the issues.
   -->
  <VolumeCount>2</VolumeCount>
  <!-- VolumeCount explicitly tells us how many volumes the series contains. -->
  <Volume>
    <Number>1</Number>
    <!--
    Anglo superhero comics are organized around Volumes and ComicVine serves the issue count
    as part of Volume data.
    -->
    <IssueCount>11</IssueCount>
  </Volume>
</Series>

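A sketch of reading this sub-element layout with the standard library's `xml.etree.ElementTree`, including a simple sanity check that the series-level IssueCount is not smaller than the tagged volume's own count. `summarize` and `opt_int` are hypothetical helpers, and the element names follow the example above rather than any finalized schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional

EXAMPLE = """<Series>
  <Name>Trilobyte Wars</Name>
  <IssueCount>22</IssueCount>
  <VolumeCount>2</VolumeCount>
  <Volume>
    <Number>1</Number>
    <IssueCount>11</IssueCount>
  </Volume>
</Series>"""


def opt_int(elem: Optional[ET.Element], tag: str) -> Optional[int]:
    """Integer text of a direct child of `elem`, or None if missing/empty."""
    if elem is None:
        return None
    text = elem.findtext(tag)
    return int(text) if text and text.strip() else None


def summarize(xml_text: str) -> dict:
    series = ET.fromstring(xml_text)
    volume = series.find("Volume")  # direct child only
    summary = {
        "name": series.findtext("Name"),
        "series_issue_count": opt_int(series, "IssueCount"),
        "volume_count": opt_int(series, "VolumeCount"),
        "volume_number": opt_int(volume, "Number"),
        "volume_issue_count": opt_int(volume, "IssueCount"),
    }
    # A series total smaller than one volume's total would be contradictory.
    s, v = summary["series_issue_count"], summary["volume_issue_count"]
    summary["consistent"] = s is None or v is None or s >= v
    return summary
```

Because `findtext` only searches direct children here, the series-level IssueCount (22) and the volume-level IssueCount (11) never shadow each other.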
majora2007 commented 4 weeks ago

I think keeping it as a sub-element and having both volume count and issue count as optional would be the ideal way to handle it. Using an attribute isn't consistent, since the Series element already uses sub-elements.

bpepple commented 2 weeks ago

So, I'm not sure where we are on this. @majora2007, do you agree with @ajslater's second example in this comment? I don't have a strong opinion either way, as long as we provide decent documentation.

majora2007 commented 2 weeks ago

Yes, keeping as a sub element, so:

<Series>
  <Name>Trilobyte Wars</Name>
  <IssueCount>22</IssueCount>
  <VolumeCount>2</VolumeCount>