Additional Test Cases

I've included some of the test cases from ComicTagger (CT) here that this project doesn't handle the same way. These are a bit opinionated, so not all of them necessarily need to be "fixed". I put in comments explaining how or why CT handles most of them; each comment applies to the entry that follows it. I also left out the scan_info/remainders, as CT and this project have different cleanup strategies for them.
Also note that CT (with the complicated parser) parses all of these the same way whether the # is there or not. The one exception is the first case: without the #, CT doesn't find the issue number.
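To check the "with or without #" behaviour against another parser, the #-less variants can be generated from the same table. This is only a minimal sketch: `hash_variants` and its `skip` parameter are hypothetical helpers made up for illustration, not part of CT or this project.

```python
def hash_variants(cases, skip=()):
    """Build '#'-less copies of test cases, keeping the expected results.

    cases: mapping of filename -> expected parse result.
    skip:  filenames whose result is known to change without the '#'.
    """
    variants = {}
    for filename, expected in cases.items():
        # Entries without a '#' have no variant; skipped entries are excluded.
        if "#" not in filename or filename in skip:
            continue
        variants[filename.replace("#", "")] = expected
    return variants


# Small sample taken from the cases in this issue.
sample = {
    "batman #B01 title.cbz": {"ext": "cbz", "issue": "B01"},
    "52 action comics #2024.cbz": {"ext": "cbz", "issue": "2024"},
    "action comics 1024.cbz": {"ext": "cbz", "issue": "1024"},
}
# The first case is skipped because CT loses the issue number without the '#'.
variants = hash_variants(sample, skip={"batman #B01 title.cbz"})
```

The resulting dict can be fed to the same test as the original table, since the expected values are reused unchanged.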
FNS.update(
    {  # Issue number starting with a letter, requested in https://github.com/comictagger/comictagger/issues/543
        "batman #B01 title.cbz": {
            "ext": "cbz",
            "issue": "B01",
            "series": "batman",
            "title": "title",
        },  # Leading issue number is usually an alternate sequence number
        "52 action comics #2024.cbz": {
            "ext": "cbz",
            "issue": "2024",
            "series": "action comics",
            "alternate": "52",
        },  # 4 digit issue number
        "action comics 1024.cbz": {
            "ext": "cbz",
            "issue": "1024",
            "series": "action comics",
        },  # Only the issue number. CT ensures that the series always has a value if possible
        "#52.cbz": {
            "ext": "cbz",
            "issue": "52",
            "series": "52",
        },  # CT treats double-underscore the same as double-dash
        "Monster_Island_v1_#2__repaired__c2c.cbz": {
            "ext": "cbz",
            "issue": "2",
            "series": "Monster Island",
            "volume": "1",
        },  # I'm not sure there's a right way to parse this. This might also be a made-up filename; I don't remember
        "Super Strange Yarns (1957) #92 (1969).cbz": {
            "ext": "cbz",
            "issue": "92",
            "series": "Super Strange Yarns",
            "volume": "1957",
            "year": "1969",
        },  # Extra - in the series
        " X-Men-V1-#067.cbr": {
            "ext": "cbr",
            "issue": "067",
            "series": "X-Men",
            "volume": "1",
        },  # CT only separates this into a title if the '-' is attached to the previous word, e.g. 'aquaman- Green Arrow'. @bpepple already opened a ticket for this: https://github.com/ajslater/comicfn2dict/issues/1
        "Aquaman - Green Arrow - Deep Target #01 (of 07) (2021).cbr": {
            "ext": "cbr",
            "issue": "01",
            "series": "Aquaman - Green Arrow - Deep Target",
            "year": "2021",
            "issue_count": "7",
        },
        "Batman_-_Superman_#020_(2021).cbr": {
            "ext": "cbr",
            "issue": "020",
            "series": "Batman - Superman",
            "year": "2021",
        },
        "Free Comic Book Day - Avengers.Hulk (2021).cbz": {
            "ext": "cbz",
            "series": "Free Comic Book Day - Avengers Hulk",
            "year": "2021",
        },  # CT assumes the volume is also the issue number if it can't find an issue number
        "Avengers By Brian Michael Bendis volume 03 (2013).cbz": {
            "ext": "cbz",
            "issue": "3",
            "series": "Avengers By Brian Michael Bendis",
            "volume": "03",
            "year": "2013",
        },  # Publishers like to re-print some of their annuals using this format for the year
        "Batman '89 (2021) .cbr": {
            "ext": "cbr",
            "series": "Batman '89",
            "year": "2021",
        },  # CT has extra processing to re-attach the year in this case
        "Blade Runner Free Comic Book Day 2021 (2021).cbr": {
            "ext": "cbr",
            "series": "Blade Runner Free Comic Book Day 2021",
            "year": "2021",
        },  # CT treats 'Book' like 'v' but also adds it as the title (matches ComicVine for this particular series)
        "Bloodshot Book 03 (2020).cbr": {
            "ext": "cbr",
            "issue": "03",
            "series": "Bloodshot",
            "title": "Book 03",
            "volume": "03",
            "year": "2020",
        },  # CT checks for the following '(of 06)' after the '03' and marks it as the volume
        "Elephantmen 2259 #008 - Simple Truth 03 (of 06) (2021).cbr": {
            "ext": "cbr",
            "issue": "008",
            "series": "Elephantmen 2259",
            "title": "Simple Truth",
            "volume": "03",
            "year": "2021",
            "volume_count": "06",
        },  # CT catches the year
        "Marvel Previews #002 (January 2022).cbr": {
            "ext": "cbr",
            "issue": "002",
            "series": "Marvel Previews",
            "year": "2022",
        },  # c2c aka "cover to cover" is fairly common and CT moves it to scan_info/remainder
        "Marvel Two In One V1 #090  c2c.cbr": {
            "ext": "cbr",
            "issue": "090",
            "series": "Marvel Two In One",
            "publisher": "Marvel",
            "volume": "1",
        },  # This made the parser in CT much more complicated. It's understandable that this isn't parsed in the first few iterations of this project
        "Star Wars - War of the Bounty Hunters - IG-88 (2021).cbz": {
            "ext": "cbz",
            "series": "Star Wars - War of the Bounty Hunters - IG-88",
            "year": "2021",
        },  # The addition of the '#1' turns this into the same as 'Aquaman - Green Arrow - Deep Target' above
        "Star Wars - War of the Bounty Hunters - IG-88 #1 (2021).cbz": {
            "ext": "cbz",
            "issue": "1",
            "series": "Star Wars - War of the Bounty Hunters - IG-88",
            "year": "2021",
        },  # CT treats '[]' as equivalent to '()', catches 'DC' as a publisher and 'Sep-Oct 1951' as dates, and removes them. CT doesn't catch the 'digital' though, so that could be better, but I blame whoever made this atrocious filename
        "Wonder Woman #49 DC Sep-Oct 1951 digital [downsized, lightened, 4 missing story pages restored] (Shadowcat-Empire).cbz": {
            "ext": "cbz",
            "issue": "49",
            "series": "Wonder Woman",
            "title": "digital",
            "publisher": "DC",
            "year": "1951",
        },  # CT notices that this is a full date. It doesn't actually return the month or day though, it just removes them
        "X-Men, 2021-08-04 (#02).cbz": {
            "ext": "cbz",
            "issue": "02",
            "series": "X-Men",
            "year": "2021",
        },  # CT treats ':' the same as '-' but here the ':' is attached to 'Now' which CT sees as a title separation
        "Cory Doctorow's Futuristic Tales of the Here and Now: Anda's Game #001 (2007).cbz": {
            "ext": "cbz",
            "issue": "001",
            "series": "Cory Doctorow's Futuristic Tales of the Here and Now",
            "title": "Anda's Game",
            "year": "2007",
        },  # This is a contrived test case. I've never seen this; I just wanted to handle it with my parser
        "Cory Doctorow's Futuristic Tales of the Here and Now #0.0.1 (2007).cbz": {
            "ext": "cbz",
            "issue": "0.1",
            "series": "Cory Doctorow's Futuristic Tales of the Here and Now",
            "year": "2007",
            "issue_count": "",
        },
    }
)
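For comparing a parser against a table like this, something along these lines could report which keys differ per filename. It is a rough sketch, not how either project's test suite actually works: `find_mismatches` is a hypothetical helper, the `parse` argument stands for whatever filename-parsing function is under test, and the ignored key names (`remainders`, `scan_info`) are assumptions about what the parsers return for the cleanup leftovers.

```python
from typing import Callable


def find_mismatches(
    parse: Callable[[str], dict],
    cases: dict,
    ignore=("remainders", "scan_info"),  # assumed key names, see note above
) -> dict:
    """Map each failing filename to {key: (expected, actual)}."""
    failures = {}
    for filename, expected in cases.items():
        actual = parse(filename)
        # Compare every key that appears on either side, minus ignored ones.
        keys = (expected.keys() | actual.keys()) - set(ignore)
        diff = {
            key: (expected.get(key), actual.get(key))
            for key in sorted(keys)
            if expected.get(key) != actual.get(key)
        }
        if diff:
            failures[filename] = diff
    return failures


def naive_parse(filename: str) -> dict:
    """Stand-in parser for demonstration only: splits off the extension."""
    stem, _, ext = filename.rpartition(".")
    return {"ext": ext, "series": stem, "remainders": ()}


demo_cases = {
    "action comics 1024.cbz": {
        "ext": "cbz",
        "issue": "1024",
        "series": "action comics",
    },
}
report = find_mismatches(naive_parse, demo_cases)
```

Here `report` lists the `issue` and `series` keys for the one demo case, because the stand-in parser never extracts an issue number. Passing the real parsing function in place of `naive_parse` and `FNS` in place of `demo_cases` would give a per-file list of the differences discussed above.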
ajslater / comicfn2dict

Additional Test Cases #2