ajslater / comicfn2dict

Parse common comic filenames and return a dict of metadata attributes
GNU General Public License v3.0

Incorrect Series Value When Remainder Info Before Issue Number #8

Closed: bpepple closed this issue 7 months ago

bpepple commented 7 months ago

Had a user report a bug with a comic named Ex Machina 050 (2 Covers) (2010) (Digital) (Zone-Empire).cbz. Parsing it with comicfn2dict results in the following:

from comicfn2dict import comicfn2dict
fn = "Ex Machina 050 (2 Covers) (2010) (Digital) (Zone-Empire).cbz"
comicfn2dict(fn)
{'ext': 'cbz', 'year': '2010', 'original_format': 'Digital', 'scan_info': 'Zone-Empire', 'series': 'Ex Machina 050 (2 Covers'}

Cloned your repo and wrote a quick test, and it looks like the problem is that the filename has some remainder info, (2 Covers), right next to the issue number:

---Init-------------------------------------------------------------------------
  Ex Machina 050 (2 Covers) (2010) (Digital) (Zone-Empire).cbz
  {}
---After Clean Path-------------------------------------------------------------
  Ex Machina 050 (2 Covers) (2010) (Digital) (Zone-Empire)
  {'ext': ('cbz', 57)}
---After Issue------------------------------------------------------------------
  Ex Machina 050 (2 Covers) (2010) (Digital) (Zone-Empire)
  {'ext': ('cbz', 57)}
---After Volume-----------------------------------------------------------------
  Ex Machina 050 (2 Covers) (2010) (Digital) (Zone-Empire)
  {'ext': ('cbz', 57)}
---After Date-------------------------------------------------------------------
  Ex Machina 050 (2 Covers)/(Digital) (Zone-Empire)
  {'ext': ('cbz', 57), 'year': ('2010', 27)}
---After original_format & scan_info--------------------------------------------
  Ex Machina 050 (2 Covers)
  {'ext': ('cbz', 57),
 'original_format': ('Digital', 34),
 'scan_info': ('Zone-Empire', 44),
 'year': ('2010', 27)}
---After original_format & scan_info--------------------------------------------
  Ex Machina 050 (2 Covers)
  {'ext': ('cbz', 57),
 'original_format': ('Digital', 34),
 'scan_info': ('Zone-Empire', 44),
 'year': ('2010', 27)}
---After Issue on ends of tokens------------------------------------------------
  Ex Machina 050 (2 Covers)
  {'ext': ('cbz', 57),
 'original_format': ('Digital', 34),
 'scan_info': ('Zone-Empire', 44),
 'year': ('2010', 27)}
---After publisher--------------------------------------------------------------
  Ex Machina 050 (2 Covers)
  {'ext': ('cbz', 57),
 'original_format': ('Digital', 34),
 'scan_info': ('Zone-Empire', 44),
 'year': ('2010', 27)}
---After Series & Title---------------------------------------------------------

  {'ext': ('cbz', 57),
 'original_format': ('Digital', 34),
 'scan_info': ('Zone-Empire', 44),
 'series': ('Ex Machina 050 (2 Covers', 0),
 'year': ('2010', 27)}
---After issue can be volume----------------------------------------------------

  {'ext': ('cbz', 57),
 'original_format': ('Digital', 34),
 'scan_info': ('Zone-Empire', 44),
 'series': ('Ex Machina 050 (2 Covers', 0),
 'year': ('2010', 27)}
Ex Machina 050 (2 Covers) (2010) (Digital) (Zone-Empire).cbz
{'ext': 'cbz',
 'issue': '050',
 'original_format': 'Digital',
 'remainders': ('(2 Covers)',),
 'scan_info': 'Zone-Empire',
 'series': 'Ex Machina',
 'year': '2010'}
{'ext': 'cbz',
 'original_format': 'Digital',
 'scan_info': 'Zone-Empire',
 'series': 'Ex Machina 050 (2 Covers',
 'year': '2010'}
{'dictionary_item_removed': [root['issue'], root['remainders']],
 'values_changed': {"root['series']": {'new_value': 'Ex Machina 050 (2 Covers',
                                       'old_value': 'Ex Machina'}}}
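The last three blocks of output above are the expected dict, the dict actually returned, and a diff between the two. The diff resembles what a diffing library such as DeepDiff prints, but the same summary can be reproduced with plain Python. This is a minimal sketch, not the test suite's own code: `diff_dicts` is a hypothetical helper, it only handles flat dicts, and its key names are simplified rather than copied from any library.

```python
from pprint import pprint

# Expected result, copied from the test output above.
EXPECTED = {
    "ext": "cbz",
    "issue": "050",
    "original_format": "Digital",
    "remainders": ("(2 Covers)",),
    "scan_info": "Zone-Empire",
    "series": "Ex Machina",
    "year": "2010",
}

# Result actually returned for the failing filename.
ACTUAL = {
    "ext": "cbz",
    "original_format": "Digital",
    "scan_info": "Zone-Empire",
    "series": "Ex Machina 050 (2 Covers",
    "year": "2010",
}


def diff_dicts(old: dict, new: dict) -> dict:
    """Hypothetical helper: summarize how flat dict `new` differs from `old`."""
    removed = sorted(key for key in old if key not in new)
    added = sorted(key for key in new if key not in old)
    changed = {
        key: {"old_value": old[key], "new_value": new[key]}
        for key in sorted(old.keys() & new.keys())
        if old[key] != new[key]
    }
    result = {}
    # Only report categories that actually contain something.
    if removed:
        result["removed"] = removed
    if added:
        result["added"] = added
    if changed:
        result["changed"] = changed
    return result


pprint(diff_dicts(EXPECTED, ACTUAL))
```

Run against the two dicts above, the helper reports `issue` and `remainders` as removed and `series` as changed, which matches the diff in the test output.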

If I get some free time this week, I'll try to write a pull request to fix this unless you get to it first.
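As a starting point for thinking about a fix, the failing case can be split with a couple of small regular expressions. This is only a sketch under stated assumptions and not how comicfn2dict is implemented: `split_filename`, `PAREN_TOKEN_RE`, and `SERIES_ISSUE_RE` are hypothetical names, and the pattern assumes the issue number is the last bare run of digits before the first parenthesized token.

```python
import re

# Parenthesized tokens such as "(2 Covers)" or "(2010)"; captures the inside.
PAREN_TOKEN_RE = re.compile(r"\(([^()]*)\)")
# Series text followed by a bare issue number at the end of the head.
SERIES_ISSUE_RE = re.compile(r"^(?P<series>.+?)\s+#?(?P<issue>\d+)\s*$")


def split_filename(filename: str) -> dict:
    """Hypothetical helper: split a comic filename into rough parts."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension present: treat the whole name as the stem.
        stem, ext = filename, ""

    # Everything before the first "(" is where series and issue live.
    head = stem.split("(", 1)[0]
    result = {"ext": ext, "paren_tokens": PAREN_TOKEN_RE.findall(stem)}

    match = SERIES_ISSUE_RE.match(head)
    if match:
        result["series"] = match["series"]
        result["issue"] = match["issue"]
    else:
        # No trailing number found: keep the whole head as the series.
        result["series"] = head.strip()
    return result


fn = "Ex Machina 050 (2 Covers) (2010) (Digital) (Zone-Empire).cbz"
print(split_filename(fn))
```

For the filename from this report the sketch separates `Ex Machina` and `050` and leaves `2 Covers` among the parenthesized tokens, to be classified later as year, format, scan info, or remainder. It deliberately ignores volumes, dates outside parentheses, and series names that end in a number.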

ajslater commented 7 months ago

v0.2.2 fixes this issue. Thanks!

(Also took the time to parse titles that exactly match a legal printing format as formats instead of titles, which handles the Wonder Woman test better.)