Scanner Combining Openings / Endings with Letter Suffixes (ED2a, OP2a, etc.) When Using a Forced AniDB ID (anidb-xxxx)

jaller200 commented 2 years ago

Platform

Operating system and version: Unraid 6.9.2 / Docker 20.10.5
Plex version: 4.69.1

Expected Behavior

When appending [anidb-xxxx] to the end of a series' folder name, openings and endings should be mapped correctly: openings to Episodes 101, 102, ... and endings to Episodes 151, 152, ...
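For reference, the expected numbering for plain (non-lettered) tags can be sketched as below. This is a hypothetical helper, not scanner code; the base offsets of 100 for openings and 150 for endings are inferred from the expected episode numbers above, and letter variants such as OP2a are deliberately left out.

```python
import re

# Base offsets inferred from the expected behaviour:
# openings start at Episode 101, endings at Episode 151.
SPECIAL_BASE = {"OP": 100, "ED": 150}

def expected_episode(tag):
    """Map a plain tag such as 'OP1' or 'ED2' to its expected special number.
    Letter variants (OP2a, OP2b) are intentionally not handled here."""
    m = re.fullmatch(r"(OP|ED)(\d+)", tag)
    if m is None:
        raise ValueError("unsupported tag: %r" % tag)
    return SPECIAL_BASE[m.group(1)] + int(m.group(2))

print(expected_episode("OP1"), expected_episode("OP2"),
      expected_episode("ED1"), expected_episode("ED2"))
# 101 102 151 152
```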

Current Behavior

I'm currently using the file naming scheme {Series Name} - {AniDB Episode Number (e.g. 01, 02, ED1...)} - {Episode Title} [Additional Metadata].

For this report, I have three opening files for the anime Nagi no Asukara (AniDB ID: 9387):

Nagi no Asukara [anidb-9387]/Nagi no Asukara - OP1 - Lull - Soshite Bokura wa (1-13) [Bluray 1080p]
Nagi no Asukara [anidb-9387]/Nagi no Asukara - OP2a - Ebb and Flow (14-24) [Bluray 1080p]
Nagi no Asukara [anidb-9387]/Nagi no Asukara - OP2b - Ebb and Flow (14-24) [Bluray 1080p]

If I scan this folder with the forced ID, the first two files (OP1 and OP2a) are both detected as a single Episode 102:

Logs: Nagi no Asukara [anidb-9387].filelist.log, Nagi no Asukara [anidb-9387].scanner.log


However, if I remove the forced ID and use the year instead, or even just the name, the files are mapped correctly:

Logs: Nagi no Asukara (2013).filelist.log, Nagi no Asukara (2013).scanner.log


Naming them C1, C2, etc. instead just lists them as unmapped (Episode 501+).

Steps to Reproduce

Create a folder (in this case, Nagi no Asukara [anidb-9387])
Put the first opening and the first variant of the second opening (OP2a) in that folder (e.g. Nagi no Asukara - OP1 - Lull - Soshite Bokura wa (1-13) [Bluray 1080p].mkv). It doesn't matter whether anything follows OP1 or ED2a in the file name, or whether these are the only two files.
Scan / re-scan the library.

Additional information

What's most interesting is that the issue only seems to appear when a letter suffix is used on a later number (e.g. OP1 followed by OP2a/OP2b). Another anime (Konohana Kitan, AniDB ID: 13030) has Openings 1a and 1b, and both are mapped correctly, even with a forced AniDB ID.

I'm unsure of the root cause, but I suspect it has to do with either the letter suffixes themselves or something odd in how the numbers are handled.

ZeroQI commented 2 years ago

OP1 should be s00e101, not s00e102, and the bug only happens when using a forced ID? Weird, there must be something wrong in the forced-ID code path somewhere.

jaller200 commented 2 years ago

I've finally had some time to sit down and examine the codebase in more detail, and from a brief read of the script, I believe the issue is in these lines:

### OP/ED with letter version Example: op2a
if not ep.isdigit() and len(ep)>1 and ep[:-1].isdigit():  ep, offset = int(ep[:-1]), ord(ep[-1:])-ord('a')
else:                                                     offset = 0
if anidb_xml is None:
  if ANIDB_RX.index(rx) in AniDB_op:  AniDB_op[ANIDB_RX.index(rx)]   [ep] = offset # {101: 0 for op1a / 152: for ed2b} and the distance between a and the version we have here
  # ep, offset = str( int( ep[:-1] ) ), offset + sum( AniDB_op.values() )  # "if xxx isdigit() else 1" implied since OP1a for example... # get the offset (100, 150, 200, 300, 400) + the sum of all the mini offset caused by letter version (1b, 2b, 3c = 4 mini offset)
  else:                               AniDB_op[ANIDB_RX.index(rx)] = {ep:   offset}
cumulative_offset = sum( [ AniDB_op[ANIDB_RX.index(rx)][x] for x in Dict(AniDB_op, ANIDB_RX.index(rx), default={0:0}) if x<ep and ANIDB_RX.index(rx) in AniDB_op and x in AniDB_op[ANIDB_RX.index(rx)] ] )
ep = ANIDB_OFFSET[ANIDB_RX.index(rx)] + int(ep) + offset + cumulative_offset    # Sum of all prior offsets

After adding my own log statements and uncommenting the existing one, it prints this:

Doing a bit more digging. I haven't worked on a Plex scanner much before, so regenerating the logs takes a few moments each time, but it's made me curious...

jaller200 commented 2 years ago

Found it!


In the lines quoted above, ep comes in as type str. Inside the list comprehension passed to sum(...), the script compares x<ep. If no letter offsets have been recorded at all (i.e. AniDB_op = {}), that filter never adds anything to the sum.

However, once an offset has been recorded, things go wrong. For a plain tag such as OP1, ep is still a string while x is an integer, so the script ends up comparing an int with a str.
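To make the type mismatch concrete, here is a sketch that wraps the two quoted parsing lines in a hypothetical function (the wrapper name is illustrative; the logic is unchanged). Only a plain number keeps its str type, while every letter-suffixed tag becomes an int. That would also be consistent with Konohana Kitan (OP1a/OP1b only) being unaffected: with no plain-numbered opening, every x<ep comparison is int against int.

```python
def split_tag_original(ep):
    """Hypothetical wrapper around the two quoted scanner lines (logic unchanged)."""
    if not ep.isdigit() and len(ep) > 1 and ep[:-1].isdigit():
        ep, offset = int(ep[:-1]), ord(ep[-1:]) - ord('a')  # '2b' -> (2, 1)
    else:
        offset = 0                                           # '1' stays the str '1'
    return ep, offset

for tag in ("1", "2a", "2b", "1a", "1b"):
    ep, offset = split_tag_original(tag)
    print(tag, repr(ep), type(ep).__name__, offset)
# 1 '1' str 0
# 2a 2 int 0
# 2b 2 int 1
# 1a 1 int 0
# 1b 1 int 1
```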

In my case, the following variables were set:

ep = '1'
AniDB_op = {1: {2: 1}, 2: {2: 1}}
ANIDB_RX.index(rx) = 2

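Thus the script evaluated 2 < '1', which returns True in Python 2 (from what I tested, the same comparison raises a TypeError in Python 3). As a result, OP1 picks up the +1 offset recorded for OP2b and is mapped to Episode 102 instead of 101, the same number as OP2a.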
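The following sketch reproduces that arithmetic with the values from the log. Because Python 3 refuses the mixed comparison, it uses a hypothetical helper, py2_lt, to emulate CPython 2's rule that a number always compares less than a string, regardless of value. The base offset of 100 for openings is an assumption inferred from the expected numbering (OP1 = 101), and rx_index stands in for ANIDB_RX.index(rx).

```python
# Values reported for the OP1 file.
ep = '1'                           # plain tag, so still a str
offset = 0
AniDB_op = {1: {2: 1}, 2: {2: 1}}
rx_index = 2                       # stands in for ANIDB_RX.index(rx)
ANIDB_OFFSET_OP = 100              # assumption: base offset for openings

# Python 3 rejects the mixed int/str comparison outright.
try:
    _ = 2 < ep
    raised = False
except TypeError as err:
    raised = True
    print("Python 3:", err)

def py2_lt(a, b):
    """Hypothetical helper emulating CPython 2's ordering of int vs str:
    any number compares less than any string, whatever their values."""
    if isinstance(a, int) and isinstance(b, str):
        return True
    if isinstance(a, str) and isinstance(b, int):
        return False
    return a < b

# Same filter as the scanner's list comprehension, with Python 2 semantics.
cumulative_offset = sum(AniDB_op[rx_index][x] for x in AniDB_op[rx_index] if py2_lt(x, ep))
episode = ANIDB_OFFSET_OP + int(ep) + offset + cumulative_offset
print("Python 2 result:", episode)  # 102 instead of 101
```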

I believe these lines should change from this:

if not ep.isdigit() and len(ep)>1 and ep[:-1].isdigit():  ep, offset = int(ep[:-1]), ord(ep[-1:])-ord('a')
else:                                                     offset = 0

to this:

if not ep.isdigit() and len(ep)>1 and ep[:-1].isdigit():  ep, offset = int(ep[:-1]), ord(ep[-1:])-ord('a')
else:                                                     ep, offset = int(ep), 0

This converts ep to an integer in both branches, so x<ep always compares two integers and the episodes are mapped correctly. I'll open a pull request as well.
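A simplified sketch of the offset arithmetic with the proposed fix applied. map_opening is a hypothetical helper, not scanner code: it skips the Dict(...) lookup and takes the letter offsets recorded for the openings (AniDB_op[2] = {2: 1} in the log) as a plain dict. The base offset of 100 is again an assumption inferred from the expected numbering, and OP3 is a hypothetical extra file included only to show the cumulative offset at work.

```python
ANIDB_OFFSET_OP = 100  # assumption: base offset for openings

def split_tag_fixed(ep):
    """The two scanner lines with the proposed fix: ep is always an int."""
    if not ep.isdigit() and len(ep) > 1 and ep[:-1].isdigit():
        return int(ep[:-1]), ord(ep[-1:]) - ord('a')
    return int(ep), 0

def map_opening(tag, letter_offsets):
    """Hypothetical helper applying the scanner's offset arithmetic to one tag.
    letter_offsets maps an opening number to the letter offset recorded for it."""
    ep, offset = split_tag_fixed(tag)
    # Sum of the offsets of all earlier numbers; now always int < int.
    cumulative_offset = sum(v for x, v in letter_offsets.items() if x < ep)
    return ANIDB_OFFSET_OP + ep + offset + cumulative_offset

letter_offsets = {2: 1}  # AniDB_op[2] from the log
for tag in ("1", "2a", "2b", "3"):  # "3" is a hypothetical later opening
    print("OP" + tag, "->", map_opening(tag, letter_offsets))
# OP1 -> 101
# OP2a -> 102
# OP2b -> 103
# OP3 -> 104
```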

ZeroQI / Absolute-Series-Scanner