comictagger / gcd_talker

A Grand Comics Database talker for Comictagger
Apache License 2.0

[Feature request] TPB numbers unmatchable in gcd db #11

Closed Cducharme84 closed 5 months ago

Cducharme84 commented 8 months ago

The GCD db ships with “[nn]” as the issue number, which causes lots of potentially matchable archives to fail when CT gets the search results from the GCD talker.

I used a Python script to change the number value in gcd_issue for every row that matched “[nn]”, after which CT could match many more TPB and one-shot entries with -1 in the CLI. Asking most users to do the same by hand would likely create a heavy support burden, so I think a better approach would be to have the talker pass all [nn] issue results as “1” instead.
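For anyone who wants to try the manual route described above, here is a minimal sketch of such a script. It is not the script used in this report. The gcd_issue table and its number column come from this thread; the function name renumber_nn and the tiny in-memory table are stand-ins for illustration, and a real run would point sqlite3.connect at a copy of the downloaded GCD file.

```python
import sqlite3


def renumber_nn(conn: sqlite3.Connection) -> int:
    """Set gcd_issue.number to '1' wherever it is exactly '[nn]'.

    Returns the number of rows changed. The helper name is hypothetical.
    """
    cur = conn.execute("UPDATE gcd_issue SET number = '1' WHERE number = '[nn]'")
    conn.commit()
    return cur.rowcount


# Stand-in for the real GCD dump: only the two columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gcd_issue (id INTEGER PRIMARY KEY, number TEXT)")
conn.executemany(
    "INSERT INTO gcd_issue (number) VALUES (?)",
    [("[nn]",), ("2",), ("[nn]",)],
)

changed = renumber_nn(conn)
numbers = [row[0] for row in conn.execute("SELECT number FROM gcd_issue ORDER BY id")]
print(changed, numbers)
```

Note that this rewrites the local data in place, so the original [nn] values are lost unless a backup of the db is kept.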

mizaki commented 8 months ago

It appears that [nn] is correct according to https://docs.comics.org/wiki/Issue_Numbers, but CT has no file name parser that will convert #[nn] to the issue number (which is a string despite its name, as you probably know). If you set [nn] as the issue number, it is found as expected.

How are you naming the files, #[nn]?

@lordwelch Do you think this should be a CT file name parser issue? Should the user expect #[nn] to be parsed as the issue number?

lordwelch commented 8 months ago

I think the easiest option is to just update the db when it's loaded right before the current index creation. Something like:

UPDATE gcd_issue SET number = REPLACE(number, '[nn]', '1') WHERE number = '[nn]';

mizaki commented 8 months ago

The only problem with that is what if someone has entered [nn] as the issue number...

Maybe add an option, Treat "[nn]" as issue number 1. Then the fetch_issues_by_series_issue_num_and_year query can either use gcd_issue.number=? as it does now, or (gcd_issue.number=1 OR gcd_issue.number='[nn]') when the issue number is 1 and the option is set.
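A rough sketch of how that conditional WHERE clause could be built, assuming a simplified table. Only gcd_issue.number and the idea of the option come from this thread; the helper names, the series_id column, and the treat_nn_as_one flag are hypothetical and are not the talker's actual code.

```python
from __future__ import annotations

import sqlite3

NN = "[nn]"


def issue_number_clause(issue_number: str, treat_nn_as_one: bool) -> tuple[str, list[str]]:
    """Return a WHERE fragment and its parameters for the issue number.

    With the option set and issue "1" requested, rows numbered "[nn]" match too.
    """
    if treat_nn_as_one and issue_number == "1":
        return "(gcd_issue.number = ? OR gcd_issue.number = ?)", ["1", NN]
    return "gcd_issue.number = ?", [issue_number]


def find_numbers(conn: sqlite3.Connection, series_id: int, issue_number: str,
                 treat_nn_as_one: bool) -> list[str]:
    clause, params = issue_number_clause(issue_number, treat_nn_as_one)
    # Only the fixed clause text is interpolated; all values stay parameterised.
    sql = f"SELECT number FROM gcd_issue WHERE series_id = ? AND {clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql, [series_id, *params])]


# Simplified stand-in schema (series_id is an assumed column name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gcd_issue (id INTEGER PRIMARY KEY, series_id INTEGER, number TEXT)")
conn.executemany(
    "INSERT INTO gcd_issue (series_id, number) VALUES (?, ?)",
    [(1, "[nn]"), (1, "1"), (1, "2")],
)

with_option = find_numbers(conn, 1, "1", True)
without_option = find_numbers(conn, 1, "1", False)
print(with_option, without_option)
```

This leaves the stored data untouched, so a user who really does search for [nn] still gets the existing behaviour through the second branch.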

lordwelch commented 8 months ago

I don't think that's a reasonable assumption. You can add an option and preserve the [nn] in the db but then the SQL can become more complicated

Cducharme84 commented 8 months ago

For my own use case I absolutely never have issue #[nn] intentionally; to be honest, I hadn’t realized they had come to that odd consensus. When I’m tagging with GCD, it’s a pre-Comic Vine pass to fill in genres etc. for backlog titles; I then use the default CT mode with CV to get actual summaries and better crediting. So I had been manually changing GCD’s resulting issue number for one-shots and TPBs to 1 so they have a prayer of matching in step 2.

An option to treat [nn] as 1, or a built-in db updater that sets it to 1, has very little chance of adversely affecting anyone’s flow, I think: none of the management software (Mylar, comixed, etc.) uses the [nn] schema in any way, nor does Kavita’s current stable correctly parse letters in issues (this has changed with their refactor and hit their nightly recently). So I suspect either solution would be a QOL enhancement for the GCD talker.

As for file names, this part of my own process usually involves unaltered downloads, primarily from GC and Usenet, which often don’t include an issue number in the cases where the talker* expects me to go with [nn].

*And rightly so; it’s what is present in the data.

mizaki commented 6 months ago

How is #15 for your issue?

Cducharme84 commented 6 months ago

Perfect, I like the idea of keeping the local GCD data intact and handling this in the transform. And tying it to the assumption that the issue is 1 should work great with -1 on the command line, I think, for OS and TPB entries. Thanks!

mizaki commented 6 months ago

Trickier question (maybe): Should the issue number be [nn] as per the GCD data or would the expected result be 1?

Cducharme84 commented 6 months ago

My expectation would be to use the resulting tagged data to move on and autotag with the next source (provided the series name is similar in the two DBs, which is truly not a CT issue), so personally, having the end result be "1" in the created/updated .xml is optimal.

GCD is the only (so far) supported talker whose database doesn't consider a TPB/OS/etc as an issue #1, so I do think it's reasonable to assume most users are going to want to take advantage of your work on the overlay side and have it not be a laboriously manual process due to GCD going their own way.

Cducharme84 commented 6 months ago

Funny enough, I just realized that just because my intention is further tagging doesn't mean it'd be everyone's intention, especially those who DO want their library to match the GCD schema. Perhaps a flag for the CLI (maybe something like "--keep-gcd-numbering") and a check box in the metadata settings for the GUI?

They have other odd schemes too, if we're possibly transforming their bracket madness: they often place brackets around issues that have no number when the series starts numbering later on. They'll go

But in regards to the issues of no number or [nn] it probably would be best to have the decision in user control.

Let's be honest: with having to grab a SQLite db and do more manual setup than for Metron or CV, I really do expect GCD users to be more savvy about decision making than the average person picking up CT for the first time.

mizaki commented 6 months ago

I think then that an option to convert [nn] to 1 makes sense but all the others are too random (and probably of low quantity).

mizaki commented 6 months ago

I've added the option to replace [nn] with 1.

The only other possible option I see is removing [ and ] but I don't want to get into more data "cleaning" than is minimal because we all know the state of some of the GCD data :)
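To illustrate the behaviour being described (not the code merged in #15), here is a minimal sketch of a result-side transform. The function name and the replace_nn parameter are hypothetical; the rule itself, replacing only an exact [nn] and leaving every other bracketed value alone, is taken from the comments above.

```python
def tagged_issue_number(raw: str, replace_nn: bool) -> str:
    """Return the issue number to write into the tags.

    Only an exact "[nn]" is converted, and only when the option is on.
    Other bracketed values such as "[1]" pass through unchanged.
    """
    if replace_nn and raw == "[nn]":
        return "1"
    return raw


examples = ["[nn]", "[1]", "12"]
converted = [tagged_issue_number(n, replace_nn=True) for n in examples]
untouched = [tagged_issue_number(n, replace_nn=False) for n in examples]
print(converted, untouched)
```

Keeping the rule this narrow means the local GCD data stays intact and the decision remains with the user through a single option.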

Cducharme84 commented 6 months ago

That's perfect. I agree that, given the inconsistency of all the other bracket designations, it really should be up to the library owner to decide how to handle them. Those issues likely have weird numbering in other data sources too, so the automation flow most of us envision (until we encounter actual comic databases) is likely derailed by more than just GCD.

Thanks so much. Now I'll just need to kick their DB into WAL after each download, because I sometimes tag WAAAAYY too much simultaneously since it's local data. Not sure if I should be proud that I've locked it with reads often enough to need that, but I kinda am!

mizaki commented 5 months ago

Addressed in #15