evilhero / mylar

An automated Comic Book downloader (cbr/cbz) for use with SABnzbd, NZBGet and torrents
GNU General Public License v3.0

Trying to deal with name differences between the pull-list and Comicvine naming #559

Closed IanHub closed 7 years ago

IanHub commented 11 years ago

I have quite a few Zenescope titles on my pull list and they won't automatically show as wanted.

Looking at the pull-list values the often used "Grimm Fairy Tales presents" is returned as "GFT" (or possibly "GFTP") from the "preview" web-site.

Messing with the alternative names I could get it to find them on my newznab but not to automatically work with the pull-list.

There is also an issue with Hellraiser titles (should be "Clive Barker's Hellraiser") and there may be other titles with similar issues, e.g. Buffy comics

My change finds the short bit of text at the start of a name and swaps in the long text while the pull-list is being created - this then matches the ComicVine name and Mylar does its magic and tags the issues as wanted. These are just sample lists for problems my pull-list had.

I'm finding it useful - not sure if it's of use to anyone else or if there is a better way of dealing with it?

EDIT: I've now changed this code so it uses a substitutes.txt (csv) file in the mylar directory for the conversion rather than hardcoding the values (quite enjoying this playing with Python stuff, very interesting language)

File example for "substitutes.txt"; format is "pull-list text|replacement text"

GFTP|GRIMM FAIRY TALES PRESENTS
GFT|GRIMM FAIRY TALES PRESENTS
HELLRAISER|CLIVE BARKER'S HELLRAISER
BTVS SEASON 9|BUFFY THE VAMPIRE SLAYER SEASON NINE

file reader code inserted near the top of "weeklypull.py"

    #Prepare the Substitute name switch for pulllist to comic vine conversion
    #(needs "import csv" alongside the other imports in weeklypull.py)
    substitutes = "substitutes.txt"

    #shortrep is the name to be replaced, longrep the replacement
    shortrep=[]
    longrep=[]
    #open the file data; the with block closes the file, so no f.close() is needed
    with open(substitutes) as f:
        reader = csv.reader(f, delimiter='|')
        for row in reader:
            logger.debug ("Substitutes file read : "+str(row))
            #skip blank or malformed lines that lack a replacement
            if len(row) < 2:
                continue
            shortrep.append(row[0])
            longrep.append(row[1])

this bit follows the existing comment of "# pullist has shortforms of a series' title sometimes and causes problems"

    #Step through the list - storing an index
    for repindex,repcheck in enumerate(shortrep):
        #compare against the length of this entry, not the length of the whole list
        if len(comicnm)>= len(repcheck):
            #if the leftmost chars match the short text then replace them with the long text
            if comicnm[:len(repcheck)]==repcheck:
                logger.info("Switch worked on "+comicnm + " replacing " + str(repcheck) + " with " + str(longrep[repindex]))
                #swap only the leading match; slicing avoids treating repcheck as a regex
                comicnm = longrep[repindex] + comicnm[len(repcheck):]

Note: The "len" check possibly isn't needed as Python seems robust enough to deal with it, but Python is new to me so I left it in there in case.

IanH

evilhero commented 11 years ago

That looks good - I'll have to add the changes to my local repo and run through some quick tests on it, but from glancing it looks like it would flow pretty good. I just need to make sure that the name is changed prior to writing to the db (it would make things easier and probably less problematic when looking for some results - but you may have already put it in the proper place).

The 'len' you could even change to 'comicnm.startswith(repcheck)' as it would just check the starting characters of the variable for comparison, but like you said, it would work regardless of the choice.
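The startswith() suggestion above can be sketched as a small standalone function. This is only an illustration written for Python 3, not the code that ended up in Mylar: `apply_substitutes` and the `pairs` list are hypothetical names, and the entries are taken from the sample file earlier in the thread.

```python
def apply_substitutes(name, pairs):
    """Replace a leading short form in name with its long form.

    pairs is a list of (short, long) tuples, checked in order.
    Hypothetical helper for illustration only.
    """
    for short, long_form in pairs:
        # startswith() is simply False when name is shorter than short,
        # so no separate length check is required.
        if name.startswith(short):
            # Swap only the leading match; the rest of the name is kept.
            name = long_form + name[len(short):]
    return name


pairs = [
    ("GFTP", "GRIMM FAIRY TALES PRESENTS"),
    ("GFT", "GRIMM FAIRY TALES PRESENTS"),
    ("HELLRAISER", "CLIVE BARKER'S HELLRAISER"),
]

print(apply_substitutes("GFT ROBYN HOOD WANTED", pairs))
print(apply_substitutes("HELLRAISER DARK WATCH", pairs))
```

Because only the leading characters are swapped, a short form that happens to appear later in a title is left alone.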

I like the idea of manually being able to add the abbreviations, cause it would be a pain to maintain them via the hard-coding. I think it can be done in the GUI and using sqlite, so there's no need to utilize a .csv file (which can sometimes cause problems depending on utf encoding, etc), but I most definitely can see how this works and the benefit in doing so.

Especially since I've been working on updating the weekly pull list to include upcoming issues for up to 3 months in advance (it's about 80% complete) :)

PS. Python is a really neat and fairly easy language to grasp - Mylar was my first attempt at using Python, which started from being able to find comics on a pull-list, and has grown into what it is currently :)

IanHub commented 11 years ago

Python does seem like you can do a lot in very few commands esp. when it comes to data handling...(was fun finding code to read the CSV into the separate lists) the slightly more difficult bit is trying to follow the program logic and working out the best place to make the changes :) In this case I inserted the change code at the point you already had some code to change "O/T" into "of the" as the comment looked like you had planned to deal with other possible changes at that point (gotta love comments when looking at someone else's code!)

It seemed easier to change the input from the pull list to match the already stored comic data than to try and mess with how Mylar searches the pull list.

In testing it worked okay; it was a simple one to test - I just used the "recreate pull-list" button then refreshed the browser and you can see if they have changed - if they now match then the pull-list changes to "wanted" from "skipped+add+series" etc. This week it worked on "Hellraiser Dark Watch" and "GFT Robyn Hood Wanted" just fine and they got downloaded. Only awkward ones for me this week are Batman and Robin 24 (pulled as Batman and Two Face 24) and Wonder Woman, which is wrongly listed as 2004 on Comic Vine currently.

By using the csv (well pipe delimited in case there are ","'s in the actual names) I figured it would be easier to keep it up to date and for folks to share mods as they will change over time as series start and end. I was originally thinking that ultimately the best way was for the file to reside on the internet somewhere and Mylar load it like the pull-list as the same changes would affect everyone.... but then it will need someone to maintain it...and somewhere to store it.... so a GUI with a db would work well within the program, maybe with an import facility for a standard change list?
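As a rough illustration of why the pipe delimiter helps, the Python 3 sketch below parses two lines with the standard csv module. The second line is a made-up entry (not a real title) that contains commas on both sides; with `delimiter="|"` the commas stay inside the fields.

```python
import csv
import io

# In-memory stand-in for the substitutes file; the second row is hypothetical.
sample = io.StringIO(
    "GFT|GRIMM FAIRY TALES PRESENTS\n"
    "SHORT, WITH COMMA|LONG NAME, WITH COMMA\n"
)

# Each row splits on the pipe only, so commas remain part of the text.
pairs = [
    (row[0], row[1])
    for row in csv.reader(sample, delimiter="|")
    if len(row) >= 2
]

for short, long_form in pairs:
    print(short, "->", long_form)
```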

Great program btw, functionally it seems to work very well though a few bugs left to squash. For minor issues is it worth opening a ticket here, on the forum, or direct to you/twitter? - things like apostrophes causing search fails (e.g. barker's vs barkers - it looks like you are stripping punctuation out of the comic name and the nzbnab etc feed, but not apostrophes).
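To illustrate the apostrophe point, here is a minimal Python 3 sketch of one way both sides of a comparison could be normalised so that "barker's" and "barkers" match. `normalize_title` is a hypothetical helper, not a function from Mylar, and Mylar's actual cleanup rules may well differ.

```python
import re


def normalize_title(title):
    """Lower-case a title and strip punctuation, apostrophes included.

    Hypothetical helper for illustration only.
    """
    lowered = title.lower()
    # Drop everything except letters, digits, underscores and whitespace.
    stripped = re.sub(r"[^\w\s]", "", lowered)
    # Collapse any runs of whitespace left behind.
    return " ".join(stripped.split())


a = normalize_title("Clive Barker's Hellraiser")
b = normalize_title("Clive Barkers Hellraiser")
print(a == b)
```

Applying the same function to the series name and to the feed entry means the two only need to agree after cleanup, not character for character.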

The extended forward pull-list sounds interesting.

IanH

evilhero commented 11 years ago

Sorry, I totally missed that last post for some reason..

I started off utilizing csv files very heavily, for the exceptions list that was to help distinguish series between the two different data sites Mylar was using originally, along with the pull-list. However it was only used as a go-between, so I had some place to reference but store the data in a db - more so for trying to make it less code throughout the process than anything else ;)

The Batman & Robin series has been a real pain the last few months since it started changing its name from that to Batman & . The best way to get around it though is to use the Alternate Search Names option in the Edit Settings tab of the Comic Details screen. Utilizing that, Mylar will do searches for the given names, update the pull-list accordingly (it won't change it from Batman & Two-Face to Batman & Robin, but it should mark it as Wanted), as well as the file-checker using the Search Names. The only problem I can see with the way you modify the pull-list is that it wouldn't use the new corrected naming for searches (sometimes they get posted like that too).

The import change list option is a viable alternative, as it wouldn't cause much overhead and it would follow the same logic as the importing of the custom_exceptions.csv file. As far as maintaining goes, I've been looking into a cloud-like setup for some files that involve websites, so the sites don't get hammered frequently and Mylar doesn't get blacklisted because of it. The future pull-list that I've been working on is such an example: right now, with just myself using it, it's fine, but if you throw in all the users of Mylar (I have no real idea how many there are TBH), the odds of hammering increase greatly. Hitting the cloud for the one file that it needs would save a lot of time/resources.

The best place is on github here - start an issue, or append to a relevant open one if it's already being looked at. I get notified almost right away on my phone, so I tend to monitor it quite regularly. That being said I'm tied into all three (github, forums, and twitter), so generally posting I'll see it - but github seems to allow for better discussion of the problem ;)

IanHub commented 11 years ago

No Problem :)

The import change seemed to work well this week on the pulls.

You are right that it does not help with the actual searches, but what it does do is get the issue automatically tagged as wanted; the standard search then mostly works, with the alternate search names used where required. For example, "Supurbia" comes from the pull list as "SUPURBIA" but is listed in ComicVine, and therefore Mylar, as "Grace Randolph's Supurbia", and on my comic Newznab it gets listed as "Superbia V2" ..... so using "SUPURBIA|GRACE RANDOLPH'S SUPURBIA" in my "rename script" and "V2 / Superbia" in the Mylar series/alternative name area makes it an automatic pull and download.

This is my current recode list which has covered the last two weeks okay

GFT GRIMM FAIRY TALES|GRIMM FAIRY TALES (Needed as the GFT is redundant in this title)
GFT|GRIMM FAIRY TALES PRESENTS (Works on all the rest of the Zenescope titles... so far..)
HELLRAISER|CLIVE BARKER'S HELLRAISER
CLIVE BARKER |CLIVE BARKER'S
BTVS SEASON 9|BUFFY THE VAMPIRE SLAYER SEASON NINE
SUPURBIA|GRACE RANDOLPH'S SUPURBIA

I have to say for the last week where I've tried to just leave Mylar alone to work its magic it's generally been pretty damn good. Couple of crashes, mostly from DB locks by the look of it, and list refreshing isn't as automatic as I'd like as comics look like they are missing until you start manually refreshing things. I'll try and log any bugs as I find them on here then.

Cheers

IanH

evilhero commented 10 years ago

Wow so sorry I dropped the ball on this - I had this implemented at one point, but then the disastrous development-master merge that happened a few weeks ago killed most of the work I had done that I hadn't committed yet - and this was one of those :(

I've added it into the commit that will be coming shortly today - just some very minor modifications, more so because I came across the errors myself. I renamed substitutes.txt to substitutes.csv since it seems to be a bit more universally compliant, and when doing a weekly pull it will check to see if the substitutes.csv file exists within the mylar data_directory (if that's not specified, which it isn't unless via command line startup, it will be the root of mylar, same as where the exceptions.csv file is). If the file doesn't exist, it will skip the substitution code, otherwise it will run it accordingly.
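The check described above might look roughly like the Python 3 sketch below. It is an illustration under assumptions, not the code from the actual commit: `load_substitutes` and its `data_dir` parameter are hypothetical names, and treating lines that start with `#` as comments is an assumption about how comment lines might be marked in the file.

```python
import csv
import os


def load_substitutes(data_dir, filename="substitutes.csv"):
    """Return (short, long) pairs, or an empty list if the file is absent.

    Hypothetical helper for illustration only.
    """
    path = os.path.join(data_dir, filename)
    # No file means no substitutions; the caller can skip that step.
    if not os.path.isfile(path):
        return []

    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="|"):
            # Skip blank or malformed lines and (assumed) "#" comment lines.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            pairs.append((row[0], row[1]))
    return pairs
```

Returning an empty list rather than raising keeps the weekly pull running unchanged for anyone who has not created the file.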

I also gave props where props were due on this - both in the csv file as comments and in the weeklypull module itself.

Thanks for all the work in doing this :)

IanHub commented 10 years ago

No problem at all, I've just been re-inserting the two bits of code (load the contents of the file, perform the replacement) after an update. It would be nice to not have to mess around with this though :)

I set up a newznab which just catalogues a few comic newsgroups, and let Mylar work its magic, and it's been working great, so I have not been checking back here as often as I was and was not actually aware of the master merge problem :( sorry to hear about that.

The only issues I've had in the last couple of months are the odd DB lock problem (cured with a Mylar restart) and a tendency to pull very old versions of a couple of titles, which may be cured by some of your recent changes. I'll see what it does on this week's pull. Update: still pulling old versions - so added as an issue with logs.

Thank you for all your hard work in creating Mylar! :)