evilhero / mylar

An automated Comic Book downloader (cbr/cbz) for use with SABnzbd, NZBGet and torrents
GNU General Public License v3.0

Date Search disparity and search accuracy #193

Closed digiwombat closed 11 years ago

digiwombat commented 11 years ago

So this is something of an imprecise bug. I guess you might even want to take it as an enhancement since it's sort of taken care of by the "Fuzzy Year" search option, but there are always those inaccuracy concerns re: New 52 and things like that, where an early issue may overlap and, yeah. Reboots. Bad for everyone.

Anyway, moving on. As per the title, I am having a lot of trouble finding year-cusp issues (things from Dec. 2012 [actual release] that should be tagged as February 2013 [official release] and so on)

The problem seems to be that Mylar is using the "On-Sale Date" value instead of the Key Date or Pub Date, which are what all taggers and most release groups are using these days.

Honestly, I think using both internally would be a pretty nice way to go about it since then you catch disparate tagging in a worst case, but as it is we are getting a LOT of false negatives based on the use of the On-Sale date.
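To make the "use both internally" idea concrete, here is a rough sketch of what such a check might look like. It is not Mylar's actual code: the function names and the assumption that dates arrive as `YYYY-MM` strings (with `0000-00` meaning "unknown") are purely illustrative.

```python
from typing import Optional, Set


def candidate_years(on_sale_date: Optional[str], pub_date: Optional[str]) -> Set[int]:
    """Collect every usable year from the on-sale and publication dates.

    Hypothetical helper: dates are assumed to be 'YYYY-MM' strings,
    with None or '0000-00' standing in for a missing date.
    """
    years = set()
    for value in (on_sale_date, pub_date):
        if value and value[:4].isdigit() and value[:4] != "0000":
            years.add(int(value[:4]))
    return years


def year_matches(release_year: int, on_sale_date: Optional[str], pub_date: Optional[str]) -> bool:
    """Accept a release if its tagged year matches either date's year."""
    return release_year in candidate_years(on_sale_date, pub_date)


# A year-cusp issue: on sale Dec. 2012, cover-dated Feb. 2013.
print(year_matches(2013, "2012-12", "2013-02"))
print(year_matches(2011, "2012-12", "2013-02"))
```

Unlike a blanket fuzzy-year match, this only widens the search to years the issue's own metadata actually mentions, which should limit overlap with a rebooted volume.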

Example Pages: http://www.comicvine.com/x-men/49-34221/ http://www.comics.org/series/50867/details/

Not sure how much this affects DC at large, but it certainly hit me with some annoying problems vis-a-vis Birds of Prey.

http://www.comicvine.com/birds-of-prey/49-42806/ http://www.comics.org/series/61066/details/

Again, not 100% sure how you want to classify this one since it's weird, but I think there are enough reboots lately to at least be a bit scared of using fuzzy to try to fill in blank spots when I think a little more due diligence in the base search logic could help a lot.

OKAY! Let me know, feel free to disagree. Just wanted to bring attention to it since I think it's a change worth making/testing.

evilhero commented 11 years ago

Well, it's not so much about due diligence in the search logic as it is about having to deal with the errors in the data itself. For every series that has a Publication Date field on GCD, there's a series that doesn't have one at all (as in no column to reference). When I did the logic for the parsing it was near the beginning of Mylar, and I wasn't really thinking too far outside the box... meaning if I got it to work I did a 'yay!' and didn't touch it again.

I spent most of the afternoon today (see my java pic on twitter @mylarcomics) redoing the flow of the parser when it came to dates and which to use, etc. I finally got it so that it will take the publication date as its primary source, and if it's not present it will take the lastissue date and increment the month by one. Now I know it's not perfect, but if I left it as 0000-00 it wouldn't help with searches at all (and that's the entire reason for having it in Mylar really). If I left it as the previous month (say issue #15 was 2013-02 and issue #16 had no pub date, it would have gone to 2013-02), that would screw up the logistics for the main screen showing the latest issue. Imma work through that, cause I know how to fix it, but in the meantime it'll work better (hopefully) than the way it was.
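The fallback described above could be sketched roughly like this. This is only an illustration of the rule as stated in the comment, not the code that was committed: `resolve_issue_date`, `next_month`, and the `YYYY-MM` string format are assumptions.

```python
from typing import Optional

UNKNOWN_DATE = "0000-00"  # placeholder used when no date can be derived


def next_month(yyyy_mm: str) -> str:
    """Return the month after a 'YYYY-MM' string, rolling over the year."""
    year, month = (int(part) for part in yyyy_mm.split("-"))
    if month == 12:
        return f"{year + 1:04d}-01"
    return f"{year:04d}-{month + 1:02d}"


def resolve_issue_date(pub_date: Optional[str], last_issue_date: Optional[str]) -> str:
    """Prefer the publication date; otherwise use last issue's date plus one month.

    Hypothetical helper: falls back to '0000-00' only when neither date exists.
    """
    if pub_date and pub_date != UNKNOWN_DATE:
        return pub_date
    if last_issue_date and last_issue_date != UNKNOWN_DATE:
        return next_month(last_issue_date)
    return UNKNOWN_DATE


# Issue #15 is 2013-02 and issue #16 has no pub date.
print(resolve_issue_date(None, "2013-02"))
```

The month-plus-one guess assumes a monthly series, so it will drift for bi-monthly or irregular titles, but it keeps the latest issue sorted after its predecessor.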

digiwombat commented 11 years ago

Timbits work wonders for improving search logic, this is a known science fact.

Looking forward to getting down and dirty with the new search logic whenever the commit comes. It's always rough when you're combining like four different scraping methods and doing best guess stuff (I did some color estimation work a while back, and striking that balance between guesstimation and refinement is a proper pain in the ass), but it sounds like it's going to be a lot smarter about decisions after this. Good times.

I need to brush up on Python, then maybe I can help out with some of this stuff myself rather than just reporting bugs and whining. Haha.

evilhero commented 11 years ago

This has been pushed now to the latest build in developmental (not in master yet). To get the pub dates, you may have to do one of two things depending on the series:

First, try doing a refresh for the series - it should replace the dates with the correct publication dates in the format YYYY-MO now. Failing that, or if some are rolled over and others not (it happened to me with Venom for some reason when I was testing), delete the series, then re-add it and it will pull it down properly. What was happening is something about a stagnant SQL reference that causes it to not re-add properly when doing a refresh... but a refresh should work in most cases ;)

digiwombat commented 11 years ago

Works like a charm for JUST about every series. Ended up with dates of 0000-00 for Mara and Harvest where it was using the published date before.

That is the only place I came out with less accurate data across my entire collection, so not too shabby. Not sure if you want to look into how to handle this on this particular issue tracker or if you want it submitted as a new ticket? I'll leave this open until you decide how to handle it. Just let me know and if you want it rolled out to a new ticket, I will do so.

Otherwise, works fantastic. Even killed a few issues I already had to re-search and it's helped out a lot. Finding stuff a lot more reliably from Experimental now. Thanks! :D:D

evilhero commented 11 years ago

Just checked and both of those series have absolutely no dates anywhere to pull data from. Only suggestion I can make at this time is to sign up to comics.org (it's free) and submit a change request with the correct dates, and then it will be picked up properly by Mylar :)

Glad it's working for you though, aside from that!

digiwombat commented 11 years ago

lol. Well never mind me then. Marking this closed and going to submit some dates.

Kids these days and their wacky incomplete internet databases.