KurzonDax / nZEDbetter

An improved usenet indexer
nzedbetter.org
GNU General Public License v3.0

Why? #30

Closed by ghost 11 years ago

ghost commented 11 years ago

Why a new git instead of a fork or even better contribute to nZEDb? We would appreciate the help!!

KurzonDax commented 11 years ago

Hey man, I apologize for taking so long to get back to you. Got my ass kicked at work this week, so haven't had time to mess with personal email and other such things.

So, I'm pretty much going to blame it on three things: embarrassment, laziness, and excessive drinking. I'll try to keep it short though.

I've been messing around with nZEDb pretty much since you guys first started it as an alternative to newznab. What you guys have done (mostly you with the tmux scripts) is pretty awesome. Unfortunately, I'm a perpetual tinkerer, and can't leave things alone. So I started mucking around with the scripts a little. However, my php, python, and javascript skills were pretty rusty. I haven't used any of the three in several years. To make matters worse, I had never hosted a project on Github, or even really used git for anything other than cloning.

After making a number of changes, I decided to mess around with creating a GitHub project, mostly to learn about using GitHub. After a few drinks one evening after work, I created a project, and pushed what I had done so far. To be honest, I didn't want to create a fork yet because I wasn't too confident in the code I was introducing and didn't want to look like an incompetent dumb-ass. I will also blame the drinks for the kind of shitty sounding readme.md.

Fast forward a bit and I had made a load of changes and a few bug fixes. So I figured what-the-hell and registered a cheap domain and created a wiki since I had actually put a decent amount of effort into the thing. Since then, I keep intending to go back and move it to an actual fork of nZEDb (and this is where the laziness comes in), but quite simply, I just haven't gotten to it yet.

I'll try to get it done this weekend. Unfortunately, with the number of changes made, it will probably be pretty difficult to do a compare with your code to easily see them. However, I'm more than happy to answer any questions, and you guys are obviously more than welcome to adapt anything I've added back into your source.

Hope that helps clarify a little bit. By all means, drop me an email if I can do anything for you or your team.

KurzonDax

On Tue, Oct 1, 2013 at 1:23 PM, jonnyboy notifications@github.com wrote:

Why a new git instead of a fork or even better contribute to nZEDb?


ghost commented 11 years ago

Well, I would definitely like to see what fixes you made, the musicbrainz integration and the search stuff. Those last 2 would be a big improvement.

KurzonDax commented 11 years ago

I guess to start with, take a look at:

lib/releases.php (particularly the stages)
lib/binaries.php (mostly the updateGroup, scan, partRepair, and addMissingParts functions)
lib/namecleaning.php (Still got some work to do there, specifically with music name cleaning)

That's where most of the script work has been done. My focus was trying to eliminate complex join queries and subqueries where possible that are pretty hard on the database, especially if you're not running a decent amount of RAM. While my methods aren't necessarily faster, they're less taxing on the system overall, at least based on my testing.
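The thread doesn't say which queries were rewritten, so the following is only a rough sketch of the general idea: replacing one statement that relies on a correlated subquery with a couple of simple queries plus updates by primary key. The `binaries`/`parts` tables, their columns, and both function names are assumptions made for illustration, and SQLite stands in for the project's actual database.

```python
import sqlite3

# Hypothetical schema; the real project's tables and columns may differ.
SCHEMA = """
CREATE TABLE binaries (
    id INTEGER PRIMARY KEY,
    total_parts INTEGER NOT NULL,
    complete INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE parts (
    id INTEGER PRIMARY KEY,
    binary_id INTEGER NOT NULL,
    part_number INTEGER NOT NULL
);
"""


def mark_complete_with_subquery(conn):
    """Single statement: a correlated subquery counts parts for every binary."""
    with conn:
        conn.execute(
            "UPDATE binaries SET complete = 1 "
            "WHERE total_parts = "
            "(SELECT COUNT(*) FROM parts WHERE parts.binary_id = binaries.id)"
        )


def mark_complete_in_steps(conn):
    """Same result using two simple queries and updates by primary key."""
    # One grouped scan of parts, kept in application memory.
    counts = dict(
        conn.execute("SELECT binary_id, COUNT(*) FROM parts GROUP BY binary_id")
    )
    rows = conn.execute(
        "SELECT id, total_parts FROM binaries WHERE complete = 0"
    ).fetchall()
    done = [(bid,) for bid, total in rows if counts.get(bid, 0) == total]
    with conn:
        conn.executemany("UPDATE binaries SET complete = 1 WHERE id = ?", done)
    return len(done)
```

The stepwise version does more work in the application and isn't necessarily faster, which matches the trade-off described above: the database only ever sees simple scans and keyed updates.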

One thing I've done that the jury is still out on is introduced a "part hash" to prevent duplicate parts from being added. While it works fairly well, it becomes a major problem if the parts table grows too large for the index to stay in RAM. I still want to do more testing on it though.
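A minimal sketch of how a "part hash" like this could work, assuming the hash is built from the binary ID, message ID, and part number and enforced with a unique index. The thread doesn't say which fields are actually hashed, so the schema, the field choice, and the `part_hash`/`add_parts` names are all hypothetical; SQLite is used here only to keep the example self-contained.

```python
import hashlib
import sqlite3


def part_hash(binary_id, message_id, part_number):
    """Hash the fields assumed to identify a part (hypothetical choice)."""
    key = f"{binary_id}:{message_id}:{part_number}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()


def add_parts(conn, parts):
    """Insert parts, silently skipping duplicates; return how many were added."""
    count_sql = "SELECT COUNT(*) FROM parts"
    before = conn.execute(count_sql).fetchone()[0]
    with conn:
        # The UNIQUE index on part_hash makes the database reject duplicates.
        conn.executemany(
            "INSERT OR IGNORE INTO parts "
            "(binary_id, message_id, part_number, part_hash) VALUES (?, ?, ?, ?)",
            [(b, m, n, part_hash(b, m, n)) for (b, m, n) in parts],
        )
    return conn.execute(count_sql).fetchone()[0] - before


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE parts (
        id INTEGER PRIMARY KEY,
        binary_id INTEGER NOT NULL,
        message_id TEXT NOT NULL,
        part_number INTEGER NOT NULL,
        part_hash TEXT NOT NULL UNIQUE
    )"""
)
```

Every insert has to probe the unique index, which is why this approach gets expensive once that index no longer fits in memory.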

I also wanted to put an emphasis on getting releases generated as quickly as possible, so I moved the "purging" functions out to a separate script which calls stage 7a on every run, and 7b at user defined intervals (new option in tmux settings). This allows update_releases to keep pumping out releases without getting hung up on purging. I also limit the max number of collections that can be purged in each batch. This keeps transactions smaller, resulting in less load on the DB.
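The batch limit described above could look roughly like this. It is a sketch under assumptions, not the code in the repository: the `collections`/`binaries` schema, the `dateadded` cutoff, and the `purge_batch` name are all made up for illustration, and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical schema; the real project's tables and columns may differ.
SCHEMA = """
CREATE TABLE collections (id INTEGER PRIMARY KEY, dateadded INTEGER NOT NULL);
CREATE TABLE binaries (id INTEGER PRIMARY KEY, collection_id INTEGER NOT NULL);
"""


def purge_batch(conn, cutoff, max_collections=100):
    """Purge at most max_collections old collections in a single transaction.

    Returns the number of collections purged, so a caller can stop (or sleep
    until the next scheduled run) once it returns 0.
    """
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM collections WHERE dateadded < ? ORDER BY id LIMIT ?",
        (cutoff, max_collections),
    )]
    if not ids:
        return 0
    marks = ",".join("?" for _ in ids)
    with conn:  # one small transaction per batch
        conn.execute(f"DELETE FROM binaries WHERE collection_id IN ({marks})", ids)
        conn.execute(f"DELETE FROM collections WHERE id IN ({marks})", ids)
    return len(ids)
```

A separate purge script could call `purge_batch` once per run, so the release-generation loop never waits on a large delete.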

I've got to go back and clean up the code a bit in releases.php and binaries.php, so I apologize for extraneous comments, etc.

The MusicBrainz integration isn't complete yet. I've done quite a bit of proof of concept testing that I haven't committed yet, and so far, it's fairly promising. It doesn't eliminate the need to do a bunch of cleaning on the release names, but so far, MB seems to do a much better job at matching up artists and titles than Amazon. The real downside is it requires building a local MusicBrainz replica database. Queries made directly to musicbrainz.org are throttled to the point that it's unusable. Getting the replica going isn't hard, just a pain in the ass because it uses Postgres, which I hate, and MB's documentation on how to do it sucks. I've got the process documented now though, and will post to the wiki at some point.

The search work so far has mostly been in the eBooks section, and wasn't hard. I added the auto-suggest stuff just using a jQuery plugin. Beyond that, it was just adding more fields to search on. I plan to add better search to console, music, and movies soon.

I've got my dev server down at the moment, but will be bringing it back up this weekend. Once I do, I'll email you a link and you can take a look at the front end, and admin sections if you want.

iguyking commented 11 years ago

Where would I dig to see what you did around search? If I helped merge some of this with the existing branch would that encourage you to re-join the main development tree?

KurzonDax commented 11 years ago

With regard to 'search', what specifically are you looking for? The Sphinx integration is, unfortunately, nowhere near complete as of yet. I'm also somewhat stalled on fixing the advanced search page in the Alpha theme (or more correctly my Cyborg theme, which is where I've focused efforts) because I got derailed into rewriting a bunch of the admin section. Yeah, I know, that should have been a lower priority.

The primary changes to search that I've committed so far are in the eBooks section with just adding some additional fields and adding the auto-suggest. My plan is to roll those changes into improving movie, music, and console search as well, and then fixing the advanced search page.

As far as rejoining the main tree, I did re-fork the master branch over the weekend. My challenge has become moving my commits to that fork. I spent a while looking into that, but I'm honestly not sure of the best way to go about it.

sinfuljosh commented 11 years ago

Dax... I was looking into search too, and like you, things have been hell at work. I was curious: have you looked into Sphider? It's a web crawler/indexer with its own separate database and system, which might help with performance, and it looks a bit easier to maintain and configure with its own admin panels.

KurzonDax commented 11 years ago

Haven't heard of that before. I'll take a look at it in the next few days. I'm still debating how much value Sphinx really brings to the package overall versus the effort to put it back in. I don't think putting it in will be all that hard, except for building a decent way to present the results. I've done some basic prototyping and I think it would be useful. I'll look at Sphider and see what it brings though.