caronc / newsreap

Usenet Framework that supports posting, indexing and retrieving

How to get it to work on Windows? #1

Open Safihre opened 8 years ago

Safihre commented 8 years ago

It was a bit unclear to me from the code how far along the functionality was, so I thought I'd just install it and see how it works! But I can't. It's just impossible to get the curses module on Windows, which is necessary for the yEnc decoder.

Safihre commented 8 years ago

A few other small issues: https://github.com/caronc/newsreap/blob/master/setup.py#L27 still references README.rst, which is now called README.md.

https://github.com/caronc/newsreap/blob/master/newsreap/lib/SocketBase.py#L63 PROTOCOL_SSLv2 doesn't exist anymore in Python 2.7.something on many platforms (OSX/Windows), so I needed to remove that to get it to work.

Also, PROTOCOL_SSLv23 doesn't actually mean SSL v2 or v3 at all; weirdly enough, to Python it means: negotiate the highest encryption possible. So if you set that one, Python will make sure it always uses the best one available, probably either TLSv1.2 or TLSv1. We found that out recently in SABnzbd. Weird, but that's Python :P
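For reference (this isn't newsreap code, just a minimal sketch of the Python ssl behaviour described above, with a made-up server name), asking for PROTOCOL_SSLv23 makes the handshake negotiate the best protocol both ends support:

import socket
import ssl

# Plain TCP connection to a hypothetical news server on the NNTPS port
plain = socket.create_connection(('news.example.com', 563))

# Despite its name, PROTOCOL_SSLv23 means "negotiate the highest protocol
# both sides support" (typically TLSv1.x these days), not SSLv2/SSLv3
secure = ssl.wrap_socket(plain, ssl_version=ssl.PROTOCOL_SSLv23)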

caronc commented 8 years ago

Unfortunately this code is under heavy development. The READMEs say what it should do :). But I'm definitely not there yet. That said, all of the framework works: I can call xover, I can decode yEnc and uuencoded files. I JUST got all of the unit tests working tonight (finally... sighs).

So... in short: I can't tell you when this will work for Windows. It needs a lot of testing still! If you've got pip installed though, you might be able to just pass it the requirements.txt file as an argument (pip install -r requirements.txt) and get most of the stuff you need. On Windows, I think you need a small tool that allows you to compile stuff (gevent requires some compiling, for example). Sorry :(

Thanks for the info on PROTOCOL_SSLv2; I will update that. Secondly, I will try to eliminate the yEnc dependency (TBH, I thought I already had; I have tests that compare the manual Python way against the C libraries). I'll look further into that next too (if not tomorrow - getting late here).

Safihre commented 8 years ago

I did install using the requirements, and that worked fine. Just the curses module is the problem, especially since I already have _yenc. Maybe you can add an if-statement so that curses will only be imported if _yenc is not available. Then I can try the software ;)
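Nothing fancy, just a minimal sketch of the kind of guard I mean (the flag name is made up; only _yenc and curses are real):

# Prefer the C yEnc decoder when it's installed
try:
    import _yenc
    FAST_YENC = True
except ImportError:
    # Only the pure-Python fallback path needs curses, so only import it here
    FAST_YENC = False
    import curses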

caronc commented 8 years ago

Good call (sorry, I didn't notice that I didn't have it there already). You're certainly going to put this to the test, because I've never even tried any of this on Windows. I may be using a lot of platform-specific calls that are Linux-only.

Again, please let me reiterate how new this project is. It's maybe 50% complete :). I don't want to set any high expectations at present.

The framework is my focus right now, and it still might be useful to you. An example of how it might work would be:

from os.path import join, dirname, abspath

from newsreap.NNTPSettings import NNTPSettings
from newsreap.NNTPManager import NNTPManager

# You can also just manually populate an NNTPSettings object from your
# own configuration file and pass it in. I'm just pulling a temporary file
# I have lingering around with this line. The configuration file controls things
# such as the number of threads, backup usenet servers, primary usenet servers,
# timeouts, etc.
cfg_file = join(dirname(dirname(dirname(abspath(__file__)))), 'config.yaml')

# Create our Settings Object and load our config file. 
settings = NNTPSettings(
    cfg_file=cfg_file,
    cfg_path=dirname(cfg_file),
    # Make sure the existing database is reset
    reset=True,
)

# Create Manager
mgr = NNTPManager(settings)
mgr.connect()

article = mgr.get(
    'gToHw.84894$OF.5nr6856@fx07.am1',
    work_dir='/tmp',
    group='alt.binaries.newsreap',
)

# Print content retrieved
print article.body
print article.header

print mgr.stat(
    'gToHw.84894$OFnr.56856@fx07.am1',
    group='alt.binaries.newsreap',
)

A note of concern: importing from lib.* is bad; I'll have to rework that so it's import newsreap.NNTPSettings, making it an installable pip module (again, just stressing that this is merely checked-in code from a test directory I started). So you can expect this all to change in the near future. Edit: Done (commit).

I also have the lower-level calls if you don't want to use the threading maintained by the NNTPManager (you can do it yourself or just run single-threaded):

from newsreap.NNTPConnection import NNTPConnection
## NON SSL (toggle secure=True for SSL)
sock = NNTPConnection(
    host="nntp.server", port='465',
    username='mylogin', password='mypass',
    secure=False,
    # If your usenet server doesn't need it, don't use it (faster that way, but not by much);
    # most providers don't require it.
    join_group=False,
)

# Make your connection
sock.connect()

# Fetch a list of groups (all of them)
sock.groups()

# Fetch a filtered list
sock.groups(filters='alt.binaries')

# Switch to a group of interest
sock.group('alt.binaries.newsreap')

# find out your current index location
sock.tell()

# go to a specific location (partial whence support - still on the todo list to finish)
sock.seek(index)

# retrieve a batch of 500 article headers (index moves):
headers = sock.next(500)

#  Xover articles manually if you don't want to use the next() and prev() functions
sock.xover(start=300, end=500)

# Know the Article ID of something you want to download:
fileptr = sock.get("crazy.cool.articleidname", "temporary/directory")

# this will pull the article down as a temporary (file) object. if the object goes out of
# scope then the file it's associated with gets lost (deleted) too.

# The fileptr returned is actually a whole other object with a save() function allowing
# you to write the file to disk as the name you want it to be.  If you just call save()
# without parameters, it uses the name parsed from the yEnc tag or uuencode tag,
# or however it was detected.
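In other words, continuing from the snippet above (the exact save() signature may still change, so treat this as a sketch), keeping the file around is just:

# Persist the download before fileptr goes out of scope; with no arguments
# the name parsed from the yEnc/uuencode headers is used
fileptr.save()

# ...or, assuming save() accepts a target path, pick the name yourself
# fileptr.save('downloads/my.renamed.file')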

The nr.py file I intend to complete (eventually) will allow for plugins (whitelists, blacklists, etc.) and will let you control all of the functions identified above from the command line, as well as act as a full indexer, since it ties itself in with any database you like (thanks to the beauty of SQLAlchemy). By default it just uses an SQLite database, since just about anyone can run one of those (they are, however, brutally slow with lots of data unless you use a RAM drive).
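This isn't newsreap's actual configuration (the file name and credentials below are made up), but just to illustrate what "any database you like" means with SQLAlchemy: switching backends is really only a matter of the connection URL:

from sqlalchemy import create_engine

# Default-style setup: a local SQLite file (simple, but slow for big indexes)
engine = create_engine('sqlite:///newsreap.db')

# Point at a heavier database by changing only the URL
# engine = create_engine('postgresql://user:password@localhost/newsreap')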

I'm hoping this framework would allow anyone to write an indexer and focus entirely on the website instead of the backend. Or maybe they just want to write a Usenet tool that checks a specific group for new posts from time to time... etc.

I'm also hoping to make this library/framework so general that anyone (who wanted to) could even build a web page around it; I'd be overjoyed if it ever became the backend of SABnzbd! :)

Edit: typos, clarification, and a big push I did last night that now handles the Python module aspect (fixed up setup.py so it should work too).

Safihre commented 8 years ago

I tried, but got more errors on Windows :/ Maybe I should be a bit more patient until you feel it's more ready and have maybe even had time to test it on Windows yourself :) (And until I can just give it an NZB and it will fetch all those articles and put them in files; no verification etc. yet.)

One more note: I see you send the GROUP command before requesting an article, but this isn't necessary anymore nowadays. I know it's probably in the Usenet spec, but none of the Usenet servers actually need it. SABnzbd never sends it unless you force it in the Server Config.
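Just to illustrate the point with plain nntplib (nothing to do with newsreap's API; the server and credentials are made up), fetching by Message-ID works without ever selecting a group:

from nntplib import NNTP

# Hypothetical server/credentials; 119 is the plain NNTP port
server = NNTP('news.example.com', 119, 'mylogin', 'mypass')

# No GROUP command sent first: ARTICLE by Message-ID works on its own
resp, number, message_id, lines = server.article('<gToHw.84894$OF.5nr6856@fx07.am1>')
server.quit()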

caronc commented 8 years ago

The group() command is mainly used for indexing and article tracking (a huge part of an indexer). It's needed when parsing articles: with xover() you'll parse headers and, depending on how clever your regular expressions are at keying on certain things, extrapolate enough information to build an NZBFile (most obfuscated stuff can't be done using this method though). For SABnzbd, you'd never use this portion or these functions at all. You'd already have all the Article-IDs ahead of time (in the NZBFile itself), so you'd be doing something more like:

# open nzbfile (NNTPManager will definitely support this eventually)
# the code below is greatly simplified (no error checking, for
# readability and explanatory reasons):
files = {}
for filename, segments in nzbfile.items():
    files[filename] = list()
    for segment in segments:
        # each segment carries the Article-ID to fetch
        files[filename].append(sock.get(segment, "SABnzbd/directory"))

# merge/process here

Eventually, the above would be handled for you automatically by the NNTPManager with:

from os.path import join, dirname, abspath

from newsreap.NNTPSettings import NNTPSettings
from newsreap.NNTPManager import NNTPManager
# Not written yet
from newsreap.NNTPnzb import NNTPnzb

# You can also just manually populate an NNTPSettings object from your
# own configuration file and pass it in. I'm just pulling a temporary file
# I have lingering around with this line. The configuration file controls things
# such as the number of threads, backup usenet servers, primary usenet servers,
# timeouts, etc.
cfg_file = join(dirname(dirname(dirname(abspath(__file__)))), 'config.yaml')

# Create our Settings Object and load our config file. 
settings = NNTPSettings(
    cfg_file=cfg_file,
    cfg_path=dirname(cfg_file),
    # Make sure the existing database is reset
    reset=True,
)

# Create Manager
mgr = NNTPManager(settings)
mgr.connect()

# Parse an NZB File (NNTPnzb not written yet)
# This is its own class because when you post, you'll be able to call the write()
# function and produce an NZBFile for distribution. It'll be used for creating
# NZB files and extrapolating to/from them.
nzb = NNTPnzb('path/to/nzbfile')

# I'm still deciding what the results will be;
# probably a list of NNTPArticles()
results = mgr.get(nzb, '/path/to/download')

I like your idea though; maybe it's better if you hold off for a bit. Windows support isn't high on my priority list at the moment, unfortunately. :)

Downloading NZB files is next in line for support (for sure). The other thing I'd like to support is the ability to monitor download transfers themselves and retry stale/stalled ones automatically. We all have those Usenet providers that stall out on us (network throughput dies) every now and then. This is a common gripe in the indexing world (it was with newznab, anyway).

Safihre commented 7 years ago

Hi, me again!

I have finished work on a new C module that we will use in upcoming SABnzbd releases: SABYenc. Instead of having to do any pre-processing on the yEnc data, it takes the raw chunks of data that come from the socket (in a list) and decodes them, avoiding costly joins, filtering, etc. This offers amazing increases in performance for SAB.

But I found that the biggest boost came from releasing Python's GIL during decoding in this new C module, basically making it multi-core! Doing this plus the other optimizations in SABYenc has doubled SAB's performance (we will release it to the public in a new 2.0.0 release, coming soon).

I also found that if you add the following two lines to the original _yenc module's _yenc.c code, it already boosts performance by ~50-60%, especially when using SSL (because of the smaller chunks). [screenshot of the _yenc.c change]

Thought you might find it useful :)

caronc commented 7 years ago

That's great, man! Good for you for porting some of the slow code to C! That can be challenging, dealing with all the different platforms out there, but it's so rewarding in the end if you pull it off!

I appreciate you sharing your findings too! Thank you!