Lcforself / safebrowsing-python

Automatically exported from code.google.com/p/safebrowsing-python
MIT License

Simple patch to support key/value store (memcached) #10

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I needed to look up a lot of URLs in safebrowsing and wanted a fast
in-memory lookup, so here is a quick-and-dirty patch that adds a key/value
store (memcached) to safebrowsing-python.

Two important notes:

- I didn't change prepare_db.py. For now the key/value store is updated
with a script like this:

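import memcache

mc = memcache.Client(['127.0.0.1:11211'], debug=0)

with open("goog-malware.txt") as malware:
    next(malware, None)  # skip the first (header) line
    for line in malware:
        # Drop the leading character, then trailing whitespace. The
        # original sliced off the last two characters instead; this
        # assumes those were whitespace.
        key = line[1:].rstrip()
        if key:  # ignore empty keys
            mc.set(key, "M")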

But if you want to incorporate it in prepare_db in a clean way, feel free.

- The behaviour of the lookup is a bit different: when the URL does not
match the database, it returns False. (That makes more sense to me, but
this is a matter of taste ;-)
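
For illustration, the lookup behaviour described in the second note could look roughly like the sketch below. This is not the patch itself: `lookup` and `DictClient` are hypothetical names, the keys are made-up placeholders, and `DictClient` is only an in-memory stand-in for `memcache.Client` so the example runs without a memcached server.

```python
class DictClient(object):
    """Hypothetical in-memory stand-in with the same get/set calls as memcache.Client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        # Returns None on a miss, as memcache.Client.get does.
        return self._data.get(key)


def lookup(client, key):
    """Return the stored marker (e.g. "M") for key, or False if it is absent."""
    value = client.get(key)
    return value if value is not None else False


client = DictClient()
client.set("0123abcd", "M")          # placeholder key, marked as malware
hit = lookup(client, "0123abcd")     # key present in the store
miss = lookup(client, "ffff0000")    # key absent from the store
print(hit, miss)
```

With a real server, a `memcache.Client(['127.0.0.1:11211'])` instance could be passed in place of `DictClient()`, since `lookup` only relies on `get`.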

The first advantage of the key/value store is speed. A small example
with 3 URLs (2 not matching and 1 matching):

sqlite3                memcached
real    0m6.432s       real    0m0.054s
user    0m5.610s       user    0m0.040s
sys     0m0.720s       sys     0m0.020s

The second advantage is that when a lot of processes run lookups in
parallel, you can keep updating the key/value store without any real impact
on the lookup cost.

If you have other ideas or comments on the patch, don't hesitate.

Thanks a lot.

Original issue reported on code.google.com by adulau on 16 Aug 2009 at 6:21

GoogleCodeExporter commented 8 years ago
As of r47, support for the memcached backend has been added. It is now
possible to add support for non-RDBMS backends. I am going to upload the
docs for it soon.

So this ticket is almost closed but not until the docs are in!

Original comment by thejaswi...@gmail.com on 28 Dec 2009 at 12:15

GoogleCodeExporter commented 8 years ago
Great, I have an updated version of my latest patch (for the redis backend,
http://code.google.com/p/redis/), but I'll wait for your final release to
see how to integrate the changes and provide a diff if required.

Thanks for your work.

Original comment by adulau on 28 Dec 2009 at 12:23

GoogleCodeExporter commented 8 years ago
I have pushed the latest changes, with the API finalized. Check out
http://code.google.com/p/safebrowsing-python/wiki/AvailableBackends for
documentation on how to add a new backend.

Waiting for the redis backend! Also do let me know if your name is in the
CONTRIBUTORS.txt

Original comment by thejaswi...@gmail.com on 29 Dec 2009 at 5:32