afilipovich / gglsbl

Python client library for Google Safe Browsing API
Apache License 2.0

Adding custom threat lists to database #48

Open TheFrozenFire opened 5 years ago

TheFrozenFire commented 5 years ago

I am integrating gglsbl as a backend for checking URLs that are sent in our customers' messaging. This library has been extremely helpful in doing so, props for that.

A question I have is whether anyone has attempted to integrate data from other sources into their database. For instance, we will be checking URLs against feeds such as OpenPhish, and industry-specific feeds we've gained access to.

It's not super obvious from the code whether the functions for converting a URL to an entry which can be added to the database are present, though it seems like it might be possible using the function to get the hashes of a URL, plus the functions to add threat entries to the database.

One area of confusion for me is how I might compute "threat prefixes". Is that just intended to be the first four hash values of a URL's hash list?

afilipovich commented 5 years ago

This library is compatible only with the Google Safe Browsing API as a data source. URLs are transformed into hashes and hash prefixes as described in this spec: https://developers.google.com/safe-browsing/v4/urls-hashing
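To make the "threat prefix" question concrete: per the linked spec, a hash prefix is the leading 4 to 32 bytes of the SHA-256 digest of a single URL expression, not the first few entries of a hash list. A minimal sketch using only the standard library follows; `hash_prefix` is a hypothetical helper name and is not part of gglsbl.

```python
import hashlib


def hash_prefix(expression: str, length: int = 4) -> bytes:
    """Return the leading `length` bytes of SHA-256(expression).

    Hypothetical helper for illustration; the spec allows prefix
    lengths from 4 to 32 bytes.
    """
    if not 4 <= length <= 32:
        raise ValueError("prefix length must be between 4 and 32 bytes")
    full_hash = hashlib.sha256(expression.encode("utf-8")).digest()
    return full_hash[:length]


# The spec's own example: the 32-bit prefix of SHA-256("abc").
print(hash_prefix("abc").hex())  # ba7816bf
```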

Conceptually, it is possible to add support for other URL blacklist providers, but it would require a major overhaul. In fact, it would be easier to write a separate library for other feeds, since they list URLs in clear text, while Google provides only irreversible hashes, which require extra transformations and lookups.

TheFrozenFire commented 5 years ago

My intention would be to generate the same sort of hashes that Google does, and then add those to the database. I'm not looking to add support for other feed providers to the library itself, but rather to add some helper functionality for generating the same hash format as Google does, and for associating those hashes with threat lists.

afilipovich commented 5 years ago

Gotcha. You can use this class to translate a URL into hashes: https://github.com/afilipovich/gglsbl/blob/master/gglsbl/protocol.py#L168
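For feeds that publish clear-text URLs, the hashing spec's host-suffix and path-prefix step can be sketched with the standard library alone. This is an illustration written from the spec, not gglsbl's implementation: the function names are hypothetical, and the input is assumed to be already canonicalized (the spec's canonicalization rules are not reproduced here).

```python
import hashlib
import ipaddress
from urllib.parse import urlsplit


def host_suffixes(host):
    """Exact host plus up to four suffixes taken from the last five components."""
    try:
        ipaddress.ip_address(host)
        return [host]  # IP addresses get no extra suffixes
    except ValueError:
        pass
    suffixes = [host]
    tail = host.split(".")[-5:]
    for i in range(len(tail) - 1):  # stop before the bare top-level domain
        candidate = ".".join(tail[i:])
        if candidate not in suffixes:
            suffixes.append(candidate)
    return suffixes


def path_prefixes(path, query):
    """Exact path with and without the query, then up to four directory prefixes."""
    prefixes = [path + "?" + query] if query else []
    prefixes.append(path)
    current = "/"
    candidates = [current]
    for component in path.split("/")[1:-1][:3]:
        current += component + "/"
        candidates.append(current)
    for candidate in candidates:
        if candidate not in prefixes:
            prefixes.append(candidate)
    return prefixes


def url_expressions(canonical_url):
    """All host-suffix/path-prefix combinations for an already canonical URL."""
    parts = urlsplit(canonical_url)
    path = parts.path or "/"
    return [
        host + prefix
        for host in host_suffixes(parts.hostname)
        for prefix in path_prefixes(path, parts.query)
    ]


def full_hashes(canonical_url):
    """SHA-256 digest of every expression; the first 4 bytes form a 32-bit prefix."""
    return [
        hashlib.sha256(expression.encode("utf-8")).digest()
        for expression in url_expressions(canonical_url)
    ]


for expression in url_expressions("http://a.b.c/1/2.html?param=1"):
    print(expression)
```

For `http://a.b.c/1/2.html?param=1` this prints the eight expressions listed in the spec's example, starting with `a.b.c/1/2.html?param=1` and ending with `b.c/1/`. Each digest from `full_hashes` could then be associated with a custom threat list, with its leading bytes serving as the prefix.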