0xd34db33f / gfyp

Unification of dnstwist + SQLite + Email reporting. Set it as a cron job that runs every hour, give it a list of domains and email addresses for reporting, then watch it go find stuff.

bug: the same (alert email address, domain) pair can be added multiple times #9

Closed: kristovatlas closed this issue 7 years ago

kristovatlas commented 7 years ago

When a pair has been added n times, dnstwist examines the same domain n times on each invocation of core.py.

2 easy ways to fix this:

  1. Modify the SQL schema so that the CREATE TABLE statement includes a UNIQUE (email, domain) constraint. We should probably also modify existing tables in the wild during build or add.
  2. Modify the Python so that it SELECTs the (email, domain) pair before attempting to INSERT. We should probably also remove duplicates from existing tables in the wild during build or add.
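
A minimal sketch of both options, to make the trade-off concrete. The table name lookupTable comes from later in this thread; the column names (email, domain) and every function name below are assumptions for illustration, not GFYP's actual schema or API.

```python
import sqlite3


def create_table(conn, unique):
    """Create a hypothetical lookupTable, optionally with the UNIQUE constraint."""
    constraint = ", UNIQUE (email, domain)" if unique else ""
    conn.execute(
        "CREATE TABLE lookupTable (email TEXT NOT NULL, domain TEXT NOT NULL"
        + constraint
        + ")"
    )


def add_pair_with_constraint(conn, email, domain):
    """Option 1: rely on UNIQUE (email, domain); return False on a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO lookupTable (email, domain) VALUES (?, ?)",
                (email, domain),
            )
    except sqlite3.IntegrityError:
        return False
    return True


def add_pair_with_select(conn, email, domain):
    """Option 2: SELECT the pair first and INSERT only when it is absent."""
    row = conn.execute(
        "SELECT 1 FROM lookupTable WHERE email = ? AND domain = ?",
        (email, domain),
    ).fetchone()
    if row is not None:
        return False
    with conn:
        conn.execute(
            "INSERT INTO lookupTable (email, domain) VALUES (?, ?)",
            (email, domain),
        )
    return True


def remove_duplicates(conn):
    """Keep only the first row of each (email, domain) pair in an existing table."""
    with conn:
        conn.execute(
            "DELETE FROM lookupTable WHERE rowid NOT IN ("
            "SELECT MIN(rowid) FROM lookupTable GROUP BY email, domain)"
        )
```

Option 1 lets the database enforce uniqueness, so every code path that writes to the table is covered. Option 2 only protects inserts that go through the Python helper, and existing tables still need something like remove_duplicates run once.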
0xd34db33f commented 7 years ago

Finally getting some time to pick this back up and run with it. I'm fine with option 2; however, I think we need to make the domains in the lookupTable unique. So something like...

  1. Create a get_entry_domain function that gets all the entries for a domain
  2. Add a sanity check to the add_domain function in util.py that throws an error if there are any entries found for that domain.
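
Roughly, the two steps above could look like the sketch below. The function names get_entry_domain and add_domain are the ones proposed here, but their signatures, the column names, and the choice of ValueError are assumptions, not the code in util.py.

```python
import sqlite3


def get_entry_domain(conn, domain):
    """Return every (email, domain) row already stored for this domain."""
    return conn.execute(
        "SELECT email, domain FROM lookupTable WHERE domain = ?",
        (domain,),
    ).fetchall()


def add_domain(conn, email, domain):
    """Insert a domain, refusing if any entry for it already exists."""
    if get_entry_domain(conn, domain):
        # Sanity check: the domain must appear at most once in lookupTable.
        raise ValueError("domain already present in lookupTable: " + domain)
    with conn:
        conn.execute(
            "INSERT INTO lookupTable (email, domain) VALUES (?, ?)",
            (email, domain),
        )
```

Note that this makes the domain alone unique, which is stricter than the (email, domain) pair from the original report: a second email address could no longer be attached to the same domain.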
0xd34db33f commented 7 years ago

Re-thinking the approach in my last comment there. We're kinda in this rough place of either kludging together a hack that ensures uniqueness in Python or using the database for its natural purpose to ensure uniqueness.

After further reflection, I don't think the Python get_entry_domain function referenced above is the correct way to go. But I don't want to be on the hook for creating some awful code that ensures the database is updated to the proper spec every time we make a change to the DB. What if we add a magic value to the database that specifies the schema version we are running with, and GFYP bails if it either doesn't find that value in the database or finds an older one? We could then add a function to util.py called "convert" that inits a new database and copies all the entries out of the old database into the new one. That seems cleaner to me than adding a bunch of checks to the GFYP code proper.
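
A rough sketch of that idea, under some assumptions: the magic value is kept in SQLite's built-in PRAGMA user_version (a dedicated table would work just as well), the version number 2 is made up, and init_db, check_schema_version, SchemaVersionError, and the column names are hypothetical. Only the name convert comes from the proposal above.

```python
import sqlite3

SCHEMA_VERSION = 2  # hypothetical current schema version


class SchemaVersionError(RuntimeError):
    """Raised when the database schema is missing a version or is too old."""


def init_db(conn):
    """Create the current schema and stamp it with SCHEMA_VERSION."""
    conn.execute(
        "CREATE TABLE lookupTable ("
        "email TEXT NOT NULL, domain TEXT NOT NULL, "
        "UNIQUE (email, domain))"
    )
    # PRAGMA values cannot be bound as parameters, so format the integer in.
    conn.execute("PRAGMA user_version = %d" % SCHEMA_VERSION)
    conn.commit()


def check_schema_version(conn):
    """Bail if the version is missing (SQLite reports 0) or older than expected."""
    found = conn.execute("PRAGMA user_version").fetchone()[0]
    if found < SCHEMA_VERSION:
        raise SchemaVersionError(
            "database schema version %d is older than required %d"
            % (found, SCHEMA_VERSION)
        )


def convert(old_conn, new_conn):
    """Init a new database and copy the old entries in, dropping duplicates."""
    init_db(new_conn)
    rows = old_conn.execute("SELECT email, domain FROM lookupTable").fetchall()
    with new_conn:
        # OR IGNORE skips rows that would violate UNIQUE (email, domain).
        new_conn.executemany(
            "INSERT OR IGNORE INTO lookupTable (email, domain) VALUES (?, ?)",
            rows,
        )
```

With this shape, the uniqueness rule lives in the schema, and the only check in the main code path is a single call to check_schema_version at startup.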