atenreiro / opensquat

The openSquat is an open-source tool for detecting domain look-alikes by searching for newly registered domains that might be impersonating legit domains and brands.
https://opensquat.com
GNU General Public License v3.0
731 stars, 135 forks

[ROADMAP] Social media squatting #26

Closed l3str4nge closed 4 years ago

l3str4nge commented 4 years ago

Hello, I have a question about the roadmap item on social media squatting. I could work on this, but I need some tips :) How can we detect social media squatting? The only thing that comes to mind is to detect look-alike usernames the same way we do for domains.

atenreiro commented 4 years ago

Hi Mateusz,

That's exactly what I had in mind, searching for usernames that match the keywords from sources like:

1) LinkedIn (social_media/linkedin.py)
2) Instagram (social_media/instagram.py)
3) Facebook (social_media/facebook.py)

There are Python libraries to interact with these platforms. I have started working on Instagram; do you want to pick up either of the other two?

l3str4nge commented 4 years ago

Sure!

atenreiro commented 4 years ago

Shout if you need anything

atenreiro commented 4 years ago

Hi Mateusz,

Social networks are well defended against scraping, so extensive searches are likely to get blocked or blacklisted, and they would not be very effective anyway.

For this, I'm thinking about applying a "Levenshtein automaton". That is, given a keyword "w" and a distance "n", we use it to generate all variants of w within that edit distance and save them to a list.

openSquat will then check, for each keyword in the list, whether an associated account exists, such as:

https://facebook.com/keyword
https://linkedin.com/in/keyword
https://instagram.com/keyword

https://en.wikipedia.org/wiki/Levenshtein_automaton
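To make the idea concrete, here is a minimal sketch of the candidate-generation step. Note that it is a brute-force enumeration of strings within a given edit distance, not an actual Levenshtein automaton, and the names `edits1`, `variants`, `candidate_urls` and the choice of alphabet (lowercase letters and digits) are assumptions made for illustration only. It only builds the URLs listed above and sends no requests.

```python
import string

# Assumed username alphabet; real platforms also allow other characters.
ALPHABET = string.ascii_lowercase + string.digits

# URL patterns taken from the examples in this thread.
URL_TEMPLATES = {
    "facebook": "https://facebook.com/{}",
    "linkedin": "https://linkedin.com/in/{}",
    "instagram": "https://instagram.com/{}",
}


def edits1(word, alphabet=ALPHABET):
    """Return all strings exactly one insert, delete or substitution away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    substitutions = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # Drop the original word and the empty string.
    return (deletes | substitutions | inserts) - {word, ""}


def variants(word, n=1):
    """Return all strings within edit distance n of word (word excluded)."""
    seen = {word}
    frontier = {word}
    for _ in range(n):
        frontier = {e for w in frontier for e in edits1(w)} - seen
        seen |= frontier
    return seen - {word}


def candidate_urls(keyword, n=1):
    """Build the profile URLs to check for every variant of keyword."""
    return [
        template.format(variant)
        for variant in sorted(variants(keyword, n))
        for template in URL_TEMPLATES.values()
    ]
```

For example, `variants("facebook")` contains `"facebok"`, `"faceboook"` and `"facebo0k"`. The list grows quickly with the keyword length and with `n`, which matters given the blocking concern above.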

l3str4nge commented 4 years ago

Yeah, that sounds like a good idea to me. We need to create a mechanism for the automaton, then an abstraction for social media platforms so that new ones (Twitter, for example) can easily be added in the future.

I'm already looking into Facebook libraries for Python.

If you've already started working on Instagram, I could implement the automaton and prepare the social media abstraction.
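One possible shape for that abstraction, as a rough sketch only: the class and function names below (`SocialMediaPlatform`, `find_accounts`) are hypothetical, and the actual existence check is deliberately left unimplemented because no lookup method has been settled on yet.

```python
from abc import ABC, abstractmethod


class SocialMediaPlatform(ABC):
    """Hypothetical base class; each platform would get one subclass."""

    name = ""
    url_template = ""

    def profile_url(self, username):
        """Build the public profile URL for a username."""
        return self.url_template.format(username)

    @abstractmethod
    def account_exists(self, username):
        """Return True if an account with this username exists."""


class Facebook(SocialMediaPlatform):
    name = "facebook"
    url_template = "https://facebook.com/{}"

    def account_exists(self, username):
        # Placeholder: the real lookup is not part of this sketch.
        raise NotImplementedError("Facebook lookup is not part of this sketch")


class LinkedIn(SocialMediaPlatform):
    name = "linkedin"
    url_template = "https://linkedin.com/in/{}"

    def account_exists(self, username):
        # Placeholder: the real lookup is not part of this sketch.
        raise NotImplementedError("LinkedIn lookup is not part of this sketch")


def find_accounts(platforms, usernames):
    """Return (platform name, profile URL) for every username that exists."""
    hits = []
    for platform in platforms:
        for username in usernames:
            if platform.account_exists(username):
                hits.append((platform.name, platform.profile_url(username)))
    return hits
```

With this layout, adding Twitter later would mean one more subclass with its own `url_template` and `account_exists`, without touching `find_accounts`.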

atenreiro commented 4 years ago

Hello!

I’ve not done much on IG. I was mostly exploring how to effectively check whether a user exists, and I think I should be able to achieve this by today.

I’ve done some reading about the Levenshtein automaton, but I haven’t actually started coding. Finding a Python library will probably be the fastest way to get this going.

But for long strings (e.g. Facebook), generating all the variants even at a distance of 1 edit might be too exhaustive. So, to keep the algorithm simple and avoid being blocked by social networks, I’m thinking of only substituting the vowels (a e i o u), as they are usually more susceptible to being exploited by fraudsters.

I’m still thinking about how to solve it.
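A minimal sketch of the vowel-only idea, assuming a single substitution per variant; the function name `vowel_variants` is made up for illustration and this is one possible reading of the approach, not a settled design.

```python
VOWELS = "aeiou"


def vowel_variants(word):
    """Return variants of word with exactly one vowel swapped for another."""
    word = word.lower()
    out = set()
    for i, ch in enumerate(word):
        if ch in VOWELS:
            for v in VOWELS:
                if v != ch:
                    # Replace only the vowel at position i.
                    out.add(word[:i] + v + word[i + 1:])
    return sorted(out)
```

For "facebook" this yields 16 candidates (4 vowels, 4 replacements each), such as "fecebook" and "facebuok", which keeps the number of lookups per platform small.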

atenreiro commented 4 years ago

I want to make sure openSquat doesn't violate any service policy:

Instagram

"We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means, including but not limited to, user profiles and photos (except as may be the result of standard search engine protocols or technologies used by a search engine with Instagram's express consent)."

In this particular case, I don't think we are using "automated means", since the user will always have to run the queries manually.

atenreiro commented 4 years ago

@mateuszz0000

I have not been coding as I've been sick (serious shoulder problem); however, I'm back! My Instagram code does not work anymore (website changes), so thankfully I had not pushed the changes to master.