Update based on fb:pages

amneacsu commented 4 years ago

Hey, folks! Love your work!

I've been looking at <meta name="fb:pages" content="(\d+)" /> tags. I've confirmed most of the Facebook pages reported in sites.csv, but some sites apparently point to different pages, and some pages are new (not tracked in sites.csv). What I did was take the FB page ID from each scraped site and open facebook.com/:id. Below is a list of FB page IDs, their URLs, and the domains from sites.csv whose Facebook page either doesn't match what's currently in the repo or isn't filled in.

440850842779072 - https://www.facebook.com/dupagepolicyjournal/
annearundeltoday.com - new
baltimorecitywire.com - new
northbaltimorejournal.com - new

299588323424419 - https://www.facebook.com/legalnewsline/
cookcountyrecord.com - stored here as facebook/cookcountyrecord
socalrecord.com - new

573209409408570 - https://www.facebook.com/CookCountyRecord/
setexasrecord.com - stored here as facebook/SETexasRecord
stlrecord.com - stored here as facebook/stlrecord
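
For anyone who wants to repeat the check, the scraping step described above could look roughly like the Python sketch below. It is not the script that was actually used. It assumes the page HTML has already been downloaded, and the names `FbPagesParser`, `extract_fb_page_ids` and `facebook_url` are made up for illustration. It also accepts `property="fb:pages"` in addition to `name="fb:pages"`, on the assumption that some sites use that form.

```python
import re
from html.parser import HTMLParser


class FbPagesParser(HTMLParser):
    """Collects numeric IDs from <meta name="fb:pages" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.page_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: the tag may use either name= or property=.
        key = attr.get("name") or attr.get("property") or ""
        if key.lower() == "fb:pages":
            # content may hold more than one ID, so collect every digit run.
            self.page_ids.extend(re.findall(r"\d+", attr.get("content") or ""))


def extract_fb_page_ids(html):
    """Return all fb:pages IDs found in an HTML string, in document order."""
    parser = FbPagesParser()
    parser.feed(html)
    return parser.page_ids


def facebook_url(page_id):
    """Build the facebook.com/:id URL that gets opened by hand."""
    return f"https://www.facebook.com/{page_id}"
```

Feeding it each site's homepage HTML and opening `facebook_url(page_id)` for every ID it returns reproduces the manual comparison against sites.csv.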

As an aside

What's interesting to me is the Legal Newsline connection. I initially started scraping for "GET THE APP" (before noticing there's already a column in the CSV for that), and was looking at the developer "The Record, Inc." Their apps use the same orange shower-looking thing that's in the Legal Newsline logo. ~I did a reverse image search for that and found another developer that has since changed logos: Right Mobile Pty Ltd, from this search. They have a bunch of republican/conservative apps, but I haven't looked into it beyond that.~ I guess it makes sense if they're looking to change laws.

Not sure how much help this is, but I had fun.

Keep up the good work!

Edit: I may have misinterpreted the reverse image search results. I think it's catching the related apps section.

mentor20 commented 4 years ago

Thanks, love your approach!

The new ones listed are in sites.csv, unless I missed something:

annearundeltoday.com - #L745
baltimorecitywire.com - #L746
northbaltimorejournal.com - #L768

Keep hunting.

Cheers, brother-in-arms!

amneacsu commented 4 years ago

> The new ones listed are in sites.csv, unless I missed something:

Their Facebook pages are "new", not the domains themselves. Sorry for the confusion.

mentor20 commented 4 years ago

Ah, I think I see now. Can you run it for the new ones from #L797? They were added today.

amneacsu commented 4 years ago

I did now, but none of those have a <meta name="fb:pages" /> tag. 🙁

mentor20 commented 4 years ago

Thanks. Can you roll a script for us to extract the FB URLs for the new domains, like https://desmoinesguide.com/ => https://www.facebook.com/DesMoinesGuide/?

amneacsu commented 4 years ago

Sorry, not with my skillset. The only part I managed to automate was the scraping. I think anyone with any Python, C# or Bash experience could come up with something much better than I ever could. 😞

mentor20 commented 4 years ago

All good, thanks for the inspiration... now you can add it to your skillset: https://github.com/MassMove/AttackVectors/blob/master/LocalJournals/utils/FacebookUrlExtractor/Program.cs#L74-L85
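
For readers who prefer Python, here is a minimal sketch of the same idea: pulling Facebook page URLs out of a site's HTML. It is not a port of the linked Program.cs and makes no claim about how that code works. It assumes the homepage links to its Facebook page from an ordinary `<a href>` tag and that the HTML has already been fetched. `LinkParser`, `extract_facebook_page_urls` and the `NON_PAGE_SEGMENTS` list are illustrative names and guesses.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed list of first path segments that are Facebook features, not pages.
NON_PAGE_SEGMENTS = {"sharer", "sharer.php", "share.php", "plugins", "tr", "dialog", "login"}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_facebook_page_urls(html):
    """Return normalized facebook.com page URLs linked from the HTML, without duplicates."""
    parser = LinkParser()
    parser.feed(html)
    found = []
    for href in parser.hrefs:
        parsed = urlparse(href)
        host = (parsed.hostname or "").lower()
        if host != "facebook.com" and not host.endswith(".facebook.com"):
            continue
        segments = [s for s in parsed.path.split("/") if s]
        # Skip bare facebook.com links and share/plugin endpoints.
        if not segments or segments[0].lower() in NON_PAGE_SEGMENTS:
            continue
        url = f"https://www.facebook.com/{segments[0]}/"
        if url not in found:
            found.append(url)
    return found
```

Sites that only embed a share button, or that load their social links with JavaScript, would return nothing here and would still need a manual look.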

Just found that the Legal Newsline connection goes deeper:

https://web.archive.org/web/20190806044945/https://legalnewsline.com/privacy
https://web.archive.org/web/20190426153606/cookcountyrecord.com/privacy

MassMove / AttackVectors

Update based on fb:pages #53