blacklanternsecurity / bbot

A recursive internet scanner for hackers.
https://www.blacklanternsecurity.com/bbot/
GNU General Public License v3.0
Occasional Newlines in URLs #1457

Open TheTechromancer opened 2 weeks ago

TheTechromancer commented 2 weeks ago
[URL_UNVERIFIED]        https://uk.yahoo.com/news/ex-spandau-ballet-singer-complimented-080549530.html  excavate    (endpoint, extension-html, in-scope, spider-danger)
[URL_UNVERIFIED]        https://tw.news.yahoo.com/ç©æ¡é
                                                    °å±
風波-é­æ¤-éå½ç¼ç
§-æµåº-網åæ
[INFO] wispy_taylor: Modules running (incoming:processing:outgoing) cloud(506:1:0), dns(0:0:506), httpx(192:207:0), excavate(179:1:0)

Excavate's a-tag regex seems to be responsible:

  "discovery_context": "excavate's URL extractor (a-tag regex) found URL_UNVERIFIED: https://tw.news.yahoo.com/äºåé ä¸é£¯è-å³åè
                                                                                                                               å­è  çµ²-024022257.html in HTTP response body",
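
A minimal sketch of how a regex-based extractor can end up with newlines inside a URL, and one possible cleanup. The pattern below is a hypothetical stand-in, not excavate's actual a-tag regex, and `strip_tab_newline` is an illustrative helper, not a bbot function. The point is that a negated character class such as `[^"']` also matches `\n`, so any line break inside the `href` value is captured along with the URL. Browsers remove ASCII tab, LF and CR when parsing a URL, so stripping them before emitting the event may be a reasonable fix.

```python
import re

# Hypothetical a-tag pattern for illustration only (NOT bbot's real regex).
# The negated class [^"'] matches newlines too, so they land in the capture.
A_TAG_RE = re.compile(r"<a\s[^>]*?href=[\"']([^\"']+)[\"']", re.IGNORECASE)


def strip_tab_newline(url: str) -> str:
    """Drop ASCII tab, LF and CR from a URL, then trim surrounding whitespace."""
    return re.sub(r"[\t\n\r]", "", url).strip()


# Example HTML with a line break inside the href value (made-up URL).
html = '<a class="x" href="https://example.com/foo\nbar.html">link</a>'

raw = A_TAG_RE.findall(html)                   # captured URL still contains "\n"
cleaned = [strip_tab_newline(u) for u in raw]  # newline removed

print(repr(raw[0]))
print(repr(cleaned[0]))
```

Whether the stray characters in the output above are real newlines in the page source or a side effect of how the response body was decoded is not established here; the sketch only covers the first case.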

@liquidsec