Wappalyzer identifies technologies on websites, such as CMS, web frameworks, ecommerce platforms, JavaScript libraries, analytics tools and more.
If you don't have time to configure, host, debug and maintain your own infrastructure to analyse websites at scale, we offer a SaaS solution that has all the same capabilities and a lot more. Our apps and APIs not only reveal the technology stack a website uses but also company and contact details, social media profiles, keywords and metadata.
git clone https://github.com/wappalyzer/wappalyzer.git
cd wappalyzer
yarn install
yarn run link
node src/drivers/npm/cli.js https://example.com
about:extensions
src/drivers/webextension
about:debugging#/runtime/this-firefox
src/drivers/webextension/manifest.json
A long list of regular expressions is used to identify technologies on web pages. Wappalyzer inspects HTML code, as well as JavaScript variables, response headers and more.
Patterns (regular expressions) are kept in src/technologies/. The following is an example of an application fingerprint.
"Example": {
  "description": "A short description of the technology.",
  "cats": [
    1
  ],
  "cookies": {
    "cookie_name": "Example"
  },
  "dom": {
    "#example-id": {
      "exists": "",
      "attributes": {
        "class": "example-class"
      },
      "properties": {
        "example-property": ""
      },
      "text": "Example text content"
    }
  },
  "dns": {
    "MX": [
      "example\\.com"
    ]
  },
  "js": {
    "Example.method": ""
  },
  "excludes": "Example",
  "headers": {
    "X-Powered-By": "Example"
  },
  "text": "\\bexample\\b",
  "css": "\\.example-class",
  "robots": "Disallow: /unique-path/",
  "implies": "PHP\\;confidence:50",
  "requires": "WordPress",
  "requiresCategory": "Ecommerce",
  "meta": {
    "generator": "(?:Example|Another Example)"
  },
  "probe": {
    "/path": ""
  },
  "scriptSrc": "example-([0-9.]+)\\.js\\;confidence:50\\;version:\\1",
  "scripts": "function webpackJsonpCallback\\(data\\) {",
  "url": "example\\.com",
  "xhr": "example\\.com",
  "oss": true,
  "saas": true,
  "pricing": ["mid", "freemium", "recurring"],
  "website": "https://example.com"
}
Find the JSON schema at schema.json.
Field | Type | Description | Example |
---|---|---|---|
cats | Array | One or more category IDs. | [1, 6] |
website | String | URL of the application's website. | "https://example.com" |
Field | Type | Description | Example |
---|---|---|---|
description | String | A short description of the technology in British English (max. 250 characters). Write in a neutral, factual tone; not like an ad. | "A short description." |
icon | String | Application icon filename. | "WordPress.svg" |
cpe | String | CPE is a structured naming scheme for technologies. To check that a CPE is valid and exists (using v2.3), use the search. | "cpe:2.3:a:apache:http_server:*:*:*:*:*:*:*:*" |
saas | Boolean | The technology is offered as a Software-as-a-Service (SaaS), i.e. hosted or cloud-based. | true |
oss | Boolean | The technology has an open-source license. | true |
pricing | Array | Cost indicator (based on a typical plan or average monthly price) and available pricing models. For paid products only. | ["low", "freemium"] |
Field | Type | Description | Example |
---|---|---|---|
implies | String \| Array | The presence of one application can imply the presence of another, e.g. WordPress means PHP is also in use. | "PHP" |
requires | String \| Array | Similar to implies but detection only runs if the required technology has been identified. Useful for themes for a specific CMS. | "WordPress" |
requiresCategory | String \| Array | Similar to requires; detection only runs if a technology in the required category has been identified. | "Ecommerce" |
excludes | String \| Array | Opposite of implies. The presence of one application can exclude the presence of another. | "Apache" |
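To illustrate how implies can chain from one technology to the next, here is a minimal sketch that expands a list of detected technologies with everything they imply. The fingerprints object and the resolveImplies and toArray helpers are hypothetical and written for this example only; they are not part of Wappalyzer's API, and the real implementation also handles confidence, requires and excludes.

```javascript
// Illustrative data only: a tiny, made-up subset of fingerprints.
const fingerprints = {
  WordPress: { implies: ['PHP', 'MySQL'] },
  PHP: {},
  MySQL: {},
  'Example Theme': { requires: 'WordPress', implies: 'jQuery\\;confidence:50' },
  jQuery: {},
}

// Fields such as implies accept either a string or an array.
function toArray(value) {
  if (value === undefined) return []
  return Array.isArray(value) ? value : [value]
}

// Hypothetical helper: follow implies transitively, ignoring tags.
function resolveImplies(detected, technologies) {
  const resolved = new Set(detected)
  const queue = [...detected]

  while (queue.length > 0) {
    const name = queue.shift()
    const { implies } = technologies[name] || {}

    for (const entry of toArray(implies)) {
      // Drop any tags (e.g. "\;confidence:50") to get the technology name.
      const implied = entry.split('\\;')[0]

      if (!resolved.has(implied)) {
        resolved.add(implied)
        queue.push(implied)
      }
    }
  }

  return [...resolved]
}

console.log(resolveImplies(['Example Theme', 'WordPress'], fingerprints))
```

The Set guards against cycles, so two technologies that imply each other do not cause an endless loop.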
Field | Type | Description | Example |
---|---|---|---|
cookies | Object | Cookies. | { "cookie_name": "Cookie value" } |
cookieNames | Array | Cookie names. | [ "^_ga_[A-Z0-9]+\\;version:GA4" ] |
dom | String \| Array \| Object | Uses a query selector to inspect element properties, attributes and text content. | { "#example-id": { "properties": { "example-prop": "" } } } |
dns | Object | DNS records: supports MX, TXT, SOA and NS (NPM driver only). | { "MX": "example\\.com" } |
js | Object | JavaScript properties (case sensitive). Avoid short property names to prevent matching minified code. | { "jQuery.fn.jquery": "" } |
headers | Object | HTTP response headers. | { "X-Powered-By": "^WordPress$" } |
text | String \| Array | Matches plain text. Should only be used in very specific cases where other methods can't be used. | "\\bexample\\b" |
css | String \| Array | CSS rules. Unavailable when a website enforces a same-origin policy. For performance reasons, only a portion of the available CSS rules are used to find matches. | "\\.example-class" |
probe | Object | Request a URL to test for its existence or match text content (NPM driver only). | { "/path": "Example text" } |
robots | String \| Array | Robots.txt contents. | "Disallow: /unique-path/" |
url | String \| Array | Full URL of the page. | "^https?://.+\\.wordpress\\.com" |
xhr | String \| Array | Hostnames of XHR requests. | "cdn\\.netlify\\.com" |
meta | Object | HTML meta tags, e.g. generator. | { "generator": "^WordPress$" } |
scriptSrc | String \| Array | URLs of JavaScript files included on the page. | "jquery\\.js" |
scripts | String \| Array | JavaScript source code. Inspects inline and external scripts. For performance reasons, avoid scripts where possible and use js instead. | "function webpackJsonpCallback\\(data\\) {" |
html (deprecated) | String \| Array | HTML source code. Patterns must include an HTML opening tag to avoid matching plain text. For performance reasons, avoid html where possible and use dom instead. | "<a [^>]*href=\"index.html" |
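As a rough illustration of how a few of these fields could be evaluated, the sketch below tests the headers, meta and url patterns of a fingerprint against data collected from a page. Everything here is an assumption made for the example: matchFingerprint, compile, toArray and the shape of the page object (lower-cased header and meta names) are hypothetical, as is the choice of case-insensitive matching, and none of it is taken from Wappalyzer's actual code.

```javascript
// Fields such as url accept either a string or an array of patterns.
function toArray(value) {
  if (value === undefined) return []
  return Array.isArray(value) ? value : [value]
}

// Compile the part of a pattern that precedes any "\;" tags.
// Case-insensitive matching is an assumption of this sketch.
function compile(pattern) {
  return new RegExp(pattern.split('\\;')[0], 'i')
}

// Hypothetical helper: return the names of the fields that matched.
function matchFingerprint(fingerprint, page) {
  const matched = []

  // Object fields map a name (header or meta tag) to a pattern.
  for (const field of ['headers', 'meta']) {
    for (const [name, pattern] of Object.entries(fingerprint[field] || {})) {
      const value = (page[field] || {})[name.toLowerCase()]

      if (value !== undefined && compile(pattern).test(value)) {
        matched.push(`${field}.${name}`)
      }
    }
  }

  // String or array fields hold one or more patterns for a single value.
  for (const pattern of toArray(fingerprint.url)) {
    if (compile(pattern).test(page.url || '')) {
      matched.push('url')
    }
  }

  return matched
}

const fingerprint = {
  headers: { 'X-Powered-By': '^WordPress$' },
  meta: { generator: '^WordPress$' },
  url: '^https?://.+\\.wordpress\\.com',
}

// Made-up page data; header and meta names are assumed to be lower-cased.
const page = {
  url: 'https://blog.wordpress.com/',
  headers: { 'x-powered-by': 'WordPress' },
  meta: { generator: 'WordPress' },
}

console.log(matchFingerprint(fingerprint, page))
```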
Patterns are essentially JavaScript regular expressions written as strings, but with some additions.
- Because patterns are written as strings, the backslash itself must be escaped when escaping special characters such as the dot (\\.). Double quotes must be escaped only once (\"). Slashes do not need to be escaped (/).
- Capture groups (()) are used for version detection. In other cases, use non-capturing groups ((?:)).
- Use start and end-of-string anchors (^ and $) where possible for optimal performance.
Tags (a non-standard syntax) can be appended to patterns (and implies and excludes, separated by \\;) to store additional information.
Tag | Description | Example |
---|---|---|
confidence | Indicates a less reliable pattern that may cause false positives. The aim is to achieve a combined confidence of 100%. Defaults to 100% if not specified. | "js": { "Mage": "\\;confidence:50" } |
version | Gets the version number from a pattern match using a special syntax. | "scriptSrc": "jquery-([0-9.]+)\\.js\\;version:\\1" |
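The tag syntax can be made concrete with a small sketch that splits a pattern string into its regular expression and its tags. parsePattern is a hypothetical helper written for this example, not a function exported by Wappalyzer, and the case-insensitive flag is an assumption. Note that once the JSON has been parsed, the separator written as \\; in the file is the two-character sequence \; in memory.

```javascript
// Hypothetical helper: split "regex\;tag:value\;tag:value" into its parts.
function parsePattern(pattern) {
  // In a JavaScript string literal, '\\;' is a backslash followed by ";".
  const [source, ...tags] = pattern.split('\\;')

  const parsed = {
    // An empty source yields a regex that matches any string.
    regex: new RegExp(source, 'i'), // case-insensitivity is an assumption
    confidence: 100, // default when no confidence tag is present
    version: '',
  }

  for (const tag of tags) {
    const index = tag.indexOf(':')
    const name = tag.slice(0, index)
    const value = tag.slice(index + 1)

    if (name === 'confidence') {
      parsed.confidence = Number.parseInt(value, 10)
    } else if (name === 'version') {
      parsed.version = value
    }
  }

  return parsed
}

const parsed = parsePattern('jquery-([0-9.]+)\\.js\\;confidence:50\\;version:\\1')

console.log(parsed.regex.test('/assets/jquery-3.6.0.js')) // true
console.log(parsed.confidence) // 50
```

A pattern that consists only of tags, such as the Mage example in the table, parses to an empty regular expression, which is why it matches whenever the property exists.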
Application version information can be obtained from a pattern using a capture group. A condition can be evaluated using the ternary operator (?:).
Example | Description |
---|---|
\\1 | Returns the first match. |
\\1?a: | Returns a if the first match contains a value, nothing otherwise. |
\\1?a:b | Returns a if the first match contains a value, b otherwise. |
\\1?:b | Returns nothing if the first match contains a value, b otherwise. |
foo\\1 | Returns foo with the first match appended. |
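The rows above can be reproduced with a short sketch that resolves a version template against the capture groups of a regular expression match. resolveVersion is a hypothetical helper for illustration and makes no claim to match Wappalyzer's own implementation; for simplicity it assumes fewer than ten capture groups and at most one ternary per template.

```javascript
// Hypothetical helper: apply a version template such as "\1?a:b" to a match.
function resolveVersion(template, match) {
  let version = template

  match.forEach((group, index) => {
    // Index 0 is the full match; capture groups start at 1.
    if (index === 0) return

    const value = group || ''

    // Ternary form "\N?a:b": pick a when group N has a value, b otherwise.
    const ternary = new RegExp(`\\\\${index}\\?([^:]*):(.*)$`)
    version = version.replace(ternary, (_, ifSet, ifEmpty) =>
      value ? ifSet : ifEmpty
    )

    // Plain back-reference "\N": substitute the captured text.
    version = version.split(`\\${index}`).join(value)
  })

  return version.trim()
}

const match = '/js/example-2.1.js'.match(/example-([0-9.]+)\.js/)

console.log(resolveVersion('\\1', match)) // 2.1
console.log(resolveVersion('\\1?a:b', match)) // a
console.log(resolveVersion('foo\\1', match)) // foo2.1
```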
$ npm i -g wappalyzer
wappalyzer <url> [options]
-b, --batch-size=... Process links in batches
-d, --debug Output debug messages
-t, --delay=ms Wait for ms milliseconds between requests
-h, --help This text
-H, --header Extra header to send with requests
--html-max-cols=... Limit the number of HTML characters per line processed
--html-max-rows=... Limit the number of HTML lines processed
-D, --max-depth=... Don't analyse pages more than num levels deep
-m, --max-urls=... Exit when num URLs have been analysed
-w, --max-wait=... Wait no more than ms milliseconds for page resources to load
-p, --probe=[basic|full] Perform a deeper scan by performing additional requests and inspecting DNS records
-P, --pretty Pretty-print JSON output
--proxy=... Proxy URL, e.g. 'http://user:pass@proxy:8080'
-r, --recursive Follow links on pages (crawler)
-a, --user-agent=... Set the user agent string
-n, --no-scripts Disable JavaScript on web pages
-N, --no-redirect Disable cross-domain redirects
-e, --extended Output additional information
--local-storage=... JSON object to use as local storage
--session-storage=... JSON object to use as session storage
--defer=ms Defer scan for ms milliseconds after page load
$ npm i wappalyzer
const Wappalyzer = require('wappalyzer')
const url = 'https://www.wappalyzer.com'
const options = {
  debug: false,
  delay: 500,
  headers: {},
  maxDepth: 3,
  maxUrls: 10,
  maxWait: 5000,
  recursive: true,
  probe: true,
  proxy: false,
  userAgent: 'Wappalyzer',
  htmlMaxCols: 2000,
  htmlMaxRows: 2000,
  noScripts: false,
  noRedirect: false,
}
const wappalyzer = new Wappalyzer(options)
;(async function() {
  try {
    await wappalyzer.init()
    // Optionally set additional request headers
    const headers = {}
    // Optionally set local and/or session storage
    const storage = {
      local: {},
      session: {}
    }
    const site = await wappalyzer.open(url, headers, storage)
    // Optionally capture and output errors
    site.on('error', console.error)
    const results = await site.analyze()
    console.log(JSON.stringify(results, null, 2))
  } catch (error) {
    console.error(error)
  }
  await wappalyzer.destroy()
})()
Multiple URLs can be processed in parallel:
const Wappalyzer = require('wappalyzer')
const urls = ['https://www.wappalyzer.com', 'https://www.example.com']
const wappalyzer = new Wappalyzer()
;(async function() {
  try {
    await wappalyzer.init()
    const results = await Promise.all(
      urls.map(async (url) => {
        const site = await wappalyzer.open(url)
        const results = await site.analyze()
        return { url, results }
      })
    )
    console.log(JSON.stringify(results, null, 2))
  } catch (error) {
    console.error(error)
  }
  await wappalyzer.destroy()
})()
Listen to events with site.on(eventName, callback). Use the page parameter to access the Puppeteer page instance.
Event | Parameters | Description |
---|---|---|
log | message, source | Debug messages |
error | message, source | Error messages |
request | page, request | Emitted at the start of a request |
response | page, request | Emitted upon receiving a server response |
goto | page, url, html, cookies, scriptsSrc, scripts, meta, js, language, links | Emitted after a page has been analysed |
analyze | urls, technologies, meta | Emitted when the site has been analysed |