WhichBrowser / Parser-PHP

Browser sniffing gone too far — A useragent parser library for PHP
http://whichbrowser.net
MIT License

Add `DMCA` crawling service to the bots list #570

Closed summercms closed 3 years ago

summercms commented 3 years ago

The DMCA.com badge works by signalling the existence of websites and pages to a server, which then indexes each page of your site marked with a DMCA.com badge. Traditionally, each page of a website has its own distinct address, and the server returns the page and its content directly with little or no processing in between.

Single-Page-Application websites differ in that they leverage the end user's browser to do the work of routing the requested page and loading its content. This can be a problem for web bots that do not perform this rendering process themselves, as they will only see the application itself and not the requested content.

Currently, the only option for crawling a SPA site is to use a third-party page rendering service. These services typically return rendered versions of SPA pages to web crawlers, such as those of Google or Bing, that also do not render SPA pages themselves.

You can use these rendering services with the DMCA.com web crawler, just as with these search engines, by enabling them for the following user-agent string:

DMCA.com Page Protection Crawling Service

Link: https://www.dmca.com/FAQ/Configuring-your-Single-Page-Application-SPA-website-for-scanning-and-indexing-with-the-DMCAcom-b
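As a rough illustration of the routing decision described above, here is a minimal sketch in Python. It is not WhichBrowser code and does not use the library's API. The function names and the extra bot markers are hypothetical; only the DMCA.com user-agent string comes from this thread. It also assumes the crawler's header contains that string, so it uses a case-insensitive substring match rather than an exact comparison.

```python
# User-agent string quoted in the comment above.
DMCA_CRAWLER_UA = "DMCA.com Page Protection Crawling Service"

# Hypothetical markers for crawlers that should receive pre-rendered pages.
# Only the DMCA.com entry comes from this thread; the others are examples.
PRERENDER_BOT_MARKERS = (DMCA_CRAWLER_UA.lower(), "googlebot", "bingbot")


def is_dmca_crawler(user_agent):
    """Return True if the User-Agent header contains the DMCA.com string."""
    return DMCA_CRAWLER_UA.lower() in (user_agent or "").lower()


def should_prerender(user_agent):
    """Return True if the request should be sent to a rendering service."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in PRERENDER_BOT_MARKERS)


if __name__ == "__main__":
    print(is_dmca_crawler("DMCA.com Page Protection Crawling Service"))
    print(should_prerender("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))
```

A server or proxy in front of a SPA could call something like `should_prerender` on each request and forward matching requests to the rendering service, while serving the normal application to everyone else.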

coveralls commented 3 years ago

Coverage increased (+0.02%) to 99.97% when pulling 4a5e265b838d6732b97cc2b3ae3b26348c689b3c on ayumi-cloud:dmca into 880b9fa797401d14b28956442944c3daa70240ff on WhichBrowser:master.

NielsLeenheer commented 3 years ago

Closed by c027842f3aa67bb42c584f523bc8983003ed3058