commoncrawl / ia-web-commons

Web archiving utility library
Apache License 2.0

[WAT] Add rel attribute to A@/href links #10

Closed by sebastian-nagel 7 years ago

sebastian-nagel commented 7 years ago

The rel attribute isn't extracted for A (and AREA) hyperlinks. The link types it specifies, e.g. nofollow, are useful to have. Also check whether other attributes are worth extracting.
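
To illustrate the request, here is a minimal sketch in Python using only the standard library. It is not the ia-web-commons code (that library is written in Java). The names `LinkRelExtractor` and `extract_links` are hypothetical, and the output record shape (`path`, `url`, `rel`) is an assumption modeled on the `A@/href` notation in the issue title.

```python
from html.parser import HTMLParser


class LinkRelExtractor(HTMLParser):
    """Collect href and, when present, rel from <a> and <area> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag not in ("a", "area"):
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            # Anchors without a target (e.g. <a name="...">) are not links.
            return
        # Hypothetical record layout, following the "A@/href" path notation.
        link = {"path": tag.upper() + "@/href", "url": href}
        rel = attr_map.get("rel")
        if rel:
            # Keep the raw attribute value; it is a space-separated token list.
            link["rel"] = rel
        self.links.append(link)


def extract_links(html):
    """Return one record per A/AREA hyperlink found in the HTML string."""
    parser = LinkRelExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

A consumer could then skip links marked nofollow by checking `"nofollow" in link.get("rel", "").lower().split()`, since rel holds space-separated, case-insensitive tokens.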

sebastian-nagel commented 7 years ago

Included in the August 2017 crawl (CC-MAIN-2017-34).