flaxsearch / flaxcode

Automatically exported from code.google.com/p/flaxcode

HTML parser should understand directives to skip sections #151

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
There are various directives in use on the web that tell crawlers to ignore
sections of documents. The HTML parser should understand these directives
(it should probably also be possible to configure it to ignore them).

A good list of such directives is at:
http://wunderwood.org/most_casual_observer/2007/05/selective_page_indexing_direct.html
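
To make the request concrete, here is a rough sketch of the behaviour, not a patch against the existing parser. It uses Python's standard `html.parser` module and assumes two comment-based directive pairs (`googleoff: index` / `googleon: index` and `stopindex` / `startindex`) as examples. The class name, the `honour_directives` flag and the `SKIP_DIRECTIVES` table are all hypothetical names made up for illustration, and attribute-based directives are not covered.

```python
from html.parser import HTMLParser

# Assumed example directives: comment text that stops indexing, mapped to the
# comment text that resumes it. The real set should come from the list linked above.
SKIP_DIRECTIVES = {
    "googleoff: index": "googleon: index",
    "stopindex": "startindex",
}


class SkippingTextExtractor(HTMLParser):
    """Collect text from HTML, leaving out sections marked as not to be indexed."""

    def __init__(self, honour_directives=True):
        super().__init__()
        # Hypothetical configuration switch: set to False to ignore the directives.
        self.honour_directives = honour_directives
        self._resume_on = None  # comment text that ends the current skipped section
        self._chunks = []

    def handle_comment(self, data):
        if not self.honour_directives:
            return
        directive = data.strip().lower()
        if self._resume_on is None:
            # Not skipping: a known "stop" directive opens a skipped section.
            self._resume_on = SKIP_DIRECTIVES.get(directive)
        elif directive == self._resume_on:
            # Skipping: only the matching "resume" directive closes the section.
            self._resume_on = None

    def handle_data(self, data):
        # Note: script and style contents also arrive here; a real parser
        # would filter those separately.
        if self._resume_on is None:
            self._chunks.append(data)

    def text(self):
        # Join the kept chunks and normalise runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract_text(html, honour_directives=True):
    parser = SkippingTextExtractor(honour_directives)
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `extract_text(page)` would drop everything between a stop comment and its matching resume comment, while `extract_text(page, honour_directives=False)` would index the whole page, which covers the "configure it to ignore them" case.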

Original issue reported on code.google.com by boulton.rj@gmail.com on 30 Nov 2007 at 1:15

GoogleCodeExporter commented 9 years ago

Original comment by boulton.rj@gmail.com on 9 Jan 2008 at 11:23

GoogleCodeExporter commented 9 years ago

Original comment by charliej...@gmail.com on 19 Aug 2009 at 3:29