apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Using the HtmlDocument class gives the error "This document has errors that must be fixed before using HTML Tidy to generate a tidied up version." [LUCENE-1041] #2117

Closed: asfimport closed this issue 16 years ago

asfimport commented 16 years ago

We are writing an e-mail parser and are blocked by this error.

    // p is the mail part being indexed (see the IMAPBodyPart/IMAPMessage entries in the log)
    HtmlDocument hd = new HtmlDocument(p.getInputStream());
    doc.add(new Field("contents", new StringReader(hd.getBody())));

HtmlDocument: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/index.html?org/apache/lucene/ant/HtmlDocument.html
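
Most of the messages in the log below complain about namespace-prefixed tags such as `<o:p>`, `<st1:place>` and `<o:smarttagtype>`, which typically come from mail composed in Microsoft Office. One possible workaround, sketched here only as an assumption and not as anything Lucene or HTML Tidy provides, is to strip those tags from the HTML text before it reaches `HtmlDocument`. The class name `OfficeTagStripper` and its `strip` method are hypothetical. The sketch uses a plain regular expression, so it would not help with the `<img> missing '>'` errors or with unprefixed unknown tags such as `<sig>`, and it may misbehave if an attribute value contains a literal `>`.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Lucene or HTML Tidy. */
final class OfficeTagStripper {
    // Matches opening, closing and self-closing tags whose name carries a
    // namespace prefix, e.g. <o:p>, </st1:place>, <o:smarttagtype ... />.
    private static final Pattern PREFIXED_TAG =
            Pattern.compile("</?[A-Za-z][A-Za-z0-9]*:[A-Za-z][A-Za-z0-9-]*[^>]*>");

    private OfficeTagStripper() {}

    /** Removes the prefixed tags themselves but keeps the text between them. */
    static String strip(String html) {
        return PREFIXED_TAG.matcher(html).replaceAll("");
    }
}
```

Since the snippet above passes an `InputStream` to the `HtmlDocument` constructor, the cleaned string would presumably need to be wrapped again, for example in a `ByteArrayInputStream`, before being handed over. Ordinary tags and URLs inside attributes, such as `<a href="http://...">`, are left alone because the pattern requires the colon to be part of the tag name.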

line 29 column 27 - Error: <st1:place> is not recognized!
line 29 column 47 - Error: <st1:country-region> is not recognized!
line 36 column 21 - Error: <o:p> is not recognized!
line 39 column 67 - Error: <o:p> is not recognized!
line 43 column 45 - Error: <o:p> is not recognized!
line 46 column 52 - Error: <o:p> is not recognized!
line 54 column 27 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 3 column 331 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 1,214 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 15 column 1 - Error: o:smarttagtype is not recognized!
line 17 column 1 - Error: o:smarttagtype is not recognized!
line 19 column 1 - Error: o:smarttagtype is not recognized!
line 21 column 1 - Error: o:smarttagtype is not recognized!
line 23 column 1 - Error: o:smarttagtype is not recognized!
line 111 column 48 - Error: <o:p> is not recognized!
line 111 column 196 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 1,444 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 1,384 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 662 column 11 - Error: <st1:city> is not recognized!
line 663 column 12 - Error: <st1:place> is not recognized!
line 682 column 91 - Error: <st1:personname> is not recognized!
line 686 column 87 - Error: <st1:place> is not recognized!
line 687 column 12 - Error: <st1:placename> is not recognized!
line 687 column 62 - Error: <st1:placetype> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 283 column 61 - Error: <o:p> is not recognized!
line 288 column 72 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 118 column 41 - Error: <o:p> is not recognized!
line 151 column 34 - Error: <o:p> is not recognized!
line 153 column 22 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 174 column 43 - Error: <o:p> is not recognized!
line 209 column 36 - Error: <o:p> is not recognized!
line 212 column 17 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 163 column 47 - Error: <o:p> is not recognized!
line 198 column 38 - Error: <o:p> is not recognized!
line 200 column 28 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 123 column 18 - Error: <font> missing '>' for end of tag
line 195 column 25 - Error: <font> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 123 column 18 - Error: <font> missing '>' for end of tag
line 195 column 25 - Error: <font> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 642 column 1 - Error: <sig> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 13 column 13 - Error: o:smarttagtype is not recognized!
line 15 column 1 - Error: o:smarttagtype is not recognized!
line 17 column 1 - Error: o:smarttagtype is not recognized!
line 19 column 1 - Error: o:smarttagtype is not recognized!
line 21 column 1 - Error: o:smarttagtype is not recognized!
line 23 column 1 - Error: o:smarttagtype is not recognized!
line 25 column 1 - Error: o:smarttagtype is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 41 column 6 - Error: <o:p> is not recognized!
line 49 column 47 - Error: <o:p> is not recognized!
line 202 column 52 - Error: <o:p> is not recognized!
line 204 column 109 - Error: <o:p> is not recognized!
line 212 column 44 - Error: <o:p> is not recognized!
line 217 column 47 - Error: <o:p> is not recognized!
line 222 column 69 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 22 column 13 - Error: <o:smarttagtype> is not recognized!
line 24 column 40 - Error: <o:smarttagtype> is not recognized!
line 26 column 31 - Error: <o:smarttagtype> is not recognized!
line 28 column 30 - Error: <o:smarttagtype> is not recognized!
line 95 column 32 - Error: <o:p> is not recognized!
line 99 column 32 - Error: <o:p> is not recognized!
line 105 column 2 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 13 column 13 - Error: o:smarttagtype is not recognized!
line 15 column 1 - Error: o:smarttagtype is not recognized!
line 17 column 1 - Error: o:smarttagtype is not recognized!
line 19 column 1 - Error: o:smarttagtype is not recognized!
line 21 column 1 - Error: o:smarttagtype is not recognized!
line 23 column 1 - Error: o:smarttagtype is not recognized!
line 89 column 70 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 18 column 22 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 91 column 26 - Error: <o:p> is not recognized!
line 94 column 16 - Error: <o:p> is not recognized!
line 97 column 81 - Error: <o:p> is not recognized!
line 100 column 16 - Error: <o:p> is not recognized!
line 105 column 51 - Error: <o:p> is not recognized!
line 108 column 16 - Error: <o:p> is not recognized!
line 111 column 84 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 101 column 26 - Error: <o:p> is not recognized!
line 104 column 16 - Error: <o:p> is not recognized!
line 108 column 64 - Error: <o:p> is not recognized!
line 111 column 16 - Error: <o:p> is not recognized!
line 116 column 37 - Error: <o:p> is not recognized!
line 119 column 16 - Error: <o:p> is not recognized!
line 124 column 89 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 126 column 8 - Error: <o:p> is not recognized!
line 128 column 78 - Error: <o:p> is not recognized!
line 132 column 47 - Error: <o:p> is not recognized!
line 134 column 78 - Error: <o:p> is not recognized!
line 139 column 14 - Error: <o:p> is not recognized!
line 141 column 78 - Error: <o:p> is not recognized!
line 146 column 55 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 107 column 79 - Error: <o:p> is not recognized!
line 110 column 16 - Error: <o:p> is not recognized!
line 113 column 27 - Error: <o:p> is not recognized!
line 116 column 16 - Error: <o:p> is not recognized!
line 120 column 16 - Error: <o:p> is not recognized!
line 125 column 66 - Error: <o:p> is not recognized!
line 133 column 23 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 141 column 54 - Error: <o:p> is not recognized!
line 143 column 78 - Error: <o:p> is not recognized!
line 145 column 89 - Error: <o:p> is not recognized!
line 147 column 78 - Error: <o:p> is not recognized!
line 151 column 41 - Error: <o:p> is not recognized!
line 155 column 44 - Error: <o:p> is not recognized!
line 160 column 24 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 110 column 26 - Error: <o:p> is not recognized!
line 113 column 16 - Error: <o:p> is not recognized!
line 117 column 35 - Error: <o:p> is not recognized!
line 120 column 16 - Error: <o:p> is not recognized!
line 123 column 27 - Error: <o:p> is not recognized!
line 126 column 16 - Error: <o:p> is not recognized!
line 130 column 16 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 114 column 102 - Error: <o:p> is not recognized!
line 117 column 16 - Error: <o:p> is not recognized!
line 120 column 23 - Error: <o:p> is not recognized!
line 123 column 16 - Error: <o:p> is not recognized!
line 137 column 53 - Error: <o:p> is not recognized!
line 143 column 20 - Error: <o:p> is not recognized!
line 146 column 26 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 110 column 26 - Error: <o:p> is not recognized!
line 113 column 16 - Error: <o:p> is not recognized!
line 118 column 19 - Error: <o:p> is not recognized!
line 121 column 16 - Error: <o:p> is not recognized!
line 124 column 27 - Error: <o:p> is not recognized!
line 127 column 16 - Error: <o:p> is not recognized!
line 131 column 16 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 46 column 70 - Error: <o:p> is not recognized!
line 48 column 70 - Error: <o:p> is not recognized!
line 53 column 37 - Error: <o:p> is not recognized!
line 55 column 24 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 118 column 71 - Error: <o:p> is not recognized!
line 121 column 16 - Error: <o:p> is not recognized!
line 126 column 63 - Error: <o:p> is not recognized!
line 129 column 16 - Error: <o:p> is not recognized!
line 132 column 23 - Error: <o:p> is not recognized!
line 135 column 16 - Error: <o:p> is not recognized!
line 150 column 53 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 6 column 1 - Error: o:smarttagtype is not recognized!
line 8 column 1 - Error: o:smarttagtype is not recognized!
line 44 column 34 - Error: <o:p> is not recognized!
line 46 column 20 - Error: <o:p> is not recognized!
line 48 column 66 - Error: <o:p> is not recognized!
line 50 column 20 - Error: <o:p> is not recognized!
line 53 column 51 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

OUCH decoding attachment, p=com.sun.mail.imap.IMAPBodyPart@1bcdbf6 ioe=java.io.UnsupportedEncodingException: X-UNKNOWN
java.io.UnsupportedEncodingException: X-UNKNOWN
        at sun.io.Converters.getConverterClass(Converters.java:218)
        at sun.io.Converters.newConverter(Converters.java:251)
        at sun.io.ByteToCharConverter.getConverter(ByteToCharConverter.java:68)
        at sun.nio.cs.StreamDecoder$ConverterSD.<init>(StreamDecoder.java:224)
        at sun.nio.cs.StreamDecoder$ConverterSD.<init>(StreamDecoder.java:210)
        at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:77)
        at java.io.InputStreamReader.<init>(InputStreamReader.java:83)
        at com.sun.mail.handlers.text_plain.getContent(text_plain.java:95)
        at javax.activation.DataSourceDataContentHandler.getContent(DataHandler.java:803)
        at javax.activation.DataHandler.getContent(DataHandler.java:550)
        at javax.mail.internet.MimeBodyPart.getContent(MimeBodyPart.java:652)
        at emailanalyzer.EmailParser.indexContent(EmailParser.java:506)
        at emailanalyzer.EmailParser.index(EmailParser.java:448)
        at emailanalyzer.EmailParser.indexContent(EmailParser.java:521)
        at emailanalyzer.EmailParser.index(EmailParser.java:430)
        at emailanalyzer.EmailParser.index(EmailParser.java:316)
        at emailanalyzer.EmailParser.index(EmailParser.java:373)
        at emailanalyzer.EmailParser.traverse(EmailParser.java:342)
        at emailanalyzer.EmailParser.main(EmailParser.java:196)
line 121 column 26 - Error: <o:p> is not recognized!
line 124 column 16 - Error: <o:p> is not recognized!
line 128 column 65 - Error: <o:p> is not recognized!
line 131 column 16 - Error: <o:p> is not recognized!
line 134 column 27 - Error: <o:p> is not recognized!
line 137 column 16 - Error: <o:p> is not recognized!
line 141 column 16 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 128 column 53 - Error: <o:p> is not recognized!
line 131 column 16 - Error: <o:p> is not recognized!
line 134 column 23 - Error: <o:p> is not recognized!
line 137 column 16 - Error: <o:p> is not recognized!
line 151 column 53 - Error: <o:p> is not recognized!
line 157 column 20 - Error: <o:p> is not recognized!
line 159 column 63 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 125 column 71 - Error: <o:p> is not recognized!
line 128 column 16 - Error: <o:p> is not recognized!
line 131 column 23 - Error: <o:p> is not recognized!
line 134 column 16 - Error: <o:p> is not recognized!
line 148 column 53 - Error: <o:p> is not recognized!
line 154 column 20 - Error: <o:p> is not recognized!
line 156 column 63 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 22 column 13 - Error: <o:smarttagtype> is not recognized!
line 24 column 40 - Error: <o:smarttagtype> is not recognized!
line 26 column 31 - Error: <o:smarttagtype> is not recognized!
line 134 column 71 - Error: <o:p> is not recognized!
line 151 column 52 - Error: <o:p> is not recognized!
line 202 column 2 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 140 column 26 - Error: <o:p> is not recognized!
line 143 column 16 - Error: <o:p> is not recognized!
line 154 column 34 - Error: <o:p> is not recognized!
line 157 column 16 - Error: <o:p> is not recognized!
line 162 column 59 - Error: <o:p> is not recognized!
line 165 column 16 - Error: <o:p> is not recognized!
line 168 column 27 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 2 column 683 - Error: <srilatha_potnuru> is not recognized!
line 2 column 733 - Error: <vakula16> is not recognized!
line 2 column 803 - Error: <pvenu> is not recognized!
line 2 column 839 - Error: <srini_avant> is not recognized!
line 2 column 919 - Error: <rajmirk> is not recognized!
line 2 column 966 - Error: <w52970> is not recognized!
line 3 column 101 - Error: <w5297c> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 58 column 21 - Error: <srilatha_potnuru> is not recognized!
line 59 column 4 - Error: <vakula16> is not recognized!
line 60 column 4 - Error: <pvenu> is not recognized!
line 60 column 40 - Error: <srini_avant> is not recognized!
line 61 column 58 - Error: <rajmirk> is not recognized!
line 62 column 28 - Error: <w52970> is not recognized!
line 64 column 28 - Error: <w5297c> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 534 column 49 - Error: <o:p> is not recognized!
line 535 column 71 - Error: <o:p> is not recognized!
line 536 column 69 - Error: <o:p> is not recognized!
line 537 column 79 - Error: <o:p> is not recognized!
line 538 column 119 - Error: <o:p> is not recognized!
line 539 column 112 - Error: <o:p> is not recognized!
line 540 column 69 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 166 column 8 - Error: <o:p> is not recognized!
line 168 column 109 - Error: <o:p> is not recognized!
line 183 column 34 - Error: <o:p> is not recognized!
line 185 column 109 - Error: <o:p> is not recognized!
line 191 column 59 - Error: <o:p> is not recognized!
line 193 column 109 - Error: <o:p> is not recognized!
line 195 column 120 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 568 column 48 - Error: <o:p> is not recognized!
line 569 column 90 - Error: <o:p> is not recognized!
line 570 column 88 - Error: <o:p> is not recognized!
line 571 column 98 - Error: <o:p> is not recognized!
line 572 column 138 - Error: <o:p> is not recognized!
line 573 column 131 - Error: <o:p> is not recognized!
line 574 column 88 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 164 column 10 - Error: <o:p> is not recognized!
line 166 column 87 - Error: <o:p> is not recognized!
line 177 column 36 - Error: <o:p> is not recognized!
line 179 column 87 - Error: <o:p> is not recognized!
line 184 column 32 - Error: <o:p> is not recognized!
line 186 column 87 - Error: <o:p> is not recognized!
line 188 column 98 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 218 column 8 - Error: <o:p> is not recognized!
line 220 column 89 - Error: <o:p> is not recognized!
line 232 column 34 - Error: <o:p> is not recognized!
line 234 column 89 - Error: <o:p> is not recognized!
line 239 column 48 - Error: <o:p> is not recognized!
line 241 column 89 - Error: <o:p> is not recognized!
line 243 column 100 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 214 column 12 - Error: <o:p> is not recognized!
line 216 column 89 - Error: <o:p> is not recognized!
line 227 column 38 - Error: <o:p> is not recognized!
line 229 column 89 - Error: <o:p> is not recognized!
line 234 column 39 - Error: <o:p> is not recognized!
line 236 column 89 - Error: <o:p> is not recognized!
line 238 column 100 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 239 column 14 - Error: <o:p> is not recognized!
line 241 column 91 - Error: <o:p> is not recognized!
line 253 column 40 - Error: <o:p> is not recognized!
line 255 column 91 - Error: <o:p> is not recognized!
line 260 column 54 - Error: <o:p> is not recognized!
line 262 column 91 - Error: <o:p> is not recognized!
line 264 column 102 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 277 column 16 - Error: <o:p> is not recognized!
line 279 column 93 - Error: <o:p> is not recognized!
line 291 column 42 - Error: <o:p> is not recognized!
line 293 column 93 - Error: <o:p> is not recognized!
line 298 column 56 - Error: <o:p> is not recognized!
line 300 column 93 - Error: <o:p> is not recognized!
line 302 column 104 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 309 column 18 - Error: <o:p> is not recognized!
line 311 column 95 - Error: <o:p> is not recognized!
line 323 column 44 - Error: <o:p> is not recognized!
line 325 column 95 - Error: <o:p> is not recognized!
line 330 column 58 - Error: <o:p> is not recognized!
line 332 column 95 - Error: <o:p> is not recognized!
line 334 column 106 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 359 column 20 - Error: <o:p> is not recognized!
line 361 column 97 - Error: <o:p> is not recognized!
line 373 column 46 - Error: <o:p> is not recognized!
line 375 column 97 - Error: <o:p> is not recognized!
line 381 column 18 - Error: <o:p> is not recognized!
line 383 column 97 - Error: <o:p> is not recognized!
line 385 column 108 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 388 column 20 - Error: <o:p> is not recognized!
line 390 column 97 - Error: <o:p> is not recognized!
line 402 column 46 - Error: <o:p> is not recognized!
line 404 column 97 - Error: <o:p> is not recognized!
line 410 column 18 - Error: <o:p> is not recognized!
line 412 column 97 - Error: <o:p> is not recognized!
line 414 column 108 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 471 column 8 - Error: <o:p> is not recognized!
line 473 column 89 - Error: <o:p> is not recognized!
line 485 column 34 - Error: <o:p> is not recognized!
line 487 column 89 - Error: <o:p> is not recognized!
line 492 column 48 - Error: <o:p> is not recognized!
line 494 column 89 - Error: <o:p> is not recognized!
line 496 column 100 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 26 column 54 - Error: <st1:place> is not recognized!
line 27 column 17 - Error: <st1:country-region> is not recognized!
line 30 column 9 - Error: <o:p> is not recognized!
line 36 column 12 - Error: <o:p> is not recognized!
line 42 column 23 - Error: <o:p> is not recognized!
line 47 column 61 - Error: <o:p> is not recognized!
line 456 column 22 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 133 column 14 - Error: <z> is not recognized!
line 209 column 7 - Error: <z> is not recognized!
line 209 column 10 - Error: <y> is not recognized!
line 260 column 29 - Error: <x> is not recognized!
line 269 column 29 - Error: <x> is not recognized!
line 294 column 7 - Error: <y> is not recognized!
line 376 column 15 - Error: <z> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 57 column 56 - Error: <nowrap> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 6 column 68 - Error: <im> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 6 column 68 - Error: <im> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 77 column 25 - Error: <o:p> is not recognized!
line 80 column 16 - Error: <o:p> is not recognized!
line 83 column 63 - Error: <o:p> is not recognized!
line 86 column 16 - Error: <o:p> is not recognized!
line 89 column 65 - Error: <o:p> is not recognized!
line 92 column 16 - Error: <o:p> is not recognized!
line 96 column 77 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 71 - Error: amar49119 is not recognized!
line 1 column 103 - Error: durra.tirunagari is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 1,393 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 6 column 1 - Error: o:smarttagtype is not recognized!
line 8 column 1 - Error: o:smarttagtype is not recognized!
line 10 column 1 - Error: o:smarttagtype is not recognized!
line 51 column 18 - Error: <o:p> is not recognized!
line 54 column 9 - Error: <o:p> is not recognized!
line 59 column 35 - Error: <o:p> is not recognized!
line 62 column 9 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 1,369 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 2 column 317 - Error: <srilatha_potnuru> is not recognized!
line 2 column 363 - Error: <vakula16> is not recognized!
line 2 column 613 - Error: <rajmirk> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 54 column 30 - Error: <o:p> is not recognized!
line 56 column 20 - Error: <o:p> is not recognized!
line 61 column 47 - Error: <o:p> is not recognized!
line 63 column 20 - Error: <o:p> is not recognized!
line 66 column 22 - Error: <o:p> is not recognized!
line 68 column 20 - Error: <o:p> is not recognized!
line 71 column 13 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 1,442 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 78 column 50 - Error: <o:p> is not recognized!
line 81 column 16 - Error: <o:p> is not recognized!
line 84 column 43 - Error: <o:p> is not recognized!
line 87 column 16 - Error: <o:p> is not recognized!
line 90 column 52 - Error: <o:p> is not recognized!
line 93 column 16 - Error: <o:p> is not recognized!
line 95 column 39 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 85 column 42 - Error: <o:p> is not recognized!
line 93 column 41 - Error: <o:p> is not recognized!
line 98 column 8 - Error: <o:p> is not recognized!
line 103 column 66 - Error: <o:p> is not recognized!
line 107 column 46 - Error: <o:p> is not recognized!
line 112 column 38 - Error: <o:p> is not recognized!
line 117 column 35 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 5 column 119 - Error: <o:p> is not recognized!
line 5 column 412 - Error: <o:p> is not recognized!
line 5 column 648 - Error: <o:p> is not recognized!
line 5 column 948 - Error: <o:p> is not recognized!
line 6 column 179 - Error: <o:p> is not recognized!
line 6 column 475 - Error: <o:p> is not recognized!
line 6 column 748 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 5 column 464 - Error: <o:p> is not recognized!
line 5 column 765 - Error: <o:p> is not recognized!
line 6 column 21 - Error: <o:p> is not recognized!
line 6 column 328 - Error: <o:p> is not recognized!
line 6 column 535 - Error: <o:p> is not recognized!
line 6 column 838 - Error: <o:p> is not recognized!
line 7 column 146 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 145 column 61 - Error: <o:p> is not recognized!
line 154 column 14 - Error: <o:p> is not recognized!
line 159 column 12 - Error: <o:p> is not recognized!
line 165 column 26 - Error: <o:p> is not recognized!
line 170 column 10 - Error: <o:p> is not recognized!
line 175 column 49 - Error: <o:p> is not recognized!
line 180 column 57 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 6 column 854 - Error: <o:p> is not recognized!
line 7 column 200 - Error: <o:p> is not recognized!
line 7 column 474 - Error: <o:p> is not recognized!
line 7 column 822 - Error: <o:p> is not recognized!
line 8 column 55 - Error: <o:p> is not recognized!
line 8 column 399 - Error: <o:p> is not recognized!
line 8 column 711 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 709 - Error: <srilatha_potnuru> is not recognized!
line 1 column 775 - Error: <kalyanivalluri1> is not recognized!
line 1 column 831 - Error: <rajmirk> is not recognized!
line 2 column 453 - Error: <kalyanivalluri1> is not recognized!
line 2 column 648 - Error: <rajmirk> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 54 column 20 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 103 column 60 - Error: <o:p> is not recognized!
line 105 column 48 - Error: <o:p> is not recognized!
line 107 column 64 - Error: <o:p> is not recognized!
line 109 column 68 - Error: <o:p> is not recognized!
line 112 column 6 - Error: <o:p> is not recognized!
line 114 column 72 - Error: <o:p> is not recognized!
line 116 column 70 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

OUCH decoding attachment, p=com.sun.mail.imap.IMAPMessage@de1b8a ioe=java.io.UnsupportedEncodingException: X-UNKNOWN
java.io.UnsupportedEncodingException: X-UNKNOWN
        at sun.io.Converters.getConverterClass(Converters.java:218)
        at sun.io.Converters.newConverter(Converters.java:251)
        at sun.io.ByteToCharConverter.getConverter(ByteToCharConverter.java:68)
        at sun.nio.cs.StreamDecoder$ConverterSD.<init>(StreamDecoder.java:224)
        at sun.nio.cs.StreamDecoder$ConverterSD.<init>(StreamDecoder.java:210)
        at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:77)
        at java.io.InputStreamReader.<init>(InputStreamReader.java:83)
        at com.sun.mail.handlers.text_plain.getContent(text_plain.java:95)
        at javax.activation.DataSourceDataContentHandler.getContent(DataHandler.java:803)
        at javax.activation.DataHandler.getContent(DataHandler.java:550)
        at javax.mail.internet.MimeMessage.getContent(MimeMessage.java:1398)
        at emailanalyzer.EmailParser.indexContent(EmailParser.java:506)
        at emailanalyzer.EmailParser.index(EmailParser.java:430)
        at emailanalyzer.EmailParser.index(EmailParser.java:316)
        at emailanalyzer.EmailParser.index(EmailParser.java:373)
        at emailanalyzer.EmailParser.traverse(EmailParser.java:342)
        at emailanalyzer.EmailParser.main(EmailParser.java:196)
OUCH decoding attachment, p=com.sun.mail.imap.IMAPMessage@1e232b5 ioe=java.io.UnsupportedEncodingException: X-UNKNOWN
java.io.UnsupportedEncodingException: X-UNKNOWN
        at sun.io.Converters.getConverterClass(Converters.java:218)
        at sun.io.Converters.newConverter(Converters.java:251)
        at sun.io.ByteToCharConverter.getConverter(ByteToCharConverter.java:68)
        at sun.nio.cs.StreamDecoder$ConverterSD.<init>(StreamDecoder.java:224)
        at sun.nio.cs.StreamDecoder$ConverterSD.<init>(StreamDecoder.java:210)
        at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:77)
        at java.io.InputStreamReader.<init>(InputStreamReader.java:83)
        at com.sun.mail.handlers.text_plain.getContent(text_plain.java:95)
        at javax.activation.DataSourceDataContentHandler.getContent(DataHandler.java:803)
        at javax.activation.DataHandler.getContent(DataHandler.java:550)
        at javax.mail.internet.MimeMessage.getContent(MimeMessage.java:1398)
        at emailanalyzer.EmailParser.indexContent(EmailParser.java:506)
        at emailanalyzer.EmailParser.index(EmailParser.java:430)
        at emailanalyzer.EmailParser.index(EmailParser.java:316)
        at emailanalyzer.EmailParser.index(EmailParser.java:373)
        at emailanalyzer.EmailParser.traverse(EmailParser.java:342)
        at emailanalyzer.EmailParser.main(EmailParser.java:196)
line 17 column 61 - Error: <o:p> is not recognized!
line 37 column 31 - Error: <o:p> is not recognized!
line 39 column 41 - Error: <o:p> is not recognized!
line 42 column 34 - Error: <o:p> is not recognized!
line 45 column 35 - Error: <o:p> is not recognized!
line 48 column 10 - Error: <o:p> is not recognized!
line 51 column 10 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 143 column 26 - Error: <o:p> is not recognized!
line 146 column 16 - Error: <o:p> is not recognized!
line 152 column 148 - Error: <o:p> is not recognized!
line 159 column 53 - Error: <o:p> is not recognized!
line 162 column 27 - Error: <o:p> is not recognized!
line 165 column 16 - Error: <o:p> is not recognized!
line 169 column 16 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 13 column 13 - Error: o:smarttagtype is not recognized!
line 15 column 1 - Error: o:smarttagtype is not recognized!
line 17 column 1 - Error: o:smarttagtype is not recognized!
line 19 column 1 - Error: o:smarttagtype is not recognized!
line 78 column 47 - Error: <o:p> is not recognized!
line 82 column 17 - Error: <o:p> is not recognized!
line 85 column 45 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 38 column 42 - Error: <o:p> is not recognized!
line 45 column 8 - Error: <o:p> is not recognized!
line 53 column 68 - Error: <o:p> is not recognized!
line 60 column 44 - Error: <o:p> is not recognized!
line 70 column 42 - Error: <o:p> is not recognized!
line 77 column 8 - Error: <o:p> is not recognized!
line 86 column 42 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 54 column 20 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 1 column 1,367 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 179 column 37 - Error: <st1:city> is not recognized!
line 179 column 56 - Error: <st1:place> is not recognized!
line 182 column 30 - Error: <st1:city> is not recognized!
line 182 column 50 - Error: <st1:place> is not recognized!
line 185 column 30 - Error: <st1:city> is not recognized!
line 185 column 50 - Error: <st1:place> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 177 column 39 - Error: <st1:city> is not recognized!
line 177 column 58 - Error: <st1:place> is not recognized!
line 181 column 32 - Error: <st1:city> is not recognized!
line 181 column 52 - Error: <st1:place> is not recognized!
line 186 column 32 - Error: <st1:city> is not recognized!
line 186 column 52 - Error: <st1:place> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 25 column 60 - Error: <o:p> is not recognized!
line 32 column 37 - Error: <o:p> is not recognized!
line 34 column 64 - Error: <o:p> is not recognized!
line 43 column 56 - Error: <o:p> is not recognized!
line 45 column 64 - Error: <o:p> is not recognized!
line 47 column 34 - Error: <o:p> is not recognized!
line 49 column 34 - Error: <o:p> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 541 column 22 - Error: <img> missing '>' for end of tag
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

line 509 column 43 - Error: <color> is not recognized!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

Migrated from LUCENE-1041 by DURGA DEEP, resolved Nov 01 2007

Environment:

Solaris 10. http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/ant/src/java/org/apache/lucene/ant/

asfimport commented 16 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

There does not appear to be a bug here.

As the javadocs for this class state...

The HtmlDocument class creates a Lucene Document from an HTML document.

It does this by using JTidy package.

JTidy is complaining about errors in your HTML document ... notably that it doesn't seem to be valid HTML.
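
Most of the errors in the log come from namespaced Microsoft Office tags such as <o:p> and <st1:place>. One possible workaround, sketched here as an assumption rather than anything Lucene or JTidy provides, is to strip those tags from the e-mail body before handing it to the HTML parser. The class name OfficeMarkupCleaner and its method are hypothetical, and the regular expression is deliberately simple: it does not handle attribute values that contain '>', and it does nothing for the "<img> missing '>'" errors, which are a different kind of malformed markup.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Lucene or JTidy): removes namespaced
 * tags such as <o:p> or <st1:place> while keeping the text between them.
 */
public class OfficeMarkupCleaner {

    // Matches an opening, closing, or self-closing tag whose name carries a
    // namespace prefix, e.g. <o:p>, </st1:place>, <o:smarttagtype ... />.
    private static final Pattern NAMESPACED_TAG =
            Pattern.compile("</?[A-Za-z][A-Za-z0-9]*:[A-Za-z][A-Za-z0-9-]*(\\s[^>]*)?/?>");

    /** Returns the input with every namespaced tag removed. */
    public static String stripNamespacedTags(String html) {
        return NAMESPACED_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<p>Visit <st1:place>Paris</st1:place><o:p></o:p></p>";
        // The tags are dropped; the text "Paris" is kept.
        System.out.println(stripNamespacedTags(input));
    }
}
```

The cleaned string would then be passed on to whichever parser is in use. Ordinary tags such as <p> or <a href="..."> are left untouched because their names contain no colon.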