Closed GoogleCodeExporter closed 8 years ago
Will submit the patch after issue40 is submitted (it touches the same file).
Original comment by avrah...@gmail.com
on 12 Jul 2014 at 8:30
Scenario where the bug can be reproduced:
Run the SitemapParser Tool on the following URL:
http://www.amazon.com/sitemap_video.xml
Original comment by avrah...@gmail.com
on 12 Jul 2014 at 8:32
A new parse(Url url) method was introduced in issue39.
SitemapTool will use that method: see issue43 (not yet committed to svn).
Original comment by avrah...@gmail.com
on 14 Jul 2014 at 1:21
new Tika().detect(URL) would solve the problem described above,
but it would cause our library to fetch the sitemap twice.
A better solution should be sought.
Maybe use new Tika().detect(bytes, filename) instead.
Original comment by avrah...@gmail.com
on 16 Jul 2014 at 5:24
Original comment by avrah...@gmail.com
on 18 Jul 2014 at 8:05
Let's instantiate the Tika instance only once and reuse it. Otherwise we have
to reload the Tika config every time, which is definitely not needed. (Julien)
Original comment by avrah...@gmail.com
on 1 Aug 2014 at 3:51
I will begin working on this one
Original comment by avrah...@gmail.com
on 6 Aug 2014 at 7:10
Attached is a patch with the required optimization.
Tika detection is now called with the byte array + filename.
The Tika object is now instantiated only once.
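
The patch itself is not reproduced in this thread. What follows is only a rough sketch of the approach described above, not the committed code. The class name SitemapFetchOnce and the injected fetcher and detector functions are hypothetical; the detector stands in for a single shared Tika instance called as detect(bytes, filename), so the sketch needs no Tika jar.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

/**
 * Sketch only (hypothetical names): fetch the sitemap once, then reuse the
 * same bytes for media-type detection and for parsing.
 */
final class SitemapFetchOnce {

    /** Result of one fetch: the detected media type plus the bytes for the parser. */
    static final class Fetched {
        final String mediaType;
        final byte[] content;

        Fetched(String mediaType, byte[] content) {
            this.mediaType = mediaType;
            this.content = content;
        }
    }

    // Assumption: stands in for the HTTP client that downloads the sitemap.
    private final Function<String, byte[]> fetcher;

    // Assumption: stands in for one shared Tika instance, used the way the
    // thread proposes, i.e. detect(bytes, filename). It is supplied once at
    // construction and reused for every call.
    private final BiFunction<byte[], String, String> detector;

    SitemapFetchOnce(Function<String, byte[]> fetcher,
                     BiFunction<byte[], String, String> detector) {
        this.fetcher = fetcher;
        this.detector = detector;
    }

    Fetched fetch(String url) {
        // The only network round trip: detection works on these bytes
        // instead of opening the URL a second time.
        byte[] content = fetcher.apply(url);

        // Use the last path segment as the filename hint. Query strings are
        // not stripped in this sketch.
        String fileName = url.substring(url.lastIndexOf('/') + 1);

        String mediaType = detector.apply(content, fileName);
        return new Fetched(mediaType, content);
    }
}
```

The point of passing the bytes rather than the URL is that the detector never opens a connection of its own, so the sitemap is downloaded exactly once per call.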
Original comment by avrah...@gmail.com
on 18 Aug 2014 at 8:10
+1, ship it. Thanks!
Original comment by lewis.mc...@gmail.com
on 18 Aug 2014 at 9:22
Shipped in revision: r134
Original comment by avrah...@gmail.com
on 19 Aug 2014 at 7:10
Original issue reported on code.google.com by
avrah...@gmail.com
on 12 Jul 2014 at 8:29