amir-jakoby / crawler-commons

Automatically exported from code.google.com/p/crawler-commons

[Sitemaps] SiteMapParser Tika detection doesn't work well on all cases #46

Closed. GoogleCodeExporter closed this issue 8 years ago.

GoogleCodeExporter commented 8 years ago
When using the parse method that takes only a sitemap URL, we use Tika to 
detect the MIME type.

In some cases, the detected type is wrong.

We should use a more reliable Tika detection method.

Use:
new Tika().detect(URL)

Instead of the current:
new Tika().detect(bytes)
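The report does not say which sitemaps were misdetected, so the following is only a hypothetical, JDK-only illustration of the trade-off. It does not use Tika and is not Tika's or crawler-commons' detection logic. It contrasts a detector that sees only the leading bytes (the role of the current `detect(bytes)` call) with one that can also consult the URL path (the role of the proposed `detect(URL)` call, which per the Tika Javadoc also considers a file extension in the URL). The class `SitemapTypeSniffer`, its methods, the returned type strings, and the failure scenario (an XML sitemap that starts with a UTF-8 byte-order mark) are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical illustration only (not Tika, not crawler-commons):
 * content-only detection versus detection that also uses the URL path.
 */
final class SitemapTypeSniffer {

    private SitemapTypeSniffer() {}

    /** Content-only detection: inspects the leading bytes and nothing else. */
    static String detectFromBytes(byte[] prefix) {
        // gzip streams begin with the magic bytes 0x1f 0x8b
        if (prefix.length >= 2 && (prefix[0] & 0xff) == 0x1f && (prefix[1] & 0xff) == 0x8b) {
            return "application/gzip";
        }
        // Deliberately naive: only recognises XML that starts exactly with "<?xml",
        // so a leading byte-order mark or whitespace makes it miss the XML sitemap
        String head = new String(prefix, 0, Math.min(prefix.length, 5), StandardCharsets.US_ASCII);
        if (head.equals("<?xml")) {
            return "application/xml";
        }
        // Anything unrecognised is reported as plain text
        return "text/plain";
    }

    /** URL-aware detection: a definite byte signature wins, else the path extension is used. */
    static String detectFromUrl(String url, byte[] prefix) {
        String fromBytes = detectFromBytes(prefix);
        if (!fromBytes.equals("text/plain")) {
            return fromBytes; // gzip or XML signature found in the content
        }
        String path = url.toLowerCase(Locale.ROOT);
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query); // ignore any query string
        }
        if (path.endsWith(".xml")) {
            return "application/xml";
        }
        if (path.endsWith(".gz")) {
            return "application/gzip";
        }
        return fromBytes;
    }
}
```

With a BOM-prefixed XML sitemap fetched from `http://example.com/sitemap.xml`, `detectFromBytes` falls back to `text/plain`, while `detectFromUrl` returns `application/xml` because of the `.xml` extension. Letting a clear magic-byte match take precedence over the extension keeps a gzipped sitemap from being mislabeled by a misleading file name.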

Original issue reported on code.google.com by avrah...@gmail.com on 12 Jul 2014 at 8:29

GoogleCodeExporter commented 8 years ago
Please delete this issue (46), as it is a duplicate of issue 47.

Original comment by avrah...@gmail.com on 12 Jul 2014 at 8:33

GoogleCodeExporter commented 8 years ago

Original comment by lewis.mc...@gmail.com on 13 Jul 2014 at 11:46

GoogleCodeExporter commented 8 years ago
Thanks, Lewis.

I suppose I could also have done that.

It is taking me some time to figure out how the Google Code site works.

Original comment by avrah...@gmail.com on 13 Jul 2014 at 12:51

GoogleCodeExporter commented 8 years ago
Yeah, it is kind of archaic.
Don't worry. You are doing great things for CC, and I personally thank you for 
that.
I feel we are nearing a release.

Original comment by lewis.mc...@gmail.com on 13 Jul 2014 at 12:54