-
Hi there,
I am trying to extract content from a very special file type, and I managed to convert it to HTML. However, when I try to put all the information back from the output, all information goes to…
-
Hi,
I've been messing around with the Elasticsearch committer for a week or so, and for the life of me, I can't get the collector to commit to Elasticsearch. There is no error when I run the collec…
-
I'm getting the following error when attempting to commit the items to the Azure index. It looks like it may be complaining about what is being put in the id field, but I don't know where this is coming f…
-
I am having issues isolating different crawlers to different types of documents so I can commit to Elasticsearch. I want to be able to use different crawlers for PDF, XML, HTML, images, etc. What I wou…
-
I think I've stumbled upon a bug here. I'm attempting to use a .txt file as a sitemap of sorts. The file has one URL per line. It looks something like this:
```
https://wiki.mydomain.com/QC-Pro…
```
-
I'm experiencing issues while using the PhantomJSFetcher. Every odd run or so, PhantomJS exits with value 137, and this seems to cause an NPE when trying to check for content-type.
`
ERROR SystemCom…
-
I need to crawl an intranet with a lot of file attachments. Many of them are Microsoft Office documents. Unfortunately they seem to be somewhat consistently served with the wrong Content-Type: `applic…
-
Hello,
I have downloaded the HTTP Collector and it works great with a core. I have a requirement to search more than one website depending on the user's selection. I understand I need to create m…
-
Where do I find information on the fields that are available for the tagger, i.e., the fields that would go here?
```
id,title,keywords,description,content,document.reference, document.conte…
```
-
I'm getting a strange error when attempting to use `TextPatternTagger`. I'm hoping it's just something I'm doing wrong, but it seems like a strange one. My goal is to extract a thumbnail image for t…