Closed by LeMoussel 4 years ago
I tried your script with sample data and it works fine for me. I suspect the content type is not what your code expects. You can find out by appending it to your content, just for testing:
var Contenttype = metadata.getString('document.contentType');
content += ' Contenttype: ' + Contenttype;
...
Strange .... it's not working for me.
Here's my test config: testConfigHttpCollector.xml.txt
I'm running it under Windows with testCollector-http.bat: testCollector-http.bat.txt
In the JSON file under .\output-test\crawledFilesJSON, the content variable ends up as:
[
{
"doc-add": {
"reference": "http://httpbin.org/forms/post",
"metadata": {
"collector.referrer-link-text": [
"HTML form"
],
"collector.referrer-reference": [
"http://httpbin.org/"
],
"collector.depth": [
"1"
]
},
"content": "Customer name: Telephone: E-mail address: Pizza Size Small Medium Large Pizza Toppings Bacon Extra Cheese Onion Mushroom Preferred delivery time: Delivery instructions: Submit order"
}
},
{
"doc-add": {
"reference": "http://httpbin.org/",
"metadata": {
"title": [
"httpbin.org"
],
"collector.depth": [
"0"
],
"collector.referenced-urls": [
"http://httpbin.org/forms/post"
]
},
"content": "httpbin.org 0.9.2 [ Base URL: httpbin.org/ ] A simple HTTP Request & Response Service. Run locally: $ docker run -p 80:80 kennethreitz/httpbin the developer - Website Send email to the developer [Powered by Flasgger] Other Utilities HTML form that posts to /post /forms/post"
}
}
]
According to the attached configuration file, it should be:
"content": "Contenttype: [value of metadata document.contentType]Customer name: Telephone: E-mail ....
"content": "Contenttype: [value of metadata document.contentType]httpbin.org 0.9.2 [ Base URL: httpbin.org/ ] .....
What am I missing? Thanks for your help!
Taggers cannot modify the content, only metadata. Modifying content is done using Transformers. This works:
<transformer class="com.norconex.importer.handler.transformer.impl.ScriptTransformer">
<script><![CDATA[
var Contenttype = metadata.getString('document.contentType');
content = 'Contenttype: [' + Contenttype + ']' + content;
]]></script>
</transformer>
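To check the string logic of the script above outside the crawler, something like the following standalone sketch may help. It is not Norconex code: `makeMetadata` is a hypothetical stub that imitates only the `getString` call used in the script, on the assumption that it returns the first value of a field or null when the field is missing. `prependContentType` is likewise just an illustrative wrapper around the same two lines.

```javascript
// Hypothetical stand-in for the importer's metadata object.
// Assumption: getString returns the first value of a field, or null if absent.
function makeMetadata(fields) {
  return {
    getString: function (key) {
      var values = fields[key];
      return values && values.length > 0 ? values[0] : null;
    }
  };
}

// Same logic as the transformer script, wrapped in a function
// so it can be called directly with test input.
function prependContentType(metadata, content) {
  var contentType = metadata.getString('document.contentType');
  return 'Contenttype: [' + contentType + ']' + content;
}

var metadata = makeMetadata({ 'document.contentType': ['text/html'] });
var result = prependContentType(metadata, 'Customer name: Telephone:');
console.log(result); // Contenttype: [text/html]Customer name: Telephone:
```

In this sketch, a missing `document.contentType` field produces `Contenttype: [null]` in front of the content, which makes it easy to spot a field that was never set.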
Thank you!
I want to modify the content of the document with JavaScript code. For this I use ScriptTagger and do this:
In the output file (I use JSONFileCommitter), the content value has not changed: '=> TEST: ' is not present in content.