Closed GerbenKD closed 3 years ago
Fields can be multivalued but unless I am mistaken, if a single value is present then a String is returned. Any reason why you'd have more than one value for that field?
Hi Julien. I posted an issue too soon, likely this is because of how we serialize the Metadata to elasticsearch. I see this is different in your implementation of the StatusUpdaterBolt. My apologies.
cheers,
Gerben
Hmm actually, also when using the SDK StatusUpdaterBolt I see single values as arrays in ES.
cheers,
Gerben
ok thanks, will have a look
I've just added a test to try to reproduce the problem and it's not happening. I really don't think single values are returned by ES as lists. @GerbenKD could you please double check and try to reproduce the problem? Thanks
Hi Julien,
Attached a screenshot of the debugger for the HybridSpout. You see that the hostname field is an ArrayList of size 1. In ES this is also stored as a List with a single element. Could this be an ES difference? We are using 7.5.2.
Tested with 7.5.2, no difference. It would have been too big a change for a minor release. What's your mapping for the status index? How did you create it?
Status index is created by the SDK. I wonder whether the issue is this line though:
Only for metadata.key
a non-array value is added to the metadata builder object. All other metadata values are String[]
. I tried to use another metadata value in HybridSpout
(as in the other issue I created). Hence I get the exception.
Gerben
Managed to reproduce it in the test, thanks. Can you please try the fix in ea6d3fb?
Works! Thanks!
Gerben
great. thanks for reporting it @GerbenKD
When using the HybridSpout I get cannot cast ArrayList to String exceptions when using a field from metadata as the key.
The error is in the following line:
https://github.com/DigitalPebble/storm-crawler/blob/eb735b4f704e86ccdf004ca4dee107f331fad2a4/external/elasticsearch/src/main/java/com/digitalpebble/stormcrawler/elasticsearch/persistence/HybridSpout.java#L178
Since metadata field values are always lists, this error seems to make sense.