ATLANTBH / nutch-plugins

Apache Nutch extensions

PropertyFields not separated properly when indexing #8

Open Drewch opened 11 years ago

Drewch commented 11 years ago

There is a bug in XPathIndexingFilter.java that appears when you define two properties that each contain a propertyField with the same name. This is a perfectly valid configuration, as shown in the following example: https://gist.github.com/Drewch/6392261. The example is simplified: other fields that would differ between the two properties have been removed.

The current implementation of XPathIndexingFilter.java iterates over each Property in the list and, whenever a field name matches a key in the metadata, inserts that value. In this case the field name appears in both properties, so both are matched and the same metadata value is added to Solr twice.
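To make the duplication concrete, here is a minimal, self-contained sketch of the loop described above and one possible way to avoid it. It does not use the plugin's real classes: `Property`, `indexNaive`, `indexDeduplicated`, and the `Map`-based "document" standing in for a Solr document are all hypothetical, and the deduplicating variant is only an illustration, not necessarily what the pull request implements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateFieldSketch {

    // Hypothetical stand-in for a configured property and the names of its fields.
    record Property(String name, List<String> fieldNames) {}

    // Mirrors the behaviour described in the issue: every property is visited,
    // and any field name found in the metadata is added to the document.
    static Map<String, List<String>> indexNaive(List<Property> properties,
                                                Map<String, String> metadata) {
        Map<String, List<String>> doc = new HashMap<>();
        for (Property p : properties) {
            for (String field : p.fieldNames()) {
                String value = metadata.get(field);
                if (value != null) {
                    doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
                }
            }
        }
        return doc;
    }

    // One possible fix (an assumption): remember which field names were already
    // added, so a name shared by several properties is indexed only once.
    static Map<String, List<String>> indexDeduplicated(List<Property> properties,
                                                       Map<String, String> metadata) {
        Map<String, List<String>> doc = new HashMap<>();
        Set<String> seen = new HashSet<>();
        for (Property p : properties) {
            for (String field : p.fieldNames()) {
                String value = metadata.get(field);
                // Set.add returns false if the field name was already recorded.
                if (value != null && seen.add(field)) {
                    doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
                }
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // Two properties that share the field name "title".
        List<Property> properties = List.of(
                new Property("first", List.of("title")),
                new Property("second", List.of("title")));
        Map<String, String> metadata = Map.of("title", "Example page");

        System.out.println(indexNaive(properties, metadata));        // prints {title=[Example page, Example page]}
        System.out.println(indexDeduplicated(properties, metadata)); // prints {title=[Example page]}
    }
}
```

Deduplicating by field name is the simplest change; an alternative would be to keep the metadata scoped per property so that each property only reads the values it produced itself.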

I will submit a pull request to fix this.