Norconex / committer-elasticsearch

Implementation of Norconex Committer for Elasticsearch.
https://opensource.norconex.com/committers/elasticsearch/
Apache License 2.0

Reversed logic in the JSON field name matcher #33

Open paulcappadona opened 6 years ago

paulcappadona commented 6 years ago

The method below calls matches with the field name and the JSON fields pattern the wrong way round:

    private void appendValue(StringBuilder json, String field, String value) {
        if (getJsonFieldsPattern() != null 
                && getJsonFieldsPattern().matches(field)) {
            json.append(value);
        } else {
            json.append('"')
                .append(StringEscapeUtils.escapeJson(value))
                .append("\"");
        }
    }

The matches call should be reversed, so that it is invoked on the field name with the JSON fields pattern as its argument:

    private void appendValue(StringBuilder json, String field, String value) {
        if (getJsonFieldsPattern() != null 
                && field.matches(getJsonFieldsPattern())) {
            json.append(value);
        } else {
            json.append('"')
                .append(StringEscapeUtils.escapeJson(value))
                .append("\"");
        }
    }

kalhomoud commented 6 years ago

Hi @paulcappadona, Could you please elaborate? Can you give me an example?

Thanks!

paulcappadona commented 6 years ago

Hi @kalhomoud

The method in question should match each document metadata field name against a regex pattern. When the name matches, the committer should write the value as a raw JSON object rather than as a quoted string.
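As a minimal sketch of that intended behaviour (not the committer's actual code), the class below compiles the pattern once and matches each field name against it. The class name JsonFieldAppendSketch, the render helper, and the simplified escape method are assumptions made for illustration; the committer itself uses StringEscapeUtils.escapeJson.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the relevant part of the committer.
class JsonFieldAppendSketch {

    // Compiled once; null means no field is treated as raw JSON.
    private final Pattern jsonFieldsPattern;

    JsonFieldAppendSketch(String regex) {
        this.jsonFieldsPattern = regex == null ? null : Pattern.compile(regex);
    }

    void appendValue(StringBuilder json, String field, String value) {
        // The field name is the input; the configured pattern is the regex.
        if (jsonFieldsPattern != null
                && jsonFieldsPattern.matcher(field).matches()) {
            json.append(value); // written as-is, as a JSON object
        } else {
            json.append('"').append(escape(value)).append('"');
        }
    }

    // Simplified escaping for illustration only (backslash and double quote).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Convenience helper: renders a single value and returns the result.
    static String render(String regex, String field, String value) {
        StringBuilder sb = new StringBuilder();
        new JsonFieldAppendSketch(regex).appendValue(sb, field, value);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("^obj-.*$", "obj-crawl-meta", "{\"depth\":1}"));
        System.out.println(render("^obj-.*$", "somefield", "plain text"));
    }
}
```

With the pattern ^obj-.*$, the first call emits the value unquoted and the second emits it as a quoted string. Using Pattern.matcher(field) also makes the roles of pattern and input explicit, so the two cannot be swapped silently.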

The following code illustrates that matches must be called on the field name, with the regex pattern as its argument, not the other way around:

    /**
     * Attempting to match any fields beginning with "obj-" so they are treated as JSON objects
     */
    public class FieldMatchTest {

        public static void main(String[] args) {
            String pattern = "^" + "obj-" + ".*$";
            // Expected match result is true for the following
            testField("obj-crawl-meta", pattern);
            testField("obj-document-meta", pattern);
            // Expected no match, so false
            testField("somefield", pattern);
        }

        private static void testField(String field, String pattern) {
            // this is the logic in the ElasticSearchCommitter
            System.out.println("Matching (pattern.match(field)) pattern " + pattern + " against field " + field + " : Matched = " + pattern.matches(field));
            // this is the correct logic
            System.out.println("Matching (field.match(pattern)) field " + field + " against pattern " + pattern + " : Matched = " + field.matches(pattern));
        }
    }

The output of this code is

    // Expected match (true)
    Matching (pattern.match(field)) pattern ^obj-.*$ against field obj-crawl-meta : Matched = false
    Matching (field.match(pattern)) field obj-crawl-meta against pattern ^obj-.*$ : Matched = true
    Matching (pattern.match(field)) pattern ^obj-.*$ against field obj-document-meta : Matched = false
    Matching (field.match(pattern)) field obj-document-meta against pattern ^obj-.*$ : Matched = true
    // Expected fail (false)
    Matching (pattern.match(field)) pattern ^obj-.*$ against field somefield : Matched = false
    Matching (field.match(pattern)) field somefield against pattern ^obj-.*$ : Matched = false
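The reason the reversed call returns false is that String.matches(regex) always treats its argument as the regex and the string it is called on as the input, and it requires the whole input to match. Called the wrong way round, the field name is compiled as the regex and the pattern text becomes the input. The sketch below shows the three variants side by side; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustrative helper; the names here are not part of the committer.
class MatchOrderDemo {

    // Current logic: the field name is (wrongly) used as the regex.
    static boolean reversed(String field, String pattern) {
        return pattern.matches(field);
    }

    // Proposed logic: the field name is the input, the pattern is the regex.
    static boolean correct(String field, String pattern) {
        return field.matches(pattern);
    }

    // Equivalent static form: Pattern.matches(regex, input).
    static boolean explicit(String field, String pattern) {
        return Pattern.matches(pattern, field);
    }

    public static void main(String[] args) {
        String pattern = "^obj-.*$";
        System.out.println(reversed("obj-crawl-meta", pattern)); // false
        System.out.println(correct("obj-crawl-meta", pattern));  // true
        System.out.println(explicit("obj-crawl-meta", pattern)); // true
        // A field name that happens to be a permissive regex matches
        // the pattern text itself under the reversed logic.
        System.out.println(reversed(".*", pattern));             // true
        System.out.println(correct(".*", pattern));              // false
    }
}
```

The last two calls show that the reversed form is not simply always false: it returns true whenever the field name, read as a regex, matches the literal pattern text, which is unrelated to the configured intent.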

Regards Paul