eregs / regulations-core

An engine that supplies the API that allows users to read regulations and their various layers.
Creative Commons Zero v1.0 Universal

Elastic Search 'Amendments' Model Parsing Failure #59

Open eadamsatx opened 7 years ago

eadamsatx commented 7 years ago

When parsing 37 CFR 42 with regulations-core configured to use Elasticsearch, every PUT to a notice URI fails the same way. Here are some details that don't make it to the console but provide a great deal of context, pulled from local variables in client.py with the debugger paused after the exception:

Can't merge a non object mapping [amendments.changes] with an object mapping [amendments.changes] [{'reason': '[YW_wNku][127.0.0.1:9300][indices:data/write/index[p]]', 'type': 'remote_transport_exception'}] '[YW_wNku][127.0.0.1:9300][indices:data/write/index[p]]'

As the request is made, the local variable notice at regulations-core/regcore/db/es.py line 115 has the following under the amendments key (i.e. notice['amendments']):

[
    {'authority': '35 U.S.C. 2(b)(2).', 'instruction': '1. The authority citation for 37 CFR part 1 continues to read as follows:', 'cfr_part': '1'},
    {'changes': [['1-301', [{'action': 'DELETE'}]]], 'instruction': '2. Section 1.301 is removed and reserved.', 'cfr_part': '1'},
    {'changes': [['1-302', [{'action': 'DELETE'}]]], 'instruction': '3. Section 1.302 is removed and reserved.', 'cfr_part': '1'},
    {'changes': [['1-303', [{'action': 'DELETE'}]]], 'instruction': '4. Section 1.303 is removed and reserved.', 'cfr_part': '1'},
    {'changes': [['1-304', [{'action': 'DELETE'}]]], 'instruction': '5. Section 1.304 is removed and reserved.', 'cfr_part': '1'},
    {'instruction': '6. Part 42 is added to read as follows:', 'cfr_part': '1'},
    {'instruction': '7. Part 90 is added to read as follows:', 'cfr_part': '90'}
]
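One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: Elasticsearch flattens nested arrays and expects every value in an array to share one type, so a `changes` entry like `['1-301', [{'action': 'DELETE'}]]` presents both a string and an object under the same field. The sketch below is plain Python with no Elasticsearch calls; the helper names (`leaf_kinds`, `iter_objects`, `conflicting_fields`) and `sample_notice` are hypothetical and only illustrate how such a field could be spotted before the PUT.

```python
def leaf_kinds(value):
    """Return which kinds ('object', 'scalar') appear in value,
    looking through nested lists as if they were flattened."""
    if isinstance(value, list):
        kinds = set()
        for item in value:
            kinds |= leaf_kinds(item)
        return kinds
    return {"object"} if isinstance(value, dict) else {"scalar"}


def iter_objects(value):
    """Yield every dict found in value at any list depth."""
    if isinstance(value, dict):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_objects(item)


def conflicting_fields(doc, prefix=""):
    """Yield dotted paths of fields that mix scalars and objects."""
    for key, value in doc.items():
        path = prefix + key
        if leaf_kinds(value) == {"object", "scalar"}:
            yield path
        # Descend into any objects held by this field.
        for child in iter_objects(value):
            yield from conflicting_fields(child, path + ".")


# Abridged copy of the amendments shown above (hypothetical variable name).
sample_notice = {
    "amendments": [
        {"authority": "35 U.S.C. 2(b)(2).", "cfr_part": "1"},
        {"changes": [["1-301", [{"action": "DELETE"}]]], "cfr_part": "1"},
        {"changes": [["1-302", [{"action": "DELETE"}]]], "cfr_part": "1"},
    ]
}

conflicts = sorted(set(conflicting_fields(sample_notice)))
print(conflicts)
```

If this reading is right, the only path reported is the one named in the exception, which would point at the shape of each `changes` entry rather than at the amendments that lack a `changes` key.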

With the debugger paused immediately after this failure, I tried to fetch what is already stored for this notice. There is no record:

$ curl 'http://localhost:9200/eregs/notice/2012-17900'
{"_index":"eregs","_type":"notice","_id":"2012-17900","found":false}

And pulling the schema didn't give me any hints about the preferred structure of amendments.

$ curl http://localhost:9200/eregs/_mapping/notice
{
  "eregs":{
    "mappings":{
      "notice":{
        "properties":{
          "cfr_parts":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "cfr_title":{
            "type":"long"
          },
          "dockets":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "document_number":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "effective_on":{
            "type":"date"
          },
          "footnotes":{
            "type":"object"
          },
          "fr_citation":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "fr_url":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "fr_volume":{
            "type":"long"
          },
          "meta":{
            "properties":{
              "start_page":{
                "type":"long"
              }
            }
          },
          "primary_agency":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "publication_date":{
            "type":"date"
          },
          "title":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "versions":{
            "properties":{
              "42":{
                "properties":{
                  "left":{
                    "type":"text",
                    "fields":{
                      "keyword":{
                        "type":"keyword",
                        "ignore_above":256
                      }
                    }
                  },
                  "right":{
                    "type":"text",
                    "fields":{
                      "keyword":{
                        "type":"keyword",
                        "ignore_above":256
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
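For what it's worth, the mapping above contains no `amendments` property at all, which would explain why it offers no hint about the expected structure. A small sketch for checking this programmatically, assuming the JSON response has already been loaded into a dict; `mapped_paths` and `sample_mapping` are hypothetical names, and the sample is an abridged copy of the output above.

```python
def mapped_paths(properties, prefix=""):
    """Yield a dotted path for every field under a 'properties' dict.

    Multi-fields (the 'fields' key, e.g. the 'keyword' sub-field) are
    deliberately ignored; only object properties are descended into.
    """
    for name, spec in sorted(properties.items()):
        path = prefix + name
        yield path
        yield from mapped_paths(spec.get("properties", {}), path + ".")


# Abridged from the curl output above.
sample_mapping = {
    "eregs": {
        "mappings": {
            "notice": {
                "properties": {
                    "cfr_title": {"type": "long"},
                    "effective_on": {"type": "date"},
                    "footnotes": {"type": "object"},
                    "meta": {"properties": {"start_page": {"type": "long"}}},
                }
            }
        }
    }
}

notice_properties = sample_mapping["eregs"]["mappings"]["notice"]["properties"]
paths = list(mapped_paths(notice_properties))
print(paths)
print(any(p.split(".")[0] == "amendments" for p in paths))
```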

cmc333333 commented 7 years ago

Hey again @eadamsatx. Sorry about that -- we don't have any active instances backed by Elastic, so we must have missed this when updating the Notice schema. I'd have to dig in more, but I'm betting the problem is that the amendments (a list) have different schemas (the first has an authority but no changes, for example). It probably makes sense to encode a single schema at parse time rather than tweaking these in the API.
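A minimal sketch of what "a single schema" could look like, assuming each `changes` entry is a `[label, actions]` pair as in the notice above. This is not the project's parser code: `normalize_amendment` and the `label` / `actions` key names are hypothetical, and any consumer of the API would need to be updated to the new shape.

```python
def normalize_amendment(amendment):
    """Return a copy of one amendment with a fixed set of keys.

    Each [label, actions] pair becomes an object, so the 'changes'
    array holds only objects; missing keys get explicit defaults.
    """
    changes = [
        {"label": label, "actions": list(actions)}
        for label, actions in amendment.get("changes", [])
    ]
    return {
        "cfr_part": amendment.get("cfr_part"),
        "instruction": amendment.get("instruction"),
        "authority": amendment.get("authority"),  # None when absent
        "changes": changes,                        # [] when absent
    }


with_changes = normalize_amendment({
    "changes": [["1-301", [{"action": "DELETE"}]]],
    "instruction": "2. Section 1.301 is removed and reserved.",
    "cfr_part": "1",
})
without_changes = normalize_amendment({
    "authority": "35 U.S.C. 2(b)(2).",
    "instruction": "1. The authority citation for 37 CFR part 1 "
                   "continues to read as follows:",
    "cfr_part": "1",
})
print(with_changes["changes"])
print(without_changes["changes"])
```

With this shape every amendment carries the same four keys, and `changes` never mixes strings with objects. The individual action dicts may still vary between notices, so they might need the same treatment.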