If I have two tests with the following regexes:
([0-2]+)
([0-9]+)
When I try to identify "101", both tests succeed. Both successful results are displayed to the user, but only the first one is recorded when the "identifier" doc is created in the ES index.
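Here is a minimal Python sketch of the current behaviour. It assumes that each test is a dict with a "regex" key and that a test succeeds when its regex matches the whole string (`re.fullmatch`). The real code may use a different matching rule; with `re.search`, for example, `([0-2]+)` would also "succeed" on "105". The test names are hypothetical.

```python
import re

# Hypothetical test definitions: the regexes come from the example above,
# the names are made up for illustration.
TESTS = [
    {"name": "test_a", "regex": r"([0-2]+)"},
    {"name": "test_b", "regex": r"([0-9]+)"},
]


def successful_tests(identifier, tests=TESTS):
    """Return every test whose regex matches the whole identifier string."""
    return [t for t in tests if re.fullmatch(t["regex"], identifier)]


def recorded_today(identifier, tests=TESTS):
    """Mimic the current behaviour: keep only the first successful test."""
    matches = successful_tests(identifier, tests)
    return matches[0] if matches else None
```

For "101", `successful_tests` returns both tests, but `recorded_today` keeps only `test_a`. That discarded `test_b` is the information this issue is about.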
We could change the structure of the "identifier" doc. Instead of a dictionary holding the properties of the first successful test (such as "name"), it would be a dictionary with 3 keys: identifier (the string the user tried to identify), id (a unique id, the same as _id at the top level of the ES document), and what.
"what" would be a list of possibilities - this is what this identifier could be. So if we have 1 successful test during identification, "what" would only have 1 element - that would be a dictionary with "name", "url_prefix", "url_suffix", "tags", "regex" and so on. However, if multiple tests succeed in identifying the string, the list would have more elements, and each one of them would be a dictionary with "name" and so on.
"what" would basically be a list of what the identifier string could be.
Right now we lose information: when multiple tests succeed, every successful identification after the first is dropped. So even if this solution is not perfect, it at least preserves everything the identification process finds.