chrfrantz / IG-Parser

Parser for IG 2.0 Statements encoded in IG Script Notation
GNU General Public License v3.0

IG Parser

Parser for IG 2.0 Statements based on the IG Script Notation.

Contact: Christopher Frantz (christopher.frantz@ntnu.no)

Institutional Grammar 2.0 Website: https://newinstitutionalgrammar.org

Deployed IG Parser:

Note: Either version allows interactive switching to the respective other while preserving encoded statement information.

See Revision history for a detailed overview of changes.

See Contributors for an overview of contributions to the project. We explicitly encourage external contributions. Please feel free to get in touch if you plan to contribute to the repository. Please create an issue to report bugs or to propose features (or, alternatively, get in touch by email).

Overview

IG Parser is a parser for IG Script, a formal notation for the representation of institutional statements (e.g., policy statements) used in the Institutional Grammar 2.0. The parser can be used locally, as well as via a web interface that produces tabular output of parsed statements (currently supporting Google Sheets format). In the following, you will find a brief introduction to the user interface, followed by a comprehensive introduction to the syntactic principles and essential features of IG Script. This includes a set of examples showcasing all features and various levels of complexity, while highlighting typical mistakes in the encoding. Finally, the deployment instructions for IG Parser are provided.

The conceptual background of the Institutional Grammar 2.0 is provided in the corresponding article and book, augmented with supplementary operational coding guidelines.

User Interface Guide

The user interface consists of various entry fields, followed by parameters (specific to each version of the parser) and an output section that will contain the generated output. The following subsections highlight the key features of each element.

General Entry Fields

The initial entry field holds the 'Original Statement'. The purpose of this field is to keep track of the original statement during coding, but also to include it in the output if you choose to do so (see 'Parameters' section in the UI; discussed below). Prior to coding, it also allows you to vet the statement for obvious challenges in coding (e.g., imbalanced parentheses, non-supported symbols), which you can do by clicking on "Validate 'Original Statement' input". This will allow you to copy the content to the 'Encoded Statement' field (it warns you if an encoded statement already exists, to prevent accidental overwriting of the existing coding).

The 'Encoded Statement' area provides the actual coding editor. It features both an 'Advanced Mode' that includes additional input options (see below) as well as color-coding of the encoded statement. Alternatively, you can use the standard mode that does not provide any advanced features beyond basic bracket matching, and operates by directly encoding in the IG Script syntax (described later). This version is more useful if using mobile devices or facing accessibility issues based on the color-coding.

Advanced Editor

The advanced user interface has four main input options available to annotate statements:

In addition to these features, there are two toggles: one for nesting, which inserts the correct brackets for nested symbols, and one for semantic annotation, which adds the semantic annotation brackets to symbols on creation. There are also buttons for undo, redo, and for displaying a quick guide of the supported symbols and keybinds in the editor.

Another feature of the website is keyboard interaction. Each element can be reached using the "tab" key, or "shift+tab" to navigate in reverse. Additionally, each button can be pressed using the "enter" key, and the parameter toggles can be switched using the "space" key. To select or highlight text in the editor, use "ctrl+shift" in combination with the arrow keys. This makes the website accessible using a keyboard as the only input device.

Toggling between variants (editors, parser versions)

You can interactively toggle between the basic and the advanced mode by clicking on 'Toggle advanced editor features'. You can further interactively switch between the tabular output mode and the visual output mode of the parser. In all those cases, no coding information is lost.

Output-specific parameters

Below the editor area you will find output-specific parameters. Each of them shows brief help text when you hover over its label; some additionally open external pages with more extensive guidance.

Tabular output

By clicking on 'Generate tabular output', the input is parsed and output generated, which can be copied into the clipboard for transferral into a tool of your choice.

Visual output

By clicking on 'Generate visual output', the input information is parsed and the statement tree structure displayed.

Usage considerations

In the following, you will find an overview of the actual IG Script syntax used for the encoding of institutional statements entered into the 'Encoded Statement' field.

IG Script

IG Script is a notation introduced in the context of the Institutional Grammar 2.0 (IG 2.0) that aims at a deep structural representation of legal statements alongside selected levels of expressiveness. While IG 2.0 highlights the conceptual background, the objective of IG Script is to provide an accessible, but formal approach to provide a format-independent representation of institutional statements of any type (e.g., regulative, constitutive, hybrid). While the parser currently supports exemplary export formats (e.g., tabular format and visual output), the tool is open to be extended to support other output formats (e.g., XML, JSON, YAML). The introduction below focuses on the operational coding. Syntactic and semantic foundations are provided elsewhere.

Principles of IG Script Syntax

IG Script centers around a set of fundamental primitives that can be combined to parse statements comprehensively, including:

Component Coding

Component Coding provides the basic building block for any statement encoding. A component is represented as componentSymbol(naturalLanguageText).

componentSymbol is one of the supported symbols for the different component types (see below). An example for an annotated Attributes component is A(Farmer).

naturalLanguageText is the human-readable text annotated as the entity the annotation describes. The text is open-ended and can include special symbols, including parentheses (e.g., A(Farmer (e.g., organic farmer))). Exceptions to this rule are discussed in the context of combinations.

The scope of a component is specified by opening and closing parentheses.

All components of a statement are annotated correspondingly, without concern for order or repetition. The parser further tolerates multiple component annotations of the same kind. Multiple Attributes (e.g., A(Farmer) D(must) I(comply) A(Certifier)), for example, are effectively interpreted as a combination (i.e., A(Farmer [AND] Certifier) D(must) I(comply)) in the parsing process.

Any symbols outside the encoded components and combinations of components are ignored by the parser.
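The scoping rule described above (a symbol followed by balanced parentheses, with everything outside annotations ignored) can be illustrated with a minimal Python sketch. This is not IG Parser's implementation: the function name extract_components is hypothetical, and the symbol list is an assumption limited to the symbols that appear in this document's examples.

```python
# Assumed symbol set for illustration only: the symbols that appear in this
# document's examples. The parser's actual set may be larger.
SYMBOLS = ("Bdir", "Bind", "Cac", "Cex", "A", "D", "I")


def extract_components(statement):
    """Return (symbol, text) pairs for every symbol(text) annotation."""
    found = []
    i = 0
    while i < len(statement):
        # A symbol only counts if it is directly followed by "(".
        symbol = next(
            (s for s in SYMBOLS if statement.startswith(s + "(", i)), None
        )
        # Skip positions without a symbol, or where the match sits mid-word.
        if symbol is None or (i > 0 and statement[i - 1].isalnum()):
            i += 1
            continue
        start = i + len(symbol) + 1  # first character after "("
        depth = 1
        j = start
        # Walk forward until the opening parenthesis is balanced.
        while j < len(statement) and depth > 0:
            if statement[j] == "(":
                depth += 1
            elif statement[j] == ")":
                depth -= 1
            j += 1
        if depth != 0:
            raise ValueError("unbalanced parentheses after " + symbol)
        found.append((symbol, statement[start:j - 1]))
        i = j  # continue after the component; its content is not rescanned
    return found


print(extract_components("A(Farmer (e.g., organic farmer)) D(must) I(comply)"))
```

Because the scan resumes after the closing parenthesis, parentheses inside the natural language text are kept as part of the component value, and any text between components is skipped.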

The parser supports a fixed set of component type symbols that uniquely identify a given component type.

Supported Component Type Symbols:

Component Combinations

Statements often express alternatives or combinations of activities that need to be administered, thus reflecting logical combinations of entities, such as actors, actions, conditions, etc. To combine individual components logically, IG Script supports the notion of Component Combinations. As indicated above, separate components of the same type are interpreted as AND-combined. To explicitly specify the nature of the logical relationship (e.g., conjunction, inclusive/exclusive disjunction), combinations need to be explicitly specified in the following format:

componentSymbol(componentValue1 [logicalOperator] componentValue2)

Note: Per default, combinations are scoped to cover both component values (i.e., componentValue1 and componentValue2). In practice, it may be relevant to scope combinations more narrowly within a statement to capture the statement semantics, e.g., A((production [AND] handling) responsible) reflecting production responsible and handling responsible.

Examples:

Component combinations can be nested arbitrarily deep, i.e., any component can be a combination itself, for example:

componentSymbol(componentValue1 [logicalOperator] (componentValue2 [logicalOperator] componentValue3))

Example:

Where components are linked by the same logical operator, the indication of precedence is optional.

Example:

Supported logical operators:

Invalid operators (e.g., [AN]) will be ignored in the parsing process.
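A minimal Python sketch of how the content of a combination could be turned into a tree is shown below. It is an illustration under stated assumptions, not the parser's code: the operator set ([AND] and [XOR] appear in this document, [OR] is assumed for inclusive disjunction), the function names, and the choice to reject mixed operators without parentheses are all assumptions of this sketch. Unknown tokens such as [AN] are simply left in the text, and narrowly scoped combinations such as A((production [AND] handling) responsible) are not handled.

```python
import re

# Assumed operator set (see the note above).
OPERATORS = {"AND", "OR", "XOR"}
TOKEN = re.compile(r"\[([A-Za-z]+)\]")


def _wraps(text):
    """True if the '(' at position 0 is closed only by the final character."""
    depth = 0
    for index, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth == 0:
            return index == len(text) - 1
    return False


def parse_combination(text):
    """Parse 'a [OP] (b [OP] c)' into nested (operator, [operands]) tuples."""
    text = text.strip()
    parts, ops = [], []
    depth = last = i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "[" and depth == 0:
            match = TOKEN.match(text, i)
            # Only known operators split the value; [AN] stays plain text.
            if match and match.group(1) in OPERATORS:
                parts.append(text[last:i])
                ops.append(match.group(1))
                i = last = match.end()
                continue
        i += 1
    parts.append(text[last:])
    if not ops:
        # Remove one pair of parentheses wrapping the whole value, then retry.
        if text.startswith("(") and _wraps(text):
            return parse_combination(text[1:-1])
        return text
    if len(set(ops)) > 1:
        raise ValueError("mixed operators need parentheses to signal precedence")
    return (ops[0], [parse_combination(part) for part in parts])


print(parse_combination("Farmer [AND] (Certifier [XOR] Inspector)"))
```

Operands linked by the same operator are collected into one flat list, which mirrors the rule that precedence markers are optional in that case.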

Nested Statements

Selected components can be substituted by statements entirely. For example, the activation condition could consist of a statement on its own. The syntax is as follows:

componentSymbol{ componentSymbol(naturalLanguageText) ... }

As before, components are annotated using parentheses, and are now augmented with braces that delineate the nested statements.

Essentially, a fully annotated statement (e.g., A(), I(), Cex()) is framed by the statement annotation (e.g., Cac{ A(), I(), Cex() }).

Nesting can occur to arbitrary depth, i.e., a nested statement can contain another nested statement, e.g., Cac{ A(), I(), Cac{ A(), I(), Cac() } }, and can further include combinations as introduced in the following.
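The relationship between parentheses (component content) and braces (nested statements) can be sketched in Python as a small recursive scanner. This is a simplified illustration, not the parser's implementation: the name parse_statement is hypothetical, symbols are assumed to consist of letters only (no suffixes or semantic annotations), and any capitalized word directly followed by an opening parenthesis or brace is treated as a component.

```python
import re

# A component symbol directly followed by "(" (content) or "{" (nested
# statement). Letters-only symbols are a simplification of this sketch.
HEAD = re.compile(r"([A-Z][A-Za-z]*)([({])")
CLOSERS = {"(": ")", "{": "}"}


def parse_statement(text):
    """Return [(symbol, value)]; value is a str for symbol(...) and a
    nested list of the same shape for symbol{...}."""
    result = []
    pos = 0
    while True:
        head = HEAD.search(text, pos)
        if head is None:
            return result
        symbol, opener = head.groups()
        closer = CLOSERS[opener]
        depth = 1
        end = head.end()
        # Find the delimiter that balances the opening one.
        while end < len(text) and depth > 0:
            if text[end] == opener:
                depth += 1
            elif text[end] == closer:
                depth -= 1
            end += 1
        if depth != 0:
            raise ValueError("unbalanced " + opener + " after " + symbol)
        body = text[head.end():end - 1]
        # Braces delimit a statement of its own, so recurse into them.
        result.append((symbol, parse_statement(body) if opener == "{" else body))
        pos = end


tree = parse_statement(
    "A(actor) I(acts) Cac{ A(inspector) I(finds) Cac{ A(farmer) I(violates) } }"
)
print(tree)
```

The recursion depth follows the nesting depth of the braces, which is why a nested statement can itself contain further nested statements.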

It is important to note that component-level nesting is limited to specific component types and properties.

Components for which nested statements are supported:

Nested Statement Combinations

Nested statements can, analogous to components, be combined to an arbitrary depth and using the same logical operators as for component combinations. Two constraints are to be considered:

The following example correctly reflects the combination of two nested AND-combined activation conditions:

Cac{ Cac{ A(), I(), Cex() } [AND] Cac{ A(), I(), Cex() } }

This example, in contrast, will fail (note the differing component types Cac and Cex):

Cac{ Cac{ A(), I(), Cex() } [AND] Cex{ A(), I(), Cex() } }

Another important aspect is the outer braces surrounding the nested statement combination, which form a componentSymbol{ ... [AND] ... } pattern (where the logical operators can vary, of course).

Unlike the first example, the following one will result in an error due to missing outer braces:

Cac{ A(), I(), Cex() } [AND] Cac{ A(), I(), Cex() }

Nesting can span multiple levels, i.e., similar to component combinations, braces can be used to signal precedence when linking multiple component-level nested statements.

Example: Cac{ Cac{ A(), I(), Cex() } [AND] { Cac{ A(), I(), Cex() } [XOR] Cac{ A(), I(), Cex() } } }

Note: the inner brace indicating precedence (the expression ... { Cac{ A(), I(), Cex() } [XOR] Cac{ A(), I(), Cex() } } in the previous example) does not require the leading component symbol.

Non-nested and nested components can be used in the same statement (e.g., ... Cac( text ), Cac{ A(text) I(text) Cac(text) } ... ). Those are implicitly AND-combined.
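The two requirements illustrated in this section (all combined nested statements share the component type of the enclosing symbol, and the combination is wrapped in outer braces) can be checked with a short Python sketch. The function names, the error messages, and the operator set ([OR] is assumed alongside [AND] and [XOR]) are hypothetical; the parser's own validation and error reporting may differ.

```python
import re

OPERATOR = re.compile(r"\[(AND|OR|XOR)\]")  # assumed operator set
OUTER = re.compile(r"\s*([A-Z][A-Za-z]*)\{(.*)\}\s*$", re.S)


def top_level_split(text):
    """Split text on operators that sit outside all braces and parentheses."""
    parts, depth, last, i = [], 0, 0, 0
    while i < len(text):
        match = OPERATOR.match(text, i) if depth == 0 else None
        if match:
            parts.append(text[last:i].strip())
            i = last = match.end()
            continue
        if text[i] in "({":
            depth += 1
        elif text[i] in ")}":
            depth -= 1
        i += 1
    parts.append(text[last:].strip())
    return parts


def _check_operands(symbol, body):
    operands = top_level_split(body)
    if len(operands) == 1:
        return None  # a plain nested statement, nothing is combined
    for operand in operands:
        if operand.startswith("{") and operand.endswith("}"):
            # Precedence braces carry no symbol; check what they contain.
            error = _check_operands(symbol, operand[1:-1])
        elif operand.startswith(symbol + "{"):
            error = None
        else:
            error = "operand type differs from " + symbol
        if error:
            return error
    return None


def check_combination(text):
    """Return None if the combination is well-formed, else a message."""
    if len(top_level_split(text)) > 1:
        return "missing outer braces"
    outer = OUTER.match(text)
    if outer is None:
        return "not a nested statement"
    return _check_operands(*outer.groups())


print(check_combination("Cac{ A(), I(), Cex() } [AND] Cac{ A(), I(), Cex() }"))
```

Applied to the examples above, the first and the multi-level example pass, the Cac/Cex mix is rejected for its differing component type, and the example without outer braces is rejected before any operand is inspected.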

Component Pair Combinations

Where a range of components together form an alternative in a given statement (require linkage to another range of components by logical operators), IG Script supports the ability to indicate such so-called component pair combinations or component tuple combinations.

For instance, the statement A(actor) D(must) {I(perform action) on Bdir(object1) Cex(in a particular way) [XOR] I(prevent action) on Bdir(object2) by Cex(some specific means)} draws on the same actor, but -- in contrast to combinations of nested statements -- we see combinations of pairs of different components in this statement, effectively rendering those as two distinct statements. Braces (without indication of the component symbol as done for nested statement combinations) are used to indicate the scope of a given component pair (here I(perform action) on Bdir(object1) Cex(in a particular way) as first component pair, and I(prevent action) on Bdir(object2) by Cex(some specific means) as the second one, both of which are combined by [XOR]; but both statements have the same attribute actor and deontic must). Operationally, the parser expands this into two distinct (but logically linked) statements:

Note further that either pair or tuple can consist of an arbitrary number of components and can be imbalanced and use different components on either side. For example the statement A(actor) D(must) {I(perform action) [XOR] I(prevent action) on Bdir(object2) and affects Bind(object3) by Cex(some specific means)}, with a single component on the left side (I(perform action)) and multiple on the right side (I(prevent action) on Bdir(object2) and affects Bind(object3) by Cex(some specific means)) is equally valid input and decomposes into the statements

The use of component pairs can occur on any level of nesting, including top-level statements (as shown in the first example), within nested statements, statement combinations, and embed basic component combinations (e.g., A(actor) D(must) {I(perform action) on Bdir((objectA [AND] objectB)) Cex(in a particular way) [XOR] I(prevent action) on Bdir(objectC) by Cex(some specific means)}).

Furthermore, an arbitrary number of component pairs/tuples can be combined by using the syntax (similar to nested statement combinations) to indicate precedence amongst multiple component pairs with varying logical operators.

Example: A(actor1) {I(action1) Bdir(directobject1) [XOR] {I(action2) Bdir(directobject2) Bind(indirectobject2) [AND] I(action3) Bdir(directobject3) Cex(constraint3)}} Cac(condition1)

In this example the center part reflects the combination of different component pairs using the following logical pattern { ... [XOR] { ... [AND] ... }}, all of which share the same attribute (actor1) and activation condition (condition1).
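The expansion of component pairs into distinct statements that share the surrounding components can be sketched in Python as follows. This is an illustration only: expand_pairs and its helpers are hypothetical names, the operator set is assumed, and the sketch treats every opening brace that is not directly preceded by a symbol or annotation as a component pair, so it does not cover statements that also contain nested statement combinations with precedence braces.

```python
import re

OPERATOR = re.compile(r"\[(AND|OR|XOR)\]")  # assumed operator set


def _matching(text, start):
    """Index of the brace that closes the '{' at position start."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


def _split(text):
    """Split on operators outside all parentheses and braces."""
    parts, ops, depth, last, i = [], [], 0, 0, 0
    while i < len(text):
        match = OPERATOR.match(text, i) if depth == 0 else None
        if match:
            parts.append(text[last:i])
            ops.append(match.group(1))
            i = last = match.end()
            continue
        if text[i] in "({":
            depth += 1
        elif text[i] in ")}":
            depth -= 1
        i += 1
    parts.append(text[last:])
    return parts, ops


def expand_pairs(statement):
    """Return a plain statement, or (operator, [expanded statements])."""
    # A pair brace has no component symbol or annotation directly before it.
    brace = re.search(r"(?<![A-Za-z0-9\]])\{", statement)
    if brace is None:
        return " ".join(statement.split())  # normalize whitespace
    start = brace.start()
    end = _matching(statement, start)
    before, after = statement[:start], statement[end + 1:]
    parts, ops = _split(statement[start + 1:end])
    if len(set(ops)) != 1:
        raise ValueError("expected exactly one operator type per brace level")
    # Each side is re-joined with the shared components and expanded again,
    # which resolves inner precedence braces recursively.
    return (ops[0], [expand_pairs(before + part + after) for part in parts])


print(expand_pairs(
    "A(actor1) {I(action1) Bdir(directobject1) [XOR] {I(action2) "
    "Bdir(directobject2) Bind(indirectobject2) [AND] I(action3) "
    "Bdir(directobject3) Cex(constraint3)}} Cac(condition1)"
))
```

For the example above, the result is an XOR of one statement and an AND of two further statements, each of which carries A(actor1) and Cac(condition1).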

Object-Property Relationships

Entities such as Attributes, Direct Object, Indirect Object, Constituted Entity and Constitutive Properties often carry private properties specific to a particular instance of that component (e.g., where multiple components of the same type exist).

An example is Bdir,p(shared) Bdir1,p(private) Bdir1(object1) Bdir(object2), where both Direct Objects (Bdir) have a shared property (Bdir,p(shared)), but only one has an additional private property (Bdir1,p(private)) that is exclusively linked to object1 (Bdir1(object1)).

In IG Script this is reflected based on suffixes associated with the privately related components, where both need to carry the same suffix (i.e., 1 to signal direct linkage between Bdir1,p and Bdir1 in the above example).

The basic syntax (without annotations -- see below) is componentSymbolSuffix(component content), where the component symbol (componentSymbol) reflects the entity or property of concern, and the suffix (Suffix) is the identifier of the private linkage between particular instances of the related components (i.e., the suffix 1 identifies the relationship between Bdir1,p and Bdir1). The syntax further supports suffix information on properties (e.g., Bdir1,p1(content), Bdir1,p2(content2)) to reflect dependency structures embedded within given components or their properties (here: content as the first property, and content2 as the second property of Bdir1 -- where of analytical relevance).

The coding of component-property relationships ensures that the specific intra-statement relationships are correctly captured and accessible to downstream analysis.
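The suffix-based linkage can be illustrated with a short Python sketch that resolves which properties apply to which component instance. It is a simplified illustration, not the parser's logic: link_properties is a hypothetical name, component content is assumed to contain no nested parentheses, the property suffix is read but not used, and the sketch does not restrict linkage to particular component-property pairs.

```python
import re

# symbol, optional suffix, optional ",p" property marker with its own
# suffix, then the content in parentheses (no nested parentheses here).
COMPONENT = re.compile(r"([A-Z][a-z]*)(\d*)(?:,(p)(\d*))?\(([^()]*)\)")


def link_properties(statement):
    """Map each component value to the property values that apply to it."""
    components, properties = [], []
    for symbol, suffix, is_prop, _prop_suffix, content in COMPONENT.findall(statement):
        target = properties if is_prop else components
        target.append((symbol, suffix, content))
    linked = {}
    for symbol, suffix, content in components:
        linked[content] = [
            text
            for p_symbol, p_suffix, text in properties
            # Properties without suffix are shared by all instances of the
            # symbol; suffixed ones are private to the matching suffix.
            if p_symbol == symbol and p_suffix in ("", suffix)
        ]
    return linked


print(link_properties("Bdir,p(shared) Bdir1,p(private) Bdir1(object1) Bdir(object2)"))
```

For the example statement, object1 receives both the shared and the private property, while object2 receives only the shared one.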

Suffixes can be attached to any component type, but private property linkages (i.e., linkages between particular types of components/properties) are currently supported for the following component-property pairs:

Note that the extended syntax that supports Object-Property Relationships is further augmented with the ability to capture IG Logico's Semantic Annotations as discussed in the following.

Semantic Annotations

In addition to the parsing of component annotations and combinations of various kinds, the parser further supports semantic annotations of components according to the taxonomies outlined in the Institutional Grammar 2.0 Codebook.

The syntax (including support for suffixes introduced above) is componentSymbolSuffix[semanticAnnotation](component content), i.e., any component can be augmented with [semantic annotation content], e.g., Cac[context=state](Upon certification).

This also applies to nested components, e.g., Cac[condition=violation]{A[entity=actor,animate](actor) I[act=violate](violates) Bdir[entity=target,inanimate](something)}, as well as for compound components, e.g., Bdir[type=target](leftObject [XOR] rightObject), in which case the annotation type=target is attached to both leftObject and rightObject in the generated output.
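How an annotation is read from a component and propagated to the operands of a combination can be sketched in Python. This is a minimal illustration under stated assumptions, not the parser's implementation: annotated_values is a hypothetical name, the operator set is assumed, only flat combinations are handled (no precedence parentheses or nested statements), and the annotation is kept as an unparsed string.

```python
import re

# symbol, optional suffix, optional [annotation], then the opening parenthesis
HEADER = re.compile(r"([A-Z][a-z]*)(\d*)(?:\[([^\]]*)\])?\(")
OPERATOR = re.compile(r"\s*\[(?:AND|OR|XOR)\]\s*")  # assumed operator set


def annotated_values(component):
    """Return (symbol, annotation, value) rows for one annotated component."""
    header = HEADER.match(component)
    if header is None or not component.endswith(")"):
        raise ValueError("not a component annotation: " + component)
    symbol, _suffix, annotation = header.groups()
    content = component[header.end():-1]
    # In a combination, the annotation is attached to every operand.
    return [(symbol, annotation, value) for value in OPERATOR.split(content)]


print(annotated_values("Bdir[type=target](leftObject [XOR] rightObject)"))
```

A component without an annotation yields None in the annotation position, so annotated and plain components can be processed by the same code path.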

Examples

In the following, you will find selected examples that highlight the practical use of the features introduced above. These can be tested and explored using the parser.

Deployment

This section focuses on the setup of IG Parser, not the practical use discussed above. IG Parser can either be run on a local machine or deployed on a server. The corresponding instructions are provided in the following, alongside links to the prerequisites.

Note that the server-deployed version is more reliable when considering production use.

Local deployment

The purpose of building a local executable is to run IG Parser on a local machine (primarily for personal use on your own machine).

Server deployment

The purpose of deploying IG Parser on a server is to provide a deployment that allows remote use on the local network or the internet, as well as for production-level deployment (see comments at the bottom).