CultureFoundryCA / fst-runtime

This project represents a Python package to query finite-state transducers that have been compiled to the AT&T format.
MIT License
0 stars 0 forks source link

Refactor graph data structure #4

Closed scott-parkhill closed 3 weeks ago

scott-parkhill commented 3 weeks ago

We're going to need to refactor the graph data structure into something more strongly typed. We need something that is easily workable in two directions: from starting state to accepting state(s), and from an accepting state to the starting state. This is so we can input indices and retrieve index+PLURAL, as well as input index+PLURAL and retrieve indexes and indices. Note that multiple outputs can be achieved in both cases, up or down the graph using the foma terminology.

With the current data structure, the reverse operation on the graph is very expensive. If we change our data structure though, we won't have to reverse anything; all we'll have to do is change if we start at our start state or accepting state, and whether we're following the graph normally or in reverse.

This'll look something like this (I'm unsure about if this visitor stuff is needed but I put it in because I think it might be):

class DirectedEdge:
    def __init__(self):
        self.id: int
        self.source_node: Node
        self.target_node: Node
        self.input_symbol: str
        self.output_symbol: str
        self._last_visited_by: int
        self._times_visited: int

    def visit(self, visitor: DirectedNode):
        if visitor.id == self._last_visited_by:
            self._times_visited += 1
            return

        self._last_visited_by = visitor.id
        self._times_visited = 1

    def visits_by(self, visitor: DirectedNode):
        return self._times_visited if visitor.id == self._last_visited_by else 0

class DirectedNode:
    def __init__(self):
        self.id: int
        self.is_accepting_state: bool
        self.transitions_in: list[DirectedEdge]
        self.transitions_out: list[DirectedEdge]

class DirectedGraph:
    def __init__(self):
        self.start_state: DirectedNode
        self.accepting_states: list[DirectedNode]
        self.multichar_symbols: set[str]

Something like this I think. The instance variables _last_visited_by will be a unique ID for that current walk of the traversal algorithm, and for each time that walk walks that edge, you add to the times visited. We'll then cap times visited. This can then be used to stop infinite looping. We'll then read the .att file into this structure, and change our algorithm to walk this graph.

scott-parkhill commented 3 weeks ago

Closed by 0e0c3acca341f77b569ad682b60cf63ba0cd8381