diku-dk / alpacc

MIT License
5 stars 0 forks source link

Concrete syntax tree construction. #4

Closed WilliamDue closed 6 months ago

WilliamDue commented 1 year ago

Currently the parser only returns the pre-ordering of the productions used to derive the input.

An idea that could make it easier to use would be if a vector containing the parent indices are also constructed (like in pareas) and some datastructure which describes how many terminals are between the nonterminals in a production.

An example is if we have the grammar.

0) E → TE' 1) E' → +TE' 2) E' → ε 3) T → a 4) T → [E]

Then a left parse of "a+[a+a]" would be parse "a+[a+a]" = [0, 3, 1, 4, 0, 3, 1, 3, 2, 2]. The parent vector would then be [0, 0, 0, 2, 3, 4, 4, 6, 6, 2] where each value in the array corresponds to an index in the pre-ordering. The data structure i mentioned could could then be something like.

let arity_array =
  [[#just 0, #just 0, #just 0],
   [#just 1, #just 0, #just 0],
   [#just 0, #nothing, #nothing],
   [#just 1, #nothing, #nothing],
   [#just 1, #just 1, #nothing]]

This structure contains information about the arity of each production by the number of #just minus one. An example is at production 4 arity_array[4] = [#just 1, #just 1, #nothing]. Here arity_array[4][0] = #just 1 means that there is one terminal before the first nonterminal and arity_array[4][1] = #just 1 means there is one terminal after the first nonterminal. While arity_array[4][2] = #nothing means there are no more nonterminals.

I am not quite sure how to construct the parent vector in a parallel time complexity of O(log n). I do also not know if this is the best representation.

WilliamDue commented 6 months ago

alpacc now creates syntax trees.