jindong-zhannng closed this issue 1 month ago.
Hey @jindong-zhannng, thank you for the input.
While I would welcome an optional sync document builder model, I'm not sure it's technically feasible. There are quite a few use cases that actually require real async behavior. There are two main ones that we have seen in adopter projects or here in the forums:
I might be missing something here, but AFAIK a synchronous, generator-based approach wouldn't be able to handle those use cases. What do you think?
OK, I get it. Maybe I can think of the entire "document building" process as a pipeline of tasks.
As the comments here describe: https://github.com/eclipse-langium/langium/blob/6971c83afb924336d70b3e3092440478c73ac92e/packages/langium/src/workspace/document-builder.ts#L256-L282
It can be summarized as pseudocode like this:
```typescript
pipe(
  parse,
  index,
  computeScope,
  link,
  indexReferences,
  validate,
)
```
AFAIK most of these tasks are in-memory operations and are therefore synchronous by nature.
As for the two cases you mentioned above, both are caused by introducing I/O operations into a few tasks:
- `resolveRemoteResources` after the parsing step, IMO.

Both of these async steps could be optional and configurable. If the whole architecture were more modular, the outer interface could be async only when there are async tasks, and synchronous when there aren't.
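An outer interface that is async only when some task is async can be sketched with a "maybe promise" return type: the runner below stays synchronous until a task returns a `Promise`, and only then switches to awaiting the remaining tasks. `runTasks`, `Doc`, and the task names are hypothetical; `resolveRemoteResources` is simulated with an already-resolved `Promise`.

```typescript
// Hypothetical document state: just a record of completed phases.
interface Doc {
  phases: string[];
}

type MaybePromise<T> = T | Promise<T>;
type Task = (doc: Doc) => MaybePromise<Doc>;

// Finishes the remaining tasks once the pipeline has become asynchronous.
async function continueAsync(pending: Promise<Doc>, rest: Task[]): Promise<Doc> {
  let doc = await pending;
  for (const task of rest) {
    doc = await task(doc);
  }
  return doc;
}

// Runs tasks in order. Returns a plain Doc if every task is synchronous,
// and a Promise<Doc> as soon as one task returns a Promise.
function runTasks(tasks: Task[], doc: Doc): MaybePromise<Doc> {
  let current = doc;
  for (let i = 0; i < tasks.length; i++) {
    const result = tasks[i](current);
    if (result instanceof Promise) {
      return continueAsync(result, tasks.slice(i + 1));
    }
    current = result;
  }
  return current;
}

const step = (name: string): Task => (doc) => ({ phases: [...doc.phases, name] });
// Hypothetical I/O-bound step, simulated here with a resolved Promise.
const resolveRemoteResources: Task = (doc) =>
  Promise.resolve({ phases: [...doc.phases, "resolveRemoteResources"] });

// All tasks in-memory: the result is available synchronously.
const syncDoc = runTasks([step("parse"), step("link")], { phases: [] });
if (!(syncDoc instanceof Promise)) {
  console.log(syncDoc.phases.join(" -> ")); // parse -> link
}

// One async task: the result becomes a Promise.
const asyncDoc = runTasks([step("parse"), resolveRemoteResources, step("link")], { phases: [] });
Promise.resolve(asyncDoc).then((d) => console.log(d.phases.join(" -> ")));
// parse -> resolveRemoteResources -> link
```

The cost of this design is that callers must handle both return shapes, which is part of why a single async signature is simpler to maintain.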
It seems generators are not the best option for this scenario, lol.
> Both of these async steps could be optional and configurable. If the whole architecture were more modular, the outer interface could be async only when there are async tasks, and synchronous when there aren't.
Right, I'm not against refactoring some of the document builder API - but I don't want to make it too complicated either. I'm fairly happy with the current state of most APIs in Langium (except for the completion and formatting APIs, but that's for different reasons).
Note that Langium actually used to have a sync document builder, back when we were still at version 0.1, see https://github.com/eclipse-langium/langium/pull/244. While the initial PR only introduced asynchronous document building for interruption purposes, there are now more use cases for async behavior, as outlined above.
Sure, I understand the difficulty and agree that overly disruptive changes are not worth it at the current stage. Thanks for your input.
The problem
I want to parse inputs in a class constructor, but currently `parseHelper` and the underlying `DocumentBuilder.build` are asynchronous (as discussed).
The cause
`DocumentBuilder.build` is asynchronous because of these two lines: https://github.com/eclipse-langium/langium/blob/6971c83afb924336d70b3e3092440478c73ac92e/packages/langium/src/workspace/document-builder.ts#L172-L173
The first line is there because asynchronous event listeners may be registered from outside.
The purpose of the second line is interruption and throttling while tasks execute: if the current task takes too long, it gives control back to the event loop so that other pending tasks can run first.
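The general technique can be sketched as follows: a loop checks how long the current time slice has been running and, once a budget is exceeded, awaits a zero-delay timer so the event loop can run other pending callbacks. `yieldIfNeeded`, `processAll`, and the 10 ms default budget are assumptions for illustration; this is not Langium's actual implementation.

```typescript
// Hypothetical default time budget before yielding back to the event loop.
const BUDGET_MS = 10;
let sliceStart = Date.now();

// Yields to the event loop via a zero-delay timer if the current slice ran too long.
async function yieldIfNeeded(budgetMs: number = BUDGET_MS): Promise<void> {
  if (Date.now() - sliceStart >= budgetMs) {
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
    sliceStart = Date.now();
  }
}

// Processes items one by one; other callbacks may run between time slices.
async function processAll(items: number[], budgetMs?: number): Promise<number> {
  let sum = 0;
  for (const item of items) {
    await yieldIfNeeded(budgetMs);
    sum += item; // stand-in for one unit of CPU-bound work, e.g. linking one document
  }
  return sum;
}

let timerRan = false;
setTimeout(() => {
  timerRan = true;
}, 0);
// With a 0 ms budget the loop yields before every item, so the timer above fires first.
processAll([1, 2, 3, 4], 0).then((sum) => console.log(sum, timerRan)); // 10 true
```

Because the yield is an `await`, any function using this pattern becomes `async`, which is exactly why the build API surfaces a `Promise`.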
The idea
For line 1: I'm curious whether the event listeners' results really matter here. Generally speaking, listeners should not affect the main workflow, IMO.
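One way to keep listeners out of the main workflow is to notify them without awaiting: synchronous listeners run inline, and a rejected `Promise` from an async listener is routed to an error callback instead of blocking the build. `notifyListeners` and `BuildListener` are hypothetical names; the trade-off is that the caller no longer knows when async listeners have finished.

```typescript
// Hypothetical listener signature: may be sync or async.
type BuildListener = (changedUris: string[]) => void | Promise<void>;

// Calls every listener without awaiting it, so the caller can stay synchronous.
function notifyListeners(
  listeners: BuildListener[],
  changedUris: string[],
  onError: (error: unknown) => void,
): void {
  for (const listener of listeners) {
    try {
      const result = listener(changedUris);
      if (result instanceof Promise) {
        // Async listeners run in the background; failures are reported, not thrown.
        result.catch(onError);
      }
    } catch (error) {
      // A synchronous listener threw; report it and keep notifying the others.
      onError(error);
    }
  }
}

const seen: string[] = [];
const errors: unknown[] = [];
notifyListeners(
  [
    (uris) => {
      seen.push(...uris);
    },
    async () => {
      throw new Error("async listener failed");
    },
  ],
  ["file:///a.langium"],
  (error) => errors.push(error),
);
console.log(seen.join(",")); // file:///a.langium
```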
For line 2: it is pretty smart to use the native event loop to implement interruptions, and I can imagine how efficient and easy that is in a highly asynchronous environment such as a language server.
But I have to say it's a little tricky and implicit, because control over task scheduling belongs to the runtime rather than to us.
Besides, building documents is synchronous by nature IMO (I didn't find any I/O there; please correct me if I'm wrong), so the async signature is confusing.
If I were the designer, I would prefer to use [generator functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function*) (aka `function*`) to implement interruptions. They express interruptible tasks very naturally.
This hands control over scheduling subtasks to the outer layer, so the caller can decide how to schedule those tasks, either synchronously or asynchronously. That makes it very easy to configure the behavior at the outermost layer.
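A minimal sketch of this idea, assuming each phase is written as a generator that `yield`s at every point where it may be interrupted (`buildDocuments` and `runSync` are hypothetical names). A synchronous caller simply drives the generator to completion, so no `Promise` is involved at all.

```typescript
// A hypothetical interruptible task: it yields after each unit of work.
function* buildDocuments(docs: string[]): Generator<void, string[], void> {
  const built: string[] = [];
  for (const doc of docs) {
    built.push(`built:${doc}`); // stand-in for parsing, linking, etc.
    yield; // a safe point where the caller may pause or cancel
  }
  return built;
}

// Synchronous scheduler: runs the task to completion without ever pausing.
function runSync<T>(task: Generator<void, T, void>): T {
  let step = task.next();
  while (!step.done) {
    step = task.next();
  }
  return step.value;
}

const result = runSync(buildDocuments(["a", "b"]));
console.log(result.join(", ")); // built:a, built:b
```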
An example async scheduler:
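A minimal sketch of such a scheduler for a generator task, assuming a hypothetical time budget: it runs steps synchronously until the budget is used up, then awaits a zero-delay timer so other event-loop work can run, and it checks a cancellation flag at every yield. `runAsync`, `countTo`, `budgetMs`, and `isCancelled` are illustrative names, not Langium APIs; `isCancelled` stands in for a cancellation token.

```typescript
// A hypothetical interruptible task: it yields after each unit of work.
function* countTo(n: number): Generator<void, number, void> {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i; // stand-in for one unit of document-building work
    yield;
  }
  return total;
}

// Async scheduler: drives the generator, yielding to the event loop once
// `budgetMs` has elapsed, and aborting if cancellation was requested.
async function runAsync<T>(
  task: Generator<void, T, void>,
  budgetMs = 10,
  isCancelled: () => boolean = () => false,
): Promise<T> {
  let sliceStart = Date.now();
  let step = task.next();
  while (!step.done) {
    if (isCancelled()) {
      throw new Error("Operation cancelled");
    }
    if (Date.now() - sliceStart >= budgetMs) {
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      sliceStart = Date.now();
    }
    step = task.next();
  }
  return step.value;
}

runAsync(countTo(100)).then((total) => console.log(total)); // 5050
```

The same generator could be handed to a synchronous driver instead; only the outermost caller decides.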
Generators can also be nested easily.
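Nesting works through `yield*`, which delegates to a sub-generator, passes its yields through to whichever scheduler drives the outer task, and evaluates to the sub-generator's return value. A minimal sketch with hypothetical phase names `parsePhase`, `linkPhase`, and `buildAll`:

```typescript
// Hypothetical sub-tasks; each yields a label at its interruption points.
function* parsePhase(docs: string[]): Generator<string, number, void> {
  for (const doc of docs) {
    yield `parse ${doc}`;
  }
  return docs.length;
}

function* linkPhase(docs: string[]): Generator<string, number, void> {
  for (const doc of docs) {
    yield `link ${doc}`;
  }
  return docs.length;
}

// The outer task delegates with yield* and receives each sub-task's return value.
function* buildAll(docs: string[]): Generator<string, number, void> {
  const parsed = yield* parsePhase(docs);
  const linked = yield* linkPhase(docs);
  return parsed + linked;
}

// Any driver sees one flat stream of interruption points.
const trace: string[] = [];
const task = buildAll(["a", "b"]);
let step = task.next();
while (!step.done) {
  trace.push(step.value);
  step = task.next();
}
console.log(trace.join(", ")); // parse a, parse b, link a, link b
console.log(step.value); // 4
```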
I know it's hard, or even impossible, to refactor the whole project to apply a new pattern, and I'm also not sure whether a partial renovation is possible. I'm just sharing my thoughts here for reference; any comments are welcome.