comunica / sparqlee

⚙️ SPARQL expression evaluator library - Moved to @comunica/expression-evaluator
https://www.npmjs.com/package/sparqlee
14 stars 7 forks source link

Include Sparqlee into the main Comunica monorepo #131

Closed rubensworks closed 1 year ago

rubensworks commented 2 years ago

We may want to consider making this project part of the main Comunica monorepo because

Before we would do this move, we need to make sure that this project follows all of Comunica's coding conventions, and that we have 100% testing coverage, so this depends on #38.


Bounty

A bounty has been placed on this issue by:

Comunica Association
Variable budget

Click here to learn more if you're interested in claiming this bounty by resolving this issue.

wschella commented 2 years ago

Do changes in Comunica commonly require linking a dev version of sparqlee, or is it mostly when sparqlee changes? I assume it's the latter, but with sparqlee and comunica often changing in tandem with respect to the config possibilities in sparqlee which are driven by a need from comunica (so the update gets applied immediately).

Some observations:

But overall, from a developing comunica perspective I think it makes sense.(with it's many other packages)

From the perspective of developing sparqlee, with comunica it's only consumer, the current setup also makes sense. For my workflow, I just had a comunica clone with sparqlee always as a locally linked dev dependency, and when I was unsure about API changes breaking things in comunica (which was quite infrequent), I ran the tests manually, and in any cases errors would be caught when the comunica bumped the sparqlee version.

jitsedesmet commented 2 years ago

I understand both of these ideas. I also wanted to point out that if you'd like to put future student workers on sparqlee (like you did with me). Than it might be interesting to keep them separate. Comunica can be quite overwhelming. Working on sparqlee might me a nice way to pull students in. (At least that's how I feel) I want to emphasize that this is more of an emotional feeling.

Technically I don't know what the advantage would be to to merge sparqlee into Comunica (would it go faster?)

I think having sparqlee sparqlee linked into a local comunica isn't that big of a deal? (At least when you finally figure out how yarn link and yarn install work with respect to one another.)

Like @wschella said, it might be good to have sparqlee be a seperate package to prevent 'dirty' code.

Again, I don't really understand why you'd want to merge them, or what you mean with:

there are quite a bit of interdependencies, and changes in either project require annoying interlinking,

So you might need to explain that some more to me for me to see the bigger picture.

rubensworks commented 2 years ago

Again, I don't really understand why you'd want to merge them, or what you mean with:

there are quite a bit of interdependencies, and changes in either project require annoying interlinking,

This is the yarn link stuff.

The problem is that this is very error-prone. This way of working requires developers to manually do checks such as running integration tests etc, while they (also) should be done as part of the (Comunica) CI, which isn't being done atm. (remember my recurring question upon every PR: "have you tested this change in Comunica" -> this shouldn't be needed)

wschella commented 2 years ago

There are two distinct cases:

Even with seperate repos, failing tests will be caught in both cases, but only when comunica bumps the sparqlee version. No broken comunica versions can be pushed, only broken or inadequate sparqlee versions.

If a change in sparqlee causes an error in the comunica test suite, should that not have been a test in sparqlee as well? Do we want the first to require two PR's? One for the lib API update, and one for the consumer API usage update.

Another comment:

Ultimately, I think the pragmatics will come down to:

  1. Are you mostly work on comunica and all it's packages, of which sparqlee is a part (and a part of your API!), then you want them in a monorepo.
  2. If you mostly work on sparqlee you want them split. Even if comunica is your only consumer.

I have no opinion which one would be optimal of the comunica as a long term project.

rubensworks commented 2 years ago

If a change in sparqlee causes an error in the comunica test suite, should that not have been a test in sparqlee as well?

Yes, there should be tests in both places. It's mainly the integration (and spec) tests of Comunica I'm concerned about. While all (unit) tests may succeed in Sparqlee, this could still lead to integration test failures after wiring everything together in Comunica. This is because Sparqlee can't execute the SPARQL 1.1 test suite on its own.

wschella commented 2 years ago

This is because Sparqlee can't execute the SPARQL 1.1 test suite on its own It could, if we run the comunica test suite in sparqlee CI, but that of course only works if there are no API changes.

But sparqlee should have all the tests in the SPARQL 1.1 test suite that relate to expressions by itself (currently tests/spec, likely a bit incomplete). If there are failures in the integration or spec tests of Comunica, either (i) the logic in Comunica code calling sparqlee should be updated for new APIs, or fixed, or (ii) a corresponding test should be added to sparqlee.

Wether this is practical is something different, so I can definitely see the advantage of merging.

rubensworks commented 2 years ago

But sparqlee should have all the tests in the SPARQL 1.1 test suite that relate to expressions by itself (currently tests/spec, likely a bit incomplete).

Yes, but those are unit tests, not integration tests.

Even if all unit tests (covering the whole SPARQL spec) pass, some integration tests could still fail.

wschella commented 2 years ago

But in this case, there is nothing wrong with sparqlee so it doesn't make sense to hold a sparqlee PR back for it.

rubensworks commented 2 years ago

But in this case, there is nothing wrong with sparqlee so it doesn't make sense to hold a sparqlee PR back for it.

It could be a problem in sparqlee though. E.g. in the case where an integration test handles a case that is not tested in Sparqlee, even though the functionality is tested in isolation, such as certain complex aggregations over bindings streams.

We can be never be certain that we have enough tests, so having different levels of them is quite important.

jitsedesmet commented 2 years ago

The problem is that this is very error-prone.

True.

I start to understand why you'd want to merge.

Some more questions I have:

rubensworks commented 2 years ago

Are you not worried of code becoming more 'ugly' after merging? (We lose abstraction?)

This would just become another package within the packages/ directory, so the code would remain exactly the same.

Could we not just extend the sparqlee CI to test sparqlee in Comunica? Looking at Comunica as some open source sparqlee consumer?

Yes, but the duplication of the CI setup would be quite complex, and easy to get out of sync. Effort required for merging would probably be lower.

Do you fear the loss of sparqlee specific test code? -> With this I mean that sparqlee might have 100% test coverage due to the integration tests of Comunica but not have 100% coverage when not testing the Comunica suite (so, the sparqlee test suite as is). Wouldn't this be bad? Can we make sure we always assert 100% test coverage from the sparqlee test coverage itself? It's possible that all this is a non-issue?

Jest should have some tricks for this.

joachimvh commented 2 years ago

Do you fear the loss of sparqlee specific test code? -> With this I mean that sparqlee might have 100% test coverage due to the integration tests of Comunica but not have 100% coverage when not testing the Comunica suite (so, the sparqlee test suite as is). Wouldn't this be bad? Can we make sure we always assert 100% test coverage from the sparqlee test coverage itself? It's possible that all this is a non-issue?

Jest should have some tricks for this.

In the Community Solid Server we do this by running unit tests and integration tests separately and only requiring 100% test coverage on the unit test run for example.

rubensworks commented 2 years ago

Another reason for including this in the monorepo is the recent new dependency on @comunica/bindings-factory, which complicates the release process quite a bit.

So we should really make work of this issue (and #38) soon.

rubensworks commented 2 years ago

Now that #38 is done, we don't have any blockers for this anymore.

wschella commented 1 year ago

:clap: