compiler-research / xeus-clang-repl

Apache License 2.0
34 stars 14 forks source link

Double execution does not work properly #72

Open p4vook opened 1 year ago

p4vook commented 1 year ago

Currently, if a cell with some definitions is executed twice, the second time will drop an error. I think, this behavior is undesired, because it doesn't allow modifying functions in cells.

Is there any way to overcome this?

vgvassilev commented 1 year ago

Currently, if a cell with some definitions is executed twice, the second time will drop an error. I think, this behavior is undesired, because it doesn't allow modifying functions in cells.

Is there any way to overcome this?

Thanks for opening this issue.

Currently the only way to avoid is by restarting the kernel. Clang-Repl supports unrolling to the previous state and we "just" need to implement it. The reliability of that feature is not the best in clang-repl but it will be a good start.

The idea is that we need to count the number of cells, M, from the last cell, N to the one being requested to re-execute. Then we need to make the equivalent of .undo N - M + 1. Should not be hard to implement but we need to add it as an interface to https://github.com/compiler-research/CppInterOp and then use it here.

p4vook commented 1 year ago

Apparently the problem was fixed in Cling 0.7.0 by implementing a function allowRedefinition(). There was a similar issue in xeus-cling (https://github.com/jupyter-xeus/xeus-cling/issues/91).

I can't seem to find similar function in clang-repl. Maybe this could be suggested to upstream?

UPD: related links https://root.cern/blog/cling-declshadow/ https://github.com/root-project/cling/issues/259 https://dl.acm.org/doi/abs/10.1145/3377555.3377901

vgvassilev commented 1 year ago

Redefinition would mean that

double d = 12.;

and

std::string d = "12";

would work across cells. That does not necessarily mean that we can re-execute cells. One of the most controversial feature of the implementation in cling is that we mess up with the lookup tables in the compiler which is perhaps not the best idea.

PS: This implementation does not have a lot of chances to go upstream in clang. We should probably figure out how to make it less intrusive.

p4vook commented 1 year ago

As far as I understand, something involving nested inline namespaces was used to implement this in cling. I looked at the original pull request (https://github.com/root-project/root/pull/4214/) and I didn't find anything too messy there. But I'm not really sure as the code is a bit complicated for me.

p4vook commented 1 year ago

I think the approach with .undo-ing stuff should be thought through a little bit more (or at least I should think a bit more about this) because it loses all the data that was computed in all the cells lower than the reexecuted one, even if they were completely unrelated. This doesn't match the experience with other Jupyter notebooks, where the data stays where you put it until you modify the cell or specifically overwrite it.

vgvassilev commented 1 year ago

I think the approach with .undo-ing stuff should be thought through a little bit more (or at least I should think a bit more about this) because it loses all the data that was computed in all the cells lower than the reexecuted one, even if they were completely unrelated. This doesn't match the experience with other Jupyter notebooks.

You will not going to lose it as it will still be on the notebook but the internal state of the infrastructure should be as if it was rewinded to that place. I am not sure I see how else that could be done even on the python end.

We cannot really reason about dependency graphs and connectivity as generally subsequent computations depend on previous ones. That is, generally there might have been some disconnected graph that does not affect what you want to undo, however, in practice that seems to be hardly the case.

PS: Consider:

cell 1: int i = 12; cell 2: i++; cell 3: print(...,i); cell 4: i = 0;

If you want to change/re-execute cell 2, the data in cell 3 is invalid anyway. If we consider cell 4, do you expect the result in cell 3 to be 1 instead of 13?

p4vook commented 1 year ago

Not really, I meant the following

> [1] int n = 5;
> [2] int f(int x) { return n * x; }
> [3] std::vector<int> v(f(n), n);

Here I expect to be able to access v = {5, 5, 5, 5, ...} and f after I change the cell 1 to int n = 10;.

UPD: perhaps this should be limited to only objects, and not functions.

vgvassilev commented 1 year ago

Not really, I meant the following

> [1] int n = 5;
> [2] int f(int x) { return n * x; }
> [3] std::vector<int> v(f(n), n);

Here I expect to be able to access v = {5, 5, 5, 5, ...} and f after I change the cell 1 to int n = 10;.

Ok, I see but I fail to see the use-case. If v depends on n and you intend to change n then that program is essentially invalid. What would happen if I pass n by address? Should it still be a vector of 5s or a vector of 10s?

p4vook commented 1 year ago

Yes, the program is essentially invalid in its current state, but the invalid piece of code wasn't executed, so it should be possible to access the results of the code, whatever they were (even if that's generally UB, as the underlying internals might change). So, it should still be a vector of 5s, as we didn't execute the code that changes it. This behaviour is consistent with Jupyter Python kernel, and I think it should be an option. It allows accessing results of long computations even if some fixes to initial environment were made.

P.S. Is it possible to maybe implement nested inline namespaces without messing with the compiler internals? I think this would produce the desired behaviour.

vgvassilev commented 1 year ago

P.S. Is it possible to maybe implement nested inline namespaces without messing with the compiler internals? I think this would produce the desired behaviour.

Probably, can you make a prototype in godbolt to see how that'd work?

p4vook commented 1 year ago

Something like this, maybe (I would rather have named namespaces though). https://godbolt.org/z/3neGh6MKK Note: it crashes clang-repl 17.0.1 with SIGSEGV :)

UPD: a bit more sophisticated example: https://godbolt.org/z/6c6PMrzPK

p4vook commented 1 year ago

Did I understand you correctly about the prototype or did you mean something else?

p4vook commented 11 months ago

I think I have a more desirable variant using C++17 syntax that does not require full recompiling to add new block. https://godbolt.org/z/hrd1TdG3K What do you think?

vgvassilev commented 10 months ago

Thanks for working on this.

I believe there are two aspects that are somewhat relevant here. Redefinition of entities which allows us to use the same name with different meaning, and re-running cells. In this paper we discuss how variable redefinition is done in cling: https://dl.acm.org/doi/abs/10.1145/3377555.3377901

If you want to rerun cells I believe we really need to use the watermarking approach described earlier here

p4vook commented 10 months ago

I've implemented my approach using C++17 nested namespaces syntax in the PR #80 (it is very similar to the approach in the article, at least according to the abstract, but it doesn't require any messing with lookup tables). It seems to work almost perfectly (except for the bug in undo that showed up in PR https://github.com/llvm/llvm-project/pull/75547, see test failure). The bug is that we should clean up only those parts of the AST that have changed after the last declaration (otherwise the whole namespace hierarchy gets wiped out), which is tricky.

jalopezg-git commented 9 months ago

Hi @p4vook,

This is Javier, one of the authors of the implementation (and the related ACM CC paper) of entity redefinition in cling. Apologies, I was totally unaware of the existence of this issue. There we decided to use non-nested inline namespaces and some manual tweaking of the lookup table as described in the paper. The approach proved to work well, passing cling's and ROOT's test suite completely. As @vgvassilev said, even if a little intrusive (due to the patching of lookup tables), it prevented further patches to clang and, being an "AST transformer", can still be enabled / disabled on demand.

For clang-repl, however, we still need to see how we are going to efficiently address this. I may be dedicating a few days to this, so I'll come back to you in the next few weeks!

Other than that, I hope that you had a wonderful entry into 2024!

Cheers, Javier.

p4vook commented 9 months ago

Hi, @jalopezg-git!

I would be glad to tell you any details I learned about this. It's all pretty much described in the links below:

I'd love to join the discussion about any of these problems.

The entry into 2024 was, indeed, not too bad. Hope yours was wonderful.

jalopezg-git commented 4 months ago

Hi, @jalopezg-git!

I would be glad to tell you any details I learned about this. It's all pretty much described in the links below:

* Overall [idea](https://godbolt.org/z/hrd1TdG3K) of the implementation

* Current [implementation](https://github.com/compiler-research/xeus-clang-repl/pull/80) I've come up with (doesn't really work without some fixes in clang)

* Clang [bug](https://github.com/llvm/llvm-project/issues/73632), related to incorrect (in my opinion) handling of namespaces

* [Resurfacing](https://github.com/llvm/llvm-project/issues/72980) of the same bug, that produces definitely incorrect behavior

* Failed [attempt](https://github.com/llvm/llvm-project/pull/75547) to fix the bug (`for` stops working for some reason 🤷)

I'd love to join the discussion about any of these problems.

The entry into 2024 was, indeed, not too bad. Hope yours was wonderful.

Oh, god! I totally forgot replying here! It has been a few quite hectic months at work... :sweat_smile:

Is there any news on this? I might have some time now in Summer to do something on it, if needed!

vgvassilev commented 4 months ago

Hi @jalopezg-git, great to hear from you! We have moved to a new repository: compiler-research/xeus-cpp. If we do work it needs to be there.