egraphs-good / egglog

egraphs + datalog!
https://egraphs-good.github.io/egglog/
MIT License
400 stars 45 forks source link

Remove orphan nodes from serialized egraphs #368

Closed oflatt closed 4 months ago

oflatt commented 4 months ago

When a maxmim number of serialized enodes per function is set, or when users use delete, the current serialization code leaves off children. This results in confusing visualizations and ill-typed terms. We ran into this in eggcc.

Here I have an ugly solution that removes nodes until a fixed point, similar to extraction. It ensures all the nodes in the serialized egraph have children.

oflatt commented 4 months ago

This still has problems- rewriting now

oflatt commented 4 months ago

I'm quite unhappy with this PR, but I don't see another way to deal with the problems with orphaned nodes. Perhaps @saulshanabrook has a suggestion for how to clean it up?

yihozhang commented 4 months ago

Do you have an example where things can go off?

yihozhang commented 4 months ago

oh nvm, I understand the problem now with the comments in the PR. I think it is on the users to not use the delete action that could lead to e-classes being ungrounded? Do we create ungrounded e-classes with delete in eggcc?

I don't have strong opinions about what we should do with viz under size limits.

saulshanabrook commented 4 months ago

When a maxmim number of serialized enodes per function is set, or when users use delete, the current serialization code leaves off children. This results in confusing visualizations and ill-typed terms. We ran into this in eggcc.

I hadn't thought about the delete case! It seems a bit pathological... Do you find it coming up somewhere? I wonder if should even allow this behavior in the first place...

In terms of the max number of nodes, I agree it produces visualizations that have the wrong number of arguments for functions, if they were omitted they will just not be shown.

Again for this, if you have a particular example that would be helpful to look at. The trimming is meant as some sort of debugging tool... to still give you a sense of the graph without having to omit all of it. Maybe based on the example you are looking at we could come up with a reasonable way to make it more helpful?

EDIT: One idea might if a node points to an empty e-class that was removed bc of the max # of nodes, we could add a dummy node like "" to fill it, so the edge can still exist.

oflatt commented 4 months ago

We are running into this case with eggcc, since we are abusing delete recently. It's easy to write an example, just delete a leaf node: https://egraphs-good.github.io/egglog/?program=XQAAgACTAAAAAAAAAAAUGQgnfMUD9dO10C6gYDgGpv3OlpxDkJUt6jGNQtRZOLSCLUYu5JKbot9I5KG4je3lTY-sI3WWWcM4zLmo_r3QQzlEdQTC1KY2y57eIrs_X_9nHgAA

I think for visualization we should at least mark that we left off some nodes due to size constraints, and have a big warning.

For non-viz use cases, I'm not sure what to do. Give an error? A dummy eclass seems like a bad idea because it's not something that is actually there

oflatt commented 4 months ago

Current hack for eggcc is to use subsumption instead, and because serialization ignores subsumption right now we are ok

saulshanabrook commented 4 months ago

Yeah I think we want to do something on the delete case at least...

If we changed the serialization format to point to e-classes instead of e-nodes then we could represent this OK.

saulshanabrook commented 4 months ago

After we discussed this yesterday, we thought it made the most sense to support this at the serialization level by adding the ability to represent empty e-classes in the serialized format. I opened https://github.com/egraphs-good/egraph-serialize/issues/11 to track that and https://github.com/egraphs-good/egraph-serialize/issues/12 to track adding this algorithm present here to the underlying serialized format.

In tandem with https://github.com/egraphs-good/egraph-serialize/issues/11, we can open a new PR here that fixes how we omit missing nodes by using the new support instead of just omitting them.