Open Qkessler opened 11 months ago
Thanks for looking at this. There are a few issues I see with splitting the symbols:
First, since we are overlapping symbol values they are no longer unique. Both floor
and nil
are initialized with Symbol::new_builtin(0);
meaning that symbols are not longer unique identifiers. If you get the value 0, you can't distinguish which symbol it was. If I do cargo run --release
the program fails with a bus error, meaning that we triggered UB somewhere.
Second is the issue you saw with pointer provenance in miri. This is related to symbols no longer being unique. Since symbols are just integers, we need to give them provenance before they can be used. But now we don't know which provenance to restore them to, since we have two unique ranges (BUILTIN_SYMBOLS
and DEFUN_SYMBOLS
).
Last is that I think this is a less than ideal mental model for symbols. Fundamentally there is no difference between "defun" symbols and "variable" symbols. All elisp symbols can hold both a function and value. The only difference in this case is their initial values. it feels weird to have this distinction.
As far as ergonomics go, I see a few issues:
BUILTIN_SYMBOLS
and the constant declarations. If someone puts one in the wrong order that will create some very tricky bugs.defvar!
or defsym!
near the defun to declare them. But with the new scheme I would instead have to go an edit a different crate every time. It seems like declarations that work together should live together.I think the best way to solve these issues is to remove declarations from the core module. They are not needed there and are associated with higher level operations (like the interpreter and auxiliary functions). That will also let us avoid the "two different types of symbols" problem. I am not necessarily opposed to getting rid of defvar!
and defsym!
and just putting them in a giant manual list, but we should only need to specify a symbol once, not three times. This will probably require macros.
Thanks for the feedback, I agree with your points. I think the suggestion for splitting them comes from the symbols being consumed on the rune-core
crate, even though probably that's something that can change and we could potentially move that code to the main package (haven't seen how, will investigate that next).
I think the best way to solve these issues is to remove declarations from the core module. They are not needed there and are associated with higher level operations (like the interpreter and auxiliary functions).
Right, that makes sense, looking into that next.
I'm good with keeping the symbols together, it makes sense. What I'm thinking is that if we were to move all the symbols back to the main crate (rune) we would tie all the files that use them to the main crate (if we don't refactor again). Specially thinking about interpreter
and reader
. They need the symbols and it's tied to their responsability list. Nevertheless, by moving the symbols to rune
, we can't really move interpreter and reader to a hypothetical rune-eval
as we were discussing by email, as rune-eval
can't depend on the main crate.
I'm thinking on different solutions.
Let's remove the separation between the symbols, so we would have everything on the build.rs. Let's think about the references of that build script on the rune-core, and maybe we remove those refs to interpreter or others (probably those will move to rune-eval in the future).
Searching for the references on upstream/master
, we see there are three modules where the symbols are used in core (and probably shouldn't be):
tagged
: implements IntoObject for NIL and TRUE: we can probably move that, no need to implement there (probably). The cons is that we would be moving something just because we don't have the symbols, although we could have an enum for NIL and TRUE and then convert the actual symbol when we need it (newtype pattern). REVIEW: The issue with this is that we won't have the pointers for the NIL and TRUE symbols, as they are defined in the main crate, and not core, which will lead us to probably move IntoObject
implementations around.env
: We have a test that is using sym::init_symbols
, but definitely something we can move + using sym::NIL
.symbol
: BUILTIN_SYMBOLS directly, here.We can't really move the symbols to rune-core, since we don't have the definition for all the functions that are generated with #[defun]. That said, we need to move stuff out of the core.
For the tagged
issue. If we were to move the IntoObject
outside the rune-core crate, say with something like this to avoid the orphan rule:
struct Bool(bool);
impl IntoObject for Bool {
type Out<'a> = Symbol<'a>;
fn into_obj<const C: bool>(self, _: &Block<C>) -> Gc<Self::Out<'_>> {
let sym = match self.0 {
true => sym::TRUE.into(),
false => sym::NIL.into(),
};
unsafe { Self::Out::tag_ptr(sym.get_ptr()) }
}
}
We would need to import the TaggedPtr
trait, which is also not ideal. Looking into alternatives.
Also wondering whether special symbols, like true
or nil
(specially nil
, since we use it everywhere) should be treated differently. I'm thinking whether Symbol
should be placed on the main crate (for now, we can think about moving them around to another crate in the future).
Kinda ties to thinking whether Env
and its corresponding symbol.rs and env.rs should be moved to the main crate (for now, again). I'll find references:
BuiltInFn
, which is the subr
of SubrFn
, so it could be hard to move around.Overall it kinda seems like the core was architected at the beginning without the intention of moving it. I believe the symbols are hard to place in the different crates. We should be able to do it, but it's a hard effort to reason about.
If we look at these issues one at a time:
nil
and true
in tagged.rs
: all symbols are represented as integers and these two are 0 and 1 respectively. We should be able to define them in both crates (in core for tagged and in the other crate where symbols are defined for BUILTIN_SYMBOLS
).Env
: this can be moved out of core. EvalError
: this is only used in the interpreter, so we can move it out. TypeError
can stay though. BuiltInFn
: this one might be a little more tricky, but we probably make that type take a generic type for env
if we decide to move env out of core. Thanks for that, really helpful. We still have the issue of the Symbol
get method, using the BUILTIN_SYMBOLS!
Hmm. That one is harder. I can’t think of a way to solve that cleanly. Will have to think about it more.
Hi there @CeleritasCelery. I'm thinking about this. I should be able to squeeze some time into it this weekend.
I see that you have included the stack in env as part of one of the latest commits. I'll merge and continue with my refactor.
Will schedule some time this week for this. I plan to continue with the core separation, although the path is harder now, since we have most of the remaining files on the core
directory depending on "each other" (not really, but there are some core modules that are the most depended on, in conjunction).
Asking for some guidance regarding the remaining open questions:
I'll edit this when others come to mind.
I think we can move env.rs out of core, but we can do that later. If it is easier to leave it in for now, than leave it in. The only things that depend on it are EvalError
(which can move out of core) and BuiltInFn
(which we could make generic over the the env
argument).
The one that I still can't seem to figure out is Symbol
. The Deref
implementation for it relies on BUILTIN_SYMBOLS
, which will be in the main crate. I can't seem to think of a way to solve it. That is the biggest blocker I can see.
Looks like we might be able to use extern statics to get around that issue.
I think we also have the opportunity to think on how we want the core to look like. If this Move the core
out is taking as long as its taking, I would recommend we think on the structure we want to have.
Being concrete, it seems like we have dependencies shared across the different modules in core, even crossing. Maybe I'm reading reading something wrong, but I see that lots of the core modules (env, gc) depend on object
(and its submodules), but Object
is created in tagged.rs
, which also depends on env, gc.
I'm missing the big picture, but probably I'll try to find some time to draw something that may be helpful here, with the current dependency graph and the modules we have.
We can use cargo-modules to visualize the dependencies. Different modules in the core can have crossing dependencies, but what we want to avoid is having core depend on things outside of core.
Agree! Nevertheless, it could show that we could improve the structure, to be one-directional, which would potentially force us to declare individual modules, easier to understand and maintain!
Enrique Kessler Martínez. c/ Jiménez de la espada, Cartagena, España.
On Thu, Jan 4, 2024 at 20:23 Troy Hinckley @.***> wrote:
We can use cargo-modules https://github.com/regexident/cargo-modules to visualize the dependencies. Different modules in the core can have crossing dependencies, but what we want to avoid is having core depend on things outside of core.
— Reply to this email directly, view it on GitHub https://github.com/CeleritasCelery/rune/issues/47#issuecomment-1877640506, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALN7QXF6LIN7ZMKEGKHV4QDYM36TVAVCNFSM6AAAAABANWEZUSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZXGY2DANJQGY . You are receiving this because you authored the thread.Message ID: @.***>
This is the issue tracking my work to move the core out to the workspace crate
rune-core
. I have the code in my fork, almost ready to PR, though I had some issues with Miri. I wanted to create the issue to get some ideas over ways to resolve the problems, and introduce @CeleritasCelery to the different changes on my PR first to get some early feedback. Here's the commit.It's key to note that in the process, we include different changes that guide the build process of the crates, in order to arrive at a place that is both ergonomic and easy to maintain:
We removed the
defvar
,defvar_bool
anddefsym
macros. They were not adding that much regarding work saved (just create a constant and initialize it) and they were convoluting the dependency graph between the crates. It's key that the core crate can use the symbols that were previously generated on the build.rs file, but it does not need to know the defuns.For that, we moved the variables and symbols (not the defun symbols) to a static file
env/sym.rs
. That file contains all the required constants for the variables and symbols.On the other hand, the defun symbols and constants are still generated with the build script, to avoid loosing the convenience of defining the functions with the
defun
proc macro.The issue is that
Symbol::get
makes a ptr dereference over the BUILTIN_SYMBOLS (which now don't include the defuns) so if we try toget
a defun symbol, we UB. Here's the test triggering it:There we check the bytecode for
defun::SYMBOL_NAME
, which is no longer part of BUILTIN_SYMBOLS. What do you think? What could potentially be a solution?