Testing against GNU Emacs

CeleritasCelery commented 7 months ago

We are striving to be “bug compatible” with GNU Emacs, insofar as it makes sense to do so. We might break with some behavior if it is obscure enough or if the change adds enough value to justify it. But the right answer to “what should this function do?” is almost always “whatever GNU Emacs does”.

Given that, we would like to have a way to test against Emacs and compare behavior. This issue is to brainstorm the best way to do that.

Currently the plan is to create a new Rust binary that can feed a test file into Rune and GNU Emacs and make sure they get the same output. If one throws an error, so does the other. And the result of each expression is the same.
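The comparison rule described above could be sketched roughly as follows. This is only an illustration: the `Outcome` type and the helper names are hypothetical and not taken from the Rune repository, and it assumes each implementation's result has already been captured as a printed string.

```rust
/// Result of evaluating one expression in either implementation.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    /// Printed representation of the returned value.
    Value(String),
    /// The error message, kept only for reporting.
    Error(String),
}

/// Two outcomes agree if both are errors, or both are values with the
/// same printed representation. Error messages are not compared.
fn outcomes_agree(rune: &Outcome, emacs: &Outcome) -> bool {
    match (rune, emacs) {
        (Outcome::Value(a), Outcome::Value(b)) => a == b,
        (Outcome::Error(_), Outcome::Error(_)) => true,
        _ => false,
    }
}

/// Return the indices of the expressions on which the two
/// implementations disagree.
fn mismatches(rune: &[Outcome], emacs: &[Outcome]) -> Vec<usize> {
    rune.iter()
        .zip(emacs)
        .enumerate()
        .filter(|(_, (r, e))| !outcomes_agree(r, e))
        .map(|(i, _)| i)
        .collect()
}
```

Only the fact that an error occurred is compared here, not its message. Matching error types or messages would be a stricter check that could be added later.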

This can be expanded to include fuzzing/property testing. There could be some code to parse each built-in function in Rune and get the type signature. We could then test the arity, accepted types, and random values against GNU Emacs. This would help flush out edge cases and differences in behavior. It could also help us catch changes between major version upgrades of GNU Emacs.

We could also create dedicated fuzzers for specific functionality. For example, we have some code to convert a Lisp regex to a Rust regex. We could send in random regexes and ensure that if Emacs considers one valid, then it is also converted to a valid Rust regex. Another example is printing: ensuring that the printed representation of everything is the same.

Qkessler commented 7 months ago

> Currently the plan is to create a new Rust binary that can feed a test file into Rune and GNU Emacs and make sure they get the same output. If one throws an error, so does the other. And the result of each expression is the same.

I'm thinking about this. I really like this GitHub Action to set up Emacs at a specific version. We can then prop test by generating some elisp that the emacs command on CI can run. There are different ways to do it, but probably the easiest is to generate an .el file and have emacs run it with -q -x. We can generate an .el file from all the defuns we generate. It could get complicated fast, say, if defuns need to run in a specific order to work correctly. I would imagine GNU Emacs would fail as well, and the output would be consistent vs. Rune.
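A minimal sketch of generating such an .el file is shown below. The function names are hypothetical. Each form is wrapped in `condition-case` so that a signaled error is printed instead of aborting the run, which keeps the output line-for-line comparable between the two implementations.

```rust
/// Wrap one form so that it prints its value, or the signaled error,
/// on its own line instead of stopping the whole run.
fn wrap_form(form: &str) -> String {
    format!(
        "(condition-case err\n    (progn (prin1 {}) (terpri))\n  (error (prin1 (list 'error err)) (terpri)))",
        form
    )
}

/// Build the text of a test file from a list of elisp forms.
fn test_file(forms: &[&str]) -> String {
    let mut out = String::from(";;; Generated test file\n");
    for form in forms {
        out.push_str(&wrap_form(form));
        out.push('\n');
    }
    out
}
```

Such a file could then be loaded non-interactively, for example with `emacs --batch -l generated.el`, and the same forms fed to Rune. The file name here is only a placeholder.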

CeleritasCelery commented 3 weeks ago

@Ki11erRabbit As you have seen with string-version-lessp (#78), it can be hard to match the behavior of GNU Emacs when porting code, even when you have the source code in front of you. The idea in this issue is to create a separate utility to compare Rune and GNU Emacs with property testing to help flush out issues. I have started working on this with a little CLI tool here. All it does right now is check whether the same functions exist, but we could expand it to generate input to the functions and compare the outputs. I have started writing it in Rust with proptest, but we could in theory write it in any other language (like Python with Hypothesis).

If you are interested, this might be something you could tackle, because you have already seen how hard it is to get the functionality right. I have hit many behavior mismatches before, but usually they are deep in some other code and it takes me hours to find the source. With this utility we could fuzz new functions as they are created and hopefully make the quality much higher and bugs easier to flush out. I am sure our current defuns are loaded with issues that we just haven't seen yet. Let me know what you think.

Ki11erRabbit commented 3 weeks ago

> If you are interested, this might be something you could tackle, because you have already seen how hard it is to get the functionality right. I have hit many behavior mismatches before, but usually they are deep in some other code and it takes me hours to find the source. With this utility we could fuzz new functions as they are created and hopefully make the quality much higher and bugs easier to flush out. I am sure our current defuns are loaded with issues that we just haven't seen yet. Let me know what you think.

I wouldn't mind helping, although I will be very busy the next 2 weeks or so. But after that I should be able to help with this.

Ki11erRabbit commented 1 week ago

I am now free to help. What would you like me to do @CeleritasCelery?

CeleritasCelery commented 1 week ago

It depends on what you want to do 😄. If working on this particular item interests you, then take a look here at what I have started with. If you want to start over, that is fine too. Essentially this code runs both Rune and GNU Emacs and compares the output. Right now it only checks whether they both have the same functions defined, but we can expand it to do more. Some next steps:

- We need to extract more information about the function definitions from the Rust source, probably using syn. This includes the number of arguments, number of optional arguments, return type, and argument types.
- Make sure that function arity matches between the two implementations (using func-arity).
- Use a prop testing library to generate random inputs and ensure the functions return the same outputs. If we know the types a function expects, we can narrow down the types that we generate to test more interesting properties. For example, we could test a bunch of random strings with string-version-lessp to help catch any corner cases that we missed.
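For the arity step, `func-arity` in GNU Emacs returns a pair such as `(1 . 1)`, `(0 . many)` or `(2 . unevalled)`. A small parser for that printed form might look like the sketch below. The type and function names are hypothetical, and it assumes the pair has been captured as text from the Emacs process.

```rust
/// Upper bound reported by `func-arity`.
#[derive(Debug, PartialEq, Clone, Copy)]
enum MaxArgs {
    Exactly(usize),
    /// `many`: the function accepts any number of further arguments.
    Many,
    /// `unevalled`: a special form.
    Unevalled,
}

#[derive(Debug, PartialEq, Clone, Copy)]
struct Arity {
    min: usize,
    max: MaxArgs,
}

/// Parse the printed result of `(func-arity 'f)`, e.g. "(1 . 2)".
/// Returns `None` for anything that is not a `(MIN . MAX)` pair.
fn parse_arity(printed: &str) -> Option<Arity> {
    let inner = printed.trim().strip_prefix('(')?.strip_suffix(')')?;
    let (min, max) = inner.split_once(" . ")?;
    let min = min.trim().parse().ok()?;
    let max = match max.trim() {
        "many" => MaxArgs::Many,
        "unevalled" => MaxArgs::Unevalled,
        n => MaxArgs::Exactly(n.parse().ok()?),
    };
    Some(Arity { min, max })
}
```

Parsing both sides into the same `Arity` value means the Rune and Emacs results can be compared with a plain `==`.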

Of course this all depends on if this is something you want to work on. I think it would be a good task because it is open and does not require a lot of context about the current system. But if there is something else you are interested in more, let me know.

Ki11erRabbit commented 1 week ago

Sure, I'll get to work on it. It should keep me busy.

Ki11erRabbit commented 1 week ago

I have a few questions after thinking about the problem.

How aware of the types do we want the tester to be? If we just take an object, we don't actually know what the type is. It would be nice to have a list of possible types that a function could take, even if the list is somewhat incomplete.

Should I make a type that represents a function and use that to generate arbitrary function calls that we could test against Emacs? If so it will rely on the above information to provide arbitrary input.

CeleritasCelery commented 1 week ago

> How aware of the types do we want the tester to be? If we just take an object, we don't actually know what the type is. It would be nice to have a list of possible types that a function could take, even if the list is somewhat incomplete.

Agreed. The more specific the types, the more useful the input we can generate. Many builtin functions use specific types like &str and usize that we can extract, but for ones that take Object we don't have that info. We need to think of some way to include it. Maybe through a comment, annotation, or attribute?

> Should I make a type that represents a function and use that to generate arbitrary function calls that we could test against Emacs? If so it will rely on the above information to provide arbitrary input.

I think that is a good approach.
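A rough sketch of what such a function-representing type could look like follows. Everything here is an assumption made for illustration: the names `FunctionSpec`, `ArgKind` and `gen_call` are hypothetical, and a tiny xorshift generator stands in for a real property-testing library such as proptest.

```rust
/// Kinds of argument the sketch knows how to generate.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgKind {
    Integer,
    Float,
    String,
    Boolean,
}

/// Description of one function under test.
struct FunctionSpec {
    name: &'static str,
    args: Vec<ArgKind>,
}

/// Small xorshift generator. The seed must be nonzero.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Produce the elisp source text of one random argument.
fn gen_arg(kind: ArgKind, rng: &mut Rng) -> String {
    match kind {
        ArgKind::Integer => ((rng.next_u64() % 2001) as i64 - 1000).to_string(),
        // Dividing by 8 keeps the value exact; `{:?}` always prints a decimal point.
        ArgKind::Float => format!("{:?}", ((rng.next_u64() % 2001) as i64 - 1000) as f64 / 8.0),
        ArgKind::String => {
            let len = rng.next_u64() % 8;
            let s: String = (0..len)
                .map(|_| (b'a' + (rng.next_u64() % 26) as u8) as char)
                .collect();
            format!("\"{}\"", s)
        }
        ArgKind::Boolean => {
            if rng.next_u64() % 2 == 0 {
                "nil".to_string()
            } else {
                "t".to_string()
            }
        }
    }
}

/// Produce one call such as `(string-version-lessp "abc" "x")`.
fn gen_call(spec: &FunctionSpec, rng: &mut Rng) -> String {
    let args: Vec<String> = spec.args.iter().map(|&k| gen_arg(k, rng)).collect();
    if args.is_empty() {
        format!("({})", spec.name)
    } else {
        format!("({} {})", spec.name, args.join(" "))
    }
}
```

The strings generated here are deliberately simple (lowercase ASCII only). A real generator would want digits, separators and non-ASCII text to exercise functions like string-version-lessp.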

Ki11erRabbit commented 6 days ago

While working on the tester I thought of a way to provide more type information.

I think that an attribute might be best if it can do these things:

- State the position of the argument (as an int). This way we can indicate multiple types for the same argument.
- Give the type name. This is for convenience.
- State whether or not the argument is optional.

I think that this would make parsing with syn much easier.

CeleritasCelery commented 5 days ago

Most of that information should already be there.

The Rust types should map fairly cleanly to the lisp types.

pub(crate) fn string_lessp<'ob>(
    string1: StringOrSymbol<'ob>,
    string2: StringOrSymbol<'ob>,
) -> Result<bool> {

This tells us that the argument type is a string or symbol (which means we can test a string against it).

pub(crate) fn less_than(number: Number, numbers: &[Number]) -> bool {

This tells us that the arguments are numbers (either int or float).

Optional arguments from lisp are Option in Rust.

pub(crate) fn require<'ob>(
    feature: &Rto<Gc<Symbol>>,
    filename: Option<&Rto<Gc<&LispString>>>,
    noerror: Option<()>,
    env: &mut Rt<Env>,
    cx: &'ob mut Context,
) -> Result<Symbol<'ob>> {

Here we know that feature is a symbol, filename is an optional string, and noerror is just optional (nil or non-nil). This is the reason we use Option<()> for optionals instead of bool. It lets us distinguish between required boolean flags and optional lisp values.
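To illustrate how the signatures above could be turned into an arity, here is a small sketch that works on parameter types given as plain strings. It is not how the real tool does it (which would likely use syn), and it rests on two assumptions: that `&[T]` parameters correspond to a lisp rest argument, and that `&mut Rt<Env>` and `&mut Context` parameters are interpreter state rather than lisp arguments. All names are hypothetical.

```rust
/// Arity as seen from lisp, derived from the Rust parameter types.
#[derive(Debug, PartialEq)]
struct LispArity {
    required: usize,
    optional: usize,
    rest: bool,
}

/// Assumption: mutable references to the environment or context are
/// passed by the interpreter and are not lisp arguments.
fn is_interpreter_state(ty: &str) -> bool {
    ty.starts_with('&')
        && ty.contains("mut ")
        && (ty.ends_with("Rt<Env>") || ty.ends_with("Context"))
}

/// Classify each parameter type as required, optional or rest.
fn lisp_arity(param_types: &[&str]) -> LispArity {
    let mut arity = LispArity { required: 0, optional: 0, rest: false };
    for ty in param_types.iter().map(|t| t.trim()) {
        if is_interpreter_state(ty) {
            continue;
        }
        if ty.starts_with("Option<") {
            arity.optional += 1;
        } else if ty.starts_with("&[") {
            arity.rest = true;
        } else {
            arity.required += 1;
        }
    }
    arity
}
```

Applied to the three signatures quoted above, this gives two required arguments for string_lessp, one required plus a rest argument for less_than, and one required plus two optional for require.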

Let me know if I am not understanding your question.

Ki11erRabbit commented 4 days ago

I think you are on point. Could you maybe make a list of all of the types and their equivalents in elisp?

CeleritasCelery commented 4 days ago

Sure thing.

| Rust Type | Elisp Type |
| --- | --- |
| usize | integer |
| i64 | integer |
| isize | integer |
| f64 | float |
| Number | integer or float |
| &str | string |
| StringOrSymbol | string |
| bool | `t` or `nil` |
| List | `nil` or cons |
| Function | function |
| Option<()> | `nil` or non-nil |
| ByteString | unibyte-string |
| LispVector | vector |
| LispHashTable | hash-table |
| Symbol | symbol |
| Cons | cons |
| Record | record |
| ByteFn | byte-code-function |
| SubrFn | subr |
| Buffer | buffer |

Some of these like string, integer, and float will be easiest to generate data for.
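One way the tester could consume this table is a plain lookup from the Rust type name to the elisp type, sketched below. The function name is hypothetical. The sketch strips a trailing lifetime parameter such as `<'ob>` before matching, and returns `None` for types that are not in the table (for example Object), so the caller can decide to skip them.

```rust
/// Map a Rust parameter type to the elisp type from the table above.
fn elisp_type(rust_type: &str) -> Option<&'static str> {
    // Drop a lifetime parameter such as `<'ob>`; `Option<()>` is left intact.
    let base = rust_type.split("<'").next().unwrap_or(rust_type).trim();
    let elisp = match base {
        "usize" | "i64" | "isize" => "integer",
        "f64" => "float",
        "Number" => "integer or float",
        "&str" | "StringOrSymbol" => "string",
        "bool" => "t or nil",
        "List" => "nil or cons",
        "Function" => "function",
        "Option<()>" => "nil or non-nil",
        "ByteString" => "unibyte-string",
        "LispVector" => "vector",
        "LispHashTable" => "hash-table",
        "Symbol" => "symbol",
        "Cons" => "cons",
        "Record" => "record",
        "ByteFn" => "byte-code-function",
        "SubrFn" => "subr",
        "Buffer" => "buffer",
        _ => return None,
    };
    Some(elisp)
}
```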

Ki11erRabbit commented 3 days ago

Thank you, that has been very helpful. Could we make an alias for Option<()>, though? It creates a slightly weird edge case in my code. An alias would also make the type much clearer.

CeleritasCelery commented 3 days ago

I am fine with that. What should we call the type?

Ki11erRabbit commented 3 days ago

> I am fine with that. What should we call the type?

I am thinking of something like AnyOrNil, or something along those lines.

CeleritasCelery commented 3 days ago

I added a type alias called OptionalFlag for that type.
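For reference, the alias itself is a single line. The helper below it is a hypothetical illustration of the nil/non-nil meaning, not code from Rune: it treats a missing argument or `nil` as "flag not set" and anything else as "flag set".

```rust
/// Optional lisp flag: `None` for nil or absent, `Some(())` for any non-nil value.
type OptionalFlag = Option<()>;

/// Hypothetical helper: interpret the printed form of an optional argument.
fn flag_from_printed(arg: Option<&str>) -> OptionalFlag {
    match arg {
        None | Some("nil") => None,
        Some(_) => Some(()),
    }
}
```

Because the alias has its own name, a tool reading the Rust source can recognize it directly instead of special-casing `Option<()>`.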

Ki11erRabbit commented 2 days ago

> I added a type alias called OptionalFlag for that type.

Thank you.

Ki11erRabbit commented 2 days ago

I thought I would give an update. I have managed to get it to generate a very large test file that has random values. There are still some things to work out though.

I have one concern. I don't know how to handle randomly generated functions. Right now they have a random arity < 0 and return nil. I think that they should return something other than nil sometimes.

CeleritasCelery commented 2 days ago

That’s great to hear! Feel free to open a PR.

As far as functions go, I think we will need more info on what kind of function is needed. Otherwise you won’t actually be testing interesting properties of the defun. We could always just skip them for now. Maybe some attribute or comment could provide info on what kind of function to generate.

Ki11erRabbit commented 2 days ago

The only things left are to make it so that lists actually have elements in them, make a decent cmdline interface, and set up a test harness.

After I fix the list bug and give it a cmdline interface, should I submit a PR?

CeleritasCelery commented 2 days ago

Yes please!

Ki11erRabbit commented 2 days ago

I also thought of a way to solve the function arity issue. We could just make a type alias to Function that specifies the arity.
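As a sketch of how the tester might use such aliases: if the Rust source named its parameter types something like `Function1` or `Function2` (hypothetical names, nothing in this thread fixes the convention), the tester could read the arity from the type name and generate a lambda with that many parameters.

```rust
/// Read the arity from a hypothetical alias name such as `Function2`.
/// Returns `None` for plain `Function` or unrelated types.
fn arity_from_alias(type_name: &str) -> Option<usize> {
    // Ignore any generic or lifetime parameters after the name.
    let base = type_name.split('<').next()?.trim();
    base.strip_prefix("Function")?.parse().ok()
}

/// Generate the elisp source of a lambda taking `arity` arguments.
/// It always returns nil, as the generated functions currently do.
fn gen_lambda(arity: usize) -> String {
    let params: Vec<String> = (0..arity).map(|i| format!("a{}", i)).collect();
    format!("(lambda ({}) nil)", params.join(" "))
}
```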

CeleritasCelery / rune

Testing against GNU Emacs #43