Improve logic used for sorting top-level expressions

junedev commented 2 years ago

The current sorting is done by type of expression and then by size of the "construct". It seems this is done before the naming unification because it can decrease the quality of the representation. For bird watcher, the order of functions/tasks is different from representation to representation although it is the same in the solution code. That reduces the likelihood that two solutions have the same representations.

Commenting out the sorting let to somewhat better results (no massive improvement though). In general, the sorting is a good thing though but it seems we should fine tune the logic so that it keeps the order of the tasks. That also makes it easier to reconcile the representation with the student code when looking at it.

iHiD commented 2 years ago

That also makes it easier to reconcile the representation with the student code when looking at it.

On this note, we're not planning on showing the representation to the mentor. Instead we'll show a sample student's solution, and the 2 others to sanity-check against.

(However, I understand you are still viewing the representation when actually dev'ing this repo, which you might be referring to)

junedev commented 2 years ago

@iHiD Yes, that was what I meant. Angelika told us in Elixir they started with "non-readable" representations and it didn't work out well (e.g. bugs were hard to recognize) so we are trying to stick with a readable version that can be compared to the student code easily.

tehsphinx commented 2 years ago

Yeah this is a hard one I'm not yet clear on.

I think instead of using the type of the node and length of content we could use the type of the entire node tree (including sub types). This is also best done after the normalization steps (except the naming unification which should be done after the ordering).

junedev commented 1 year ago

@tehsphinx I don't understand what you mean by "type of the entire node tree (including sub types)". Can you clarify?

I think the most important thing is that top-level exported functions are never reordered because those correspond to the tasks. In a first approximation, we could achieve this by never reordering functions at all (no second criteria and use stable sort). A better version of this would be to keep the order for exported functions and put all the unexported ones below all the exported ones. Maybe the unexported ones can then be sorted by length again but length after renaming to avoid different orders depending on choosen names.

(Sidenote: If we wanted to recommend top-down sorting of functions to the student at some point, that is something the analyzer could do.)

tehsphinx commented 1 year ago

If we don't sort, we get a lot more variations for the representer. That's the reason I added it, because it shouldn't matter if the type was defined before the function or after or if an additional helper function was added on top. At least not for the representer.

How does it work now?

Sort by type: "import", "type", "const", "var", "*ast.FuncDecl" (in that order)
Sort by code length

Problem: if 1. and 2. are equal for 2 elements, we have a random outcome.

My proposal

Sort by type: "import", "type", "const", "var", "*ast.FuncDecl"
Sort by type of AST children until they are not equal
Looking for additional ideas if e.g. 2 functions have the exact same AST tree types.

tehsphinx commented 1 year ago

Note: here is the current sorting algorithm: https://github.com/exercism/go-representer/blob/main/representation/sorting.go

junedev commented 1 year ago

If we don't sort, we get a lot more variations for the representer.

I never said anything about not sorting. 🤔 I said let's sort everything but not change the order of the exported functions.

tehsphinx commented 1 year ago

@junedev No worries 🙂

Was just trying to explain the situation as a whole including why I think sorting is important in the first place. 😊

exercism / go-representer

Improve logic used for sorting top-level expressions #27