Normalize struct names - Githubissues

jiegillet commented 3 years ago

This is mentioned in #29.

Struct names should be replaced by placeholders as well.

Example input

defmodule Something do
  defstruct [:a, :b]
  def something(%Something{a: a}), do: %Something{a: 2 * a}
  def something_else(%__MODULE__{a: a}), do: %__MODULE__{a: 3 * a}
end

Desired output

defmodule(PLACEHOLDER_1) do
  defstruct([:a, :b])
  def(PLACEHOLDER_2(%PLACEHOLDER_1{a: PLACEHOLDER_3})) do
    %PLACEHOLDER_1{a: 2 * PLACEHOLDER_3}
  end
  def(PLACEHOLDER_4(%__MODULE__{a: PLACEHOLDER_3})) do
    %__MODULE__{a: 3 * PLACEHOLDER_3}
  end
end

I think __MODULE__ should not be replaced, because it means different things in different modules. It's already normalized, in a way...

Current output

defmodule(PLACEHOLDER_1) do
  defstruct([:a, :b])
  def(PLACEHOLDER_2(%Something{a: PLACEHOLDER_3})) do
    %Something{a: 2 * PLACEHOLDER_3}
  end
  def(PLACEHOLDER_4(%__MODULE__{a: PLACEHOLDER_3})) do
    %__MODULE__{a: 3 * PLACEHOLDER_3}
  end
end

butterknight commented 2 years ago

Hi! 👋 I tried playing around with this in an effort to both learn Elixir and try to make something productive.

I created a new folder inside test_data with:

input.ex

defmodule Something do
  defstruct [:a, :b]
  def something(%Something{a: a}), do: %Something{a: 2 * a}
  def something_else(%__MODULE__{a: a}), do: %__MODULE__{a: 3 * a}
end

expected_representation.txt

defmodule Placeholder_1 do
  defstruct [:a, :b]

  def placeholder_2(%Placeholder_1{a: placeholder_3}) do
    %Placeholder_1{a: 2 * placeholder_3}
  end

  def placeholder_4(%__MODULE__{a: placeholder_3}) do
    %__MODULE__{a: 3 * placeholder_3}
  end
end

expected_mapping.json

{
  "Placeholder_1": "Something",
  "placeholder_2": "something",
  "placeholder_3": "a",
  "placeholder_4": "something_else"
}

And I wanted to see the current result so I ran the test I got this:

If I read this correctly, it would appear that the struct name is already properly normalised, but the key wasn't. Did I do something wrong or is this already partially implemented?

angelikatyborska commented 2 years ago

Hmm... @jiegillet is it possible that we accidentally implemented this in https://github.com/exercism/elixir-representer/pull/44 ?

jiegillet commented 2 years ago

Yes, it's entirely possible :) It happened when we normalized module names, the struct names followed along. I didn't close this issue because of the key behavior.

I don't necessarily think that we should normalize all map keys indiscriminately, but we certainly can do it for the ones in defstruct because they are user defined. In that sense the current behavior is a step in the right direction and what is missing is

defstruct [:placeholder_1, :placeholder_2] # or whatever numbers

@angelikatyborska what do you think? I also suspect that #44 may have introduced some undesirable behavior with key replacement, so I would suggest to test the representer on some larger codebase that uses maps in various ways...

Quick example: in import A, only: [function_a: 1] we don't want to touch only or function_a. Right now I am not confident that these are safe if we have a only variable somewhere else being normalized.

angelikatyborska commented 2 years ago

I don't necessarily think that we should normalize all map keys indiscriminately

I'm having a hard time coming up with a good rule about what to normalize when with regard to map, struct, and keyword list keys.

In general, it seems to me like key names are just like variable names and should be normalized. But then I start coming up with a bunch of exceptions: keys in structs that come from the standard library are always known and shouldn't be normalized. Keys in some special usage keyword lists like import A, only: ... or IO.inspect(x, label: "x"). I feel like trying to chase each exception will drive us crazy and I would much rather go all in or all out - normalize all of the keys or none of the keys. I'm also curious what other languages do with similar problems.

I'm currently writing a bunch of more tests that just pass to highlight the current behavior: https://github.com/exercism/elixir-representer/pull/73/files

For example struct names are only kept if they're not user defined, which is great. And you're right about this bug:

Right now I am not confident that these are safe if we have a only variable somewhere else being normalized.

angelikatyborska commented 2 years ago

@iHiD advised on Slack:

About representers: if you have key-value data structures, should you replace key names with placeholders?

If they don't matter to the implementation of the exercise (which as far as I can tell they shouldn't otherwise the tests would fail), then yes.

iHiD commented 2 years ago

Quick example: in import A, only: [function_a: 1] we don't want to touch only or function_a. Right now I am not confident that these are safe if we have a only variable somewhere else being normalized.

I know this isn't the point of this discussion, but removing import statements entirely is potentially quite a good idea. (F# does this: https://exercism.org/docs/tracks/fsharp/representer-normalizations#h-remove-import-declarations)

angelikatyborska commented 2 years ago

@iHiD What's the reasoning behind removing imports? I suspect they might not apply in Elixir.

iHiD commented 2 years ago

I'd sort of flip the question and ask what's the reason for keeping them?

The solution being normalised passes the tests, so it's importing whatever it needs to import. So the question becomes are they things you'd ever want to give feedback on?

If they are, then it's good to to keep them. (e.g. if you want to comment when someone imports more than needed (e.g. missing only: [...]) or imports something but doesn't use it).
If they're not, then getting rid of them is just less potential difference in the representations (e.g. if someone imports A then B, and another student imports B then A, the placeholders will end up entirely different and so the representations will not normalise).

My instinct is that they're not things that we need to give feedback on automatically, so increasingly the likelihood of normalisation feels better.

angelikatyborska commented 2 years ago

So the question becomes are they things you'd ever want to give feedback on?

Most of the time, yes.

I can see those scenarios for imports in Exercism exercises:

Somebody is doing the one learning exercise where importing a module given in the boilerplate is expected and required. Here giving feedback on the wrong usage of :only vs :except would be good.
Somebody is trying to import something from the standard library in another exercise, where I would want to tell them to stop. The whole standard library is available in Elixir without doing any imports by referencing functions with module names, which is the expected way of using the standard library.
Somebody is trying to import their own module that they submitted as part of the solution. This means that their solution is already unique enough that the extra normalization won't help much - most solutions are a single module.

jiegillet commented 2 years ago

e.g. if someone imports A then B, and another student imports B then A, the placeholders will end up entirely different and so the representations will not normalise

That's a good idea, let's order imports :)

angelikatyborska commented 2 years ago

@jiegillet I take your comment to mean that you agree that imports should stay in the representations 😁 I created an issue about ordering them: https://github.com/exercism/elixir-representer/issues/75

iHiD commented 2 years ago

I can see those scenarios for imports in Exercism exercises:

While those are solid examples, all three of them feel like they could be really good analyzer checks that would cover all exercises and then allow the representations to be further normalised and then reduce the amount of feedback given manually in the automation UI.

But I'm entirely happy to defer to whatever you decide - you know the range of possibilities in Elixir much better than me! 🙂

exercism / elixir-representer

Normalize struct names #37

Example input

Desired output

Current output