New common check: Helper functions should be private

jiegillet commented 2 years ago

Closes #201. This PR ended up being kind of huge, sorry about that.

The purpose of the new common check is to see if helper functions have been mistakenly defined as public functions. To do this, we can check again the exemploid (yes, I ended up using that word). So far, we could only read concept exercise exemplars, so I had to include practice exercises too. Then one thing lead to another:

I added a %Source{} struct that holds all the exercise input (solution, exemploid, slug, paths...)
I changed the analyzer to read all that data before analysis and pass it all in an argument + added tests (in test_data for missing files etc)
I added the check for public helper functions and tests (~~I could only use test_data since we need an exemploid~~)
I fixed a bunch of other tests that had public helpers
I modified ExerciseTestCase so it could figure out the source info from the module name
I fixed a small bug: the compiler warnings showed the compile file as nofile and that always bugged me. Now that the common checks have access to %Source{} it was easy to add the correct file name.

Left to do

[x] website-copy comment
[x] Add tests for public helper functions, concept and practice.

I changed my mind on check_source, I think I would now prefer passing it the %Source{} so that users have access to everything. I will probably do that in a different PR, because this one is big enough. I also have to add docs.

angelikatyborska commented 2 years ago

The bigger the PR, the longer the wait until I have enough brain power for a review 🙈

jiegillet commented 2 years ago

The bigger the PR, the longer the wait until I have enough brain power for a review 🙈

Same here, it's all good, take your time this isn't urgent.

angelikatyborska commented 2 years ago

Something is weird with CI. I tried merging main to see if it helps but it didn't:

== Compilation error in file test/elixir_analyzer/exercise_test/assert_call/erlang_modules_test.exs ==
** (File.Error) could not list directory "elixir/exercises/concept": no such file or directory
    (elixir 1.12.1) lib/file.ex:1590: File.ls!/1
    (elixir_analyzer 0.1.0) test/support/exercise_test_case.ex:181: ElixirAnalyzer.ExerciseTestCase.find_source_type/1
    (elixir_analyzer 0.1.0) test/support/exercise_test_case.ex:166: ElixirAnalyzer.ExerciseTestCase.find_source/1
    test/elixir_analyzer/exercise_test/assert_call/erlang_modules_test.exs:2: (module)
    (stdlib 3.15) erl_eval.erl:685: :erl_eval.do_apply/6
    (elixir 1.12.1) lib/kernel/parallel_compiler.ex:428: Kernel.ParallelCompiler.require_file/2
    (elixir 1.12.1) lib/kernel/parallel_compiler.ex:321: anonymous fn/4 in Kernel.ParallelCompiler.spawn_workers/7

jiegillet commented 2 years ago

Something is weird with CI. I tried merging main to see if it helps but it didn't:

I rebased everything on main, and it seems to work. Or was it you? 🤷‍♂️

angelikatyborska commented 2 years ago

@jiegillet It doesn't work, see failing external tests: https://github.com/exercism/elixir-analyzer/runs/4065745282?check_suite_focus=true

jiegillet commented 2 years ago

Oh you're right. I changed the CI for the internal tests, but I forgot the external tests. I guess they external tests don't need the submodule, but compilation does. It wasn't an issue so far because if the concept exercise folder wasn't found, it assumed it was a practice exercise and moved on. However now it's required.

jiegillet commented 2 years ago

OK. External tests now fail successfully.

jiegillet commented 2 years ago

For this check, we are checking if the student solution has the same public interface as the exemploids, although we are telling them to have the same interface as the stub. So I tried checking if the exemploids actually had the same public interface as the stubs and... They do not :p I ran PrivateHelperFunctions.run(exemploid_ast, stub_ast) on every exercise and found some issues: Concepts exercises: dna-encoding (def do_decode(a, b)), bread-and-potions (def eat(a, b)), stack-underflow (def divide(a)), lasagna (def alarm()). The problem for the last 3 is that the stub doesn't have any public functions. Practice exercises... So many (26), including a lot of my contributions, I guess I didn't think about this at all... "rotational-cipher", "pov", "two-bucket", "book-store", "variable-length-quantity", "binary-search-tree", "square-root", "affine-cipher", "sgf-parsing", "go-counting", "dominoes", "rectangles", "pythagorean-triplet", "satellite", "house", "circular-buffer", "bank-account", "forth", "react", "clock", "zebra-puzzle", "ocr-numbers", "simple-linked-list", "word-search", "food-chain", "parallel-letter-frequency". Let's open an issue on elixir after this gets merged.

angelikatyborska commented 2 years ago

we are telling them to have the same interface as the stub.

The comment (which I already merged 🤦) needs to be changed then. I think it would be more correct to say that the public interface is already defined by the tests, and then maybe add that for most exercises it's also the stub? But not all. As you have noticed, there are concept exercises where the stubs don't include function definitions on purpose.

dna-encoding (def do_decode(a, b))

That's the only concept exercise that actually needs fixing

Practice exercises... So many (26)

Oh no 🙈 should we maybe do a switch and use the stub for comparisons for practice exercises?

jiegillet commented 2 years ago

The comment (which I already merged 🤦) needs to be changed then. I think it would be more correct to say that the public interface is already defined by the tests, and then maybe add that for most exercises it's also the stub? But not all. As you have noticed, there are concept exercises where the stubs don't include function definitions on purpose.

OK, I will change the comment. I apologize, I should have done this check sooner.

Practice exercises... So many (26)

Oh no 🙈 should we maybe do a switch and use the stub for comparisons for practice exercises?

I don't think we should compare it to the stubs, because that would mean comparing it to something that is in the submodule and not guaranteed to be in sync with the student's version, however unlikely it is to change. So instead I went ahead and grinded through all 26 examples and fixed them. I apologize again for submitting another huge (although simple) PR.

angelikatyborska commented 2 years ago

I don't think we should compare it to the stubs, because that would mean comparing it to something that is in the submodule and not guaranteed to be in sync with the student's version,

Right, very good argument that completely skipped my mind this morning :)

angelikatyborska commented 2 years ago

I opened a PR with the comment change because I forgot you said you will do it :facepalm: https://github.com/exercism/website-copy/pull/2118

I left two final suggestions about the new check. Everything else looks great. I didn't review the refactoring with my regular care and attention to detail - I trust the tests to catch bugs and I trust you that it makes sense conceptually because by now you know this project better than I do 😁

angelikatyborska commented 2 years ago

Ok, after reviewing the elixir repo PR, I am now wondering what about extra modules like this:

defmodule Rules do
  defmodule BooleanLogic do
    def do_or(left, right), do: left or right
  end

  def score?(touching_power_pellet, touching_dot) do
    BooleanLogic.do_or(touching_power_pellet, touching_dot)
  end
end

This will trigger a comment. So we will effectively complaining about any multi-module solution 🤔

jiegillet commented 2 years ago

Ok, after reviewing the elixir repo PR, I am now wondering what about extra modules like this:
defmodule Rules do
  defmodule BooleanLogic do
    def do_or(left, right), do: left or right
  end

  def score?(touching_power_pellet, touching_dot) do
    BooleanLogic.do_or(touching_power_pellet, touching_dot)
  end
end
This will trigger a comment. So we will effectively complaining about any multi-module solution 🤔

That's true. How about then I detect the @doc false trick in my implementation? We already comment about that trick in the analyzer comment, we could change it slightly to mention that this is a viable option if they want to have multi-module solutions. I haven't reviewed your website-copy PR yet, in case we want to do this.

jiegillet commented 2 years ago

I ended up hiding function behind @doc false and @impl true. If you think it's not necessary, I can revert the last commit.

neenjaw commented 2 years ago

Ok, I'm missing some details about this PR.

Why does @doc false and @impl true hide functions?

Doesn't this pr only check to make sure that the solution module only has the public function required by the stub be public, the rest private? (I haven't inspected the details, but that is the issue this pr intends to close, correct?)

Why would we also throw a warning about a sub module?

jiegillet commented 2 years ago

Doesn't this pr only check to make sure that the solution module only has the public function required by the stub be public, the rest private? (I haven't inspected the details, but that is the issue this pr intends to close, correct?)

Yes, that is the intent, but read on.

Why does @doc false and @impl true hide functions?

The basic rule is that the students should only expose functions required in the tests, and everything else should be private. However in general, there are 2 situations where that's not possible:

When using GenServer, they need to implement functions like init which are not in the tests but cannot be defined privately. However, if those functions have @impl true before, it signals that this is an implementation of another module, and effectively hides the function from the public interface as well.
When using different modules, some functions must be public to be used across modules, so to hide them from the docs and the public interface you use @doc false.

I basically want to give students the same options. For the first point, the exemplar/example (which is what we use as a reference) also uses those functions, so it would be OK, but for the second point, if we want to let users define their own modules, we need to give them a way to do that without triggering the check, so we need to let then know about @doc false and take it into consideration.

Why would we also throw a warning about a sub module?

I'm not sure what you are referring to here, we have no plans of telling students about the sub module.

neenjaw commented 2 years ago

Why does @doc false and @impl true hide functions?

The basic rule is that the students should only expose functions required in the tests, and everything else should be private. However in general, there are 2 situations where that's not possible:

When using GenServer, they need to implement functions like init which are not in the tests but cannot be defined privately. However, if those functions have @impl true before, it signals that this is an implementation of another module, and effectively hides the function from the public interface as well.

If GenServer is the only case that we are aware of in a main solution module that this may be true, why not just make exceptions for functions named to match the GenServer callback functions?

While @impl true is recommended and will throw a warning if one exists and others are missing, it is not always required.

When using different modules, some functions must be public to be used across modules, so to hide them from the docs and the public interface you use @doc false.

I basically want to give students the same options. For the first point, the exemplar/example (which is what we use as a reference) also uses those functions, so it would be OK, but for the second point, if we want to let users define their own modules, we need to give them a way to do that without triggering the check, so we need to let then know about @doc false and take it into consideration.

So even in your discussion above, you called it the @doc false trick, while this prevents it from appearing in ExDoc-like instances, it's not really hidden from the public interface.

I'm not even sure this is a good suggestion to use @doc false on sub modules because it may suggest to students that any sub module should not have documentation.

It is an unfortunate/fortunate consequence of Modules always having public visibility in Elixir, so I think that's why we should try to teach good habit organizing code around contexts rather than tricks to avoid our false positive warning.

Why would we also throw a warning about a sub module?

I'm not sure what you are referring to here, we have no plans of telling students about the sub module.

I'm referring to the exchange that ends with your comment here: https://github.com/exercism/elixir-analyzer/pull/207#issuecomment-958584617 where submodules will raise a warning unless covered by the trick.

I think I would suggest that this public/private check only exist for the main solution module with exceptions for known public function which may be implemented broadly (e.g. GenServer callback named functions) rather than for any module which I think our authority to appropriately know the scope of functions is weaker.

jiegillet commented 2 years ago

Ah, I see, I confused Elixir sub-modules with the git submodule. Those things only have the name in common :)

For the GenServer case, it should not problem at all to drop the @impl true check, since the reference implementation would use those same functions as well.

So you are suggesting that we do not check other modules at all. It certainly can be done, but kind of undermines the message of the check that is use as little public functions as possible. As you said it's a consequence of having always-public modules.

neenjaw commented 2 years ago

So you are suggesting that we do not check other modules at all. It certainly can be done, but kind of undermines the message of the check that is use as little public functions as possible. As you said it's a consequence of having always-public modules.

Yes, that is my suggestion, not because the principle doesn't stand or because it isn't a good practice, but because our ability to differentiate required interface from private helper in unknown, student written, modules is probably not trivial -- and in those situations I think it is best left to the mentor rather than raising a false positive comment or an innocent, but misleading, comment to avoid our false-positive warning.

If you can introspect to keep track of what calls are made from the solution module to student written auxiliary modules, then perhaps you can be more certain to raise the comment only when there exists a public function, not called by an external module which then there is a stronger indication that it should be private because it is only used internally.

jiegillet commented 2 years ago

If you can introspect to keep track of what calls are made from the solution module to student written auxiliary modules, then perhaps you can be more certain to raise the comment only when there exists a public function, not called by an external module which then there is a stronger indication that it should be private because it is only used internally.

Nope, not doing that, it's not worth it 😅

jiegillet commented 2 years ago

I've implemented your suggestion in the last commit..

neenjaw commented 2 years ago

mid-review

neenjaw commented 2 years ago

Changes look reasonable, adding this one test seemed to impact a lot of test files, which is interesting. I think that while this is a more conservative approach, it will yield less false-positive warnings.

I'll leave the commnt copy and merge to you and @angelikatyborska since she is code-owner and has to approve to merge

jiegillet commented 2 years ago

Thanks for the review and the suggestions, I appreciate it.

exercism / elixir-analyzer

New common check: Helper functions should be private #207