RFC: proposal to add assertions for list of map-likes

idlehands commented 8 months ago

I have been playing around with the idea of extending CheckerCab to use the same assertion system for lists of map-likes (map, struct, ecto schema). Overall, I don't think it's too hard, but coming up with a clear API is the challenge. I have thrown out a couple passes on what it could look like, but am open to new ideas as well as different terms.

option 1 - new function

The main question is whether or not it should be a new function or an extension of the behavior of the existing function. I'm leaning towards a new one, though naming, as usual, is hard.

      assert_values_for_in_list(
        expected_list: {[thing_1, thing_1], :string_keys},
        actual_list: return_value, # is list in all the examples
        ordered: true,
        fields: fields_for(Thing)
      )

      assert_values_for_in_list(
        expected_list: [thing_2, thing_1],
        actual_list: return_value,
        ordered: false,
        identifier_key: :id, # this would have to be required when ordered is false
        fields: fields_for(Thing)
      )

option 2 - add an optional key to existing function

Alternatively, we could add a new key to the existing function. I'm concerned that this might leave it hard to distinguish the behavior.

      assert_values_for(
        type: :ordered_list, # default would be something like :single_map, leaving this backwards compatible
        expected: [thing_1, thing_2],
        actual: return_value,
        fields: fields_for(Thing)
      )

      assert_values_for(
        type: :unordered_list,
        unique_identifier: :id,
        expected: [thing_2, thing_1],
        actual: return_value,
        fields: fields_for(Thing)
      )

I would love feedback and any other suggestions.

idlehands commented 8 months ago

I have requested @GeoffreyPS also weigh in.

GeoffreyPS commented 8 months ago

On its face I don't think the library should be extended in this way -- it seems like an invitation to complexity with less gain. I'm keeping an open mind about this, but I don't think the juice is worth the squeeze here.

Can the caller avoid need for this feature?

What circumstances would a caller have two lists of structs to compare that they could not either:

simplify on a more basic level to a single comparison
wrap this function in their own list comprehension to get the functionality of an ordered list

Suggestions and call-outs if this feature is developed

Ordering with `Enum.sort_by/3`

For unordered lists, instead of a unique_identifier could it be open to a caller to pass in a sorting function? We could provide a mapper function (the default being a sort by id, :asc). It would be a familiar API, ease implementation here, and make it a little more flexible for developers who find themselves needing that.

Difficulties with surfacing errors

If an assertion error occurs deep in a list of comparisons, how would one expose exactly which map-like broke the test? This may be more difficult to implement with functions on sort_by/3 as suggested above and may become easier if the API only exposes unique_identifier. This is probably the main place where I think having dedicated library functionality could be a desirable improvement over a hand-rolled solution.

idlehands commented 7 months ago

@GeoffreyPS thanks for pushing back on the idea. It's very easy to want to bloat the library with every possible use case. That said, I still think this might have a place in the library. Here are my thoughts:

The most important change that we made when we upgraded assert_values_for from being a code snippet that was copied around to a library was that we changed it to stop breaking at the first mismatched value, preferring to instead aggregate all of the information and then present it in a comprehensive way. This allows the error to convey everything needed to understand the issue at the first failure instead of forcing someone to work all the way through each mismatched field.

Before I wrote this RFC, I actually played around with a locally implemented wrapper. I honestly thought I had included it in this RFC, but I was jumping around and apparently totally failed to do so. Here is what I did:


    defp assert_values_for_in_list(opts) do
      expected_list = opts |> Keyword.get(:expected_list)
      actual_list = Keyword.get(opts, :actual_list)

      assert length(expected_list) == length(expected_list) 

      fields = Keyword.get(opts, :fields)
      ordered? = Keyword.get(opts, :ordered, true)
      identifier_key = Keyword.get(opts, :identifier_key)

      for {expected, index} <- Enum.with_index(expected_list) do
        actual =
          if ordered? do
            Enum.at(actual_list, index)
          else
            Enum.find(actual_list, fn actual -> Map.get(actual, identifier_key) == Map.get(expected, identifier_key) end)
          end

        assert_values_for(
          expected: expected,
          actual: actual,
          fields: fields
        )
      end
    end

It is for the ordered version of the lists. I would have left well enough alone, using this as a local assertion helper, copying it into codebases as needed, but my use case highlighted right away that there are two ways that a list of things can fail: the order is wrong OR the data is mostly correct but individual fields aren't correct.

With the above code, the the assertions stop as when there is a mismatch and only return the information about the the two map-likes that didn't match, removing any other context.

With the unordered version, there is different, but just as relevant context lost.

Possible Alternative Approach While writing this, though, I did realize that there is an intermediate step that could be taken: we could add an optional param that tells assert_values_for/1 to return the ExUnit.AssertionError instead of raising it. It would default to current behavior of just raising. This would allow a custom wrapper to get all of the information and present it. Admittedly, it will be REALLY verbose. This might allow development of additional features in individual codebases without needing to extend the library. Kind of like the original function, perhaps this will help with experimentation and we may end up with something already well defined to add if we decide to.

On that note, it might be worth considering creating a new error that stores the data more granularly (it's done that way before converting to the message currently) and then has a message function that returns things as currently formatted. But I think this would be a nice evolution, not the starting point.

I'm glad you pushed back, @GeoffreyPS! I think this is a discussion worth having.

GeoffreyPS commented 7 months ago

I see what you mean now in terms of different fail states depending on whether the input list is sorted/unsorted and how the library could better return meaningful errors without need of repeating the test iteratively to suss out what the underlying error truly is. The simplifying solution I suggested--wrapping in a list comprehension and having the caller manage this--still does fall prey to this issue (so maybe simplify tests? 😉).

I think if this library is intended to make complex assertions possible and more understandable, then reducing context lost here makes more sense to me than it did when I first gave the idea consideration.

I really like the idea of providing an option to return the ExUnit error instead of raising. I think this could provides a lot of flexibility on the library and could potentially make the output implementation a bit easier to maintain. It does change the way I see how this library can be used though. Instead of asking "are these two things functionally the same?" we can also ask "what does the presence of these assertion errors mean?"

idlehands commented 7 months ago

@GeoffreyPS I think we can be a little hand-wavy about how we see the library. perhaps we mark this as a alpha/beta feature or don't highly publicize it until we are sure that it really is the direction for things.

idlehands commented 7 months ago

@lmarlow I think we've agreed to start with adding the ability to return errors instead of raising them. Do you want any input on this before we head in that direction?

Shimmur / checker_cab