dotnet / EntityFramework.Docs

Documentation for Entity Framework Core and Entity Framework 6
https://docs.microsoft.com/ef/
Creative Commons Attribution 4.0 International

Provide more details on unit testing #2971

Closed kevingy closed 2 years ago

kevingy commented 3 years ago

One of the goals of a good unit test is to isolate external dependencies. The external dependency in this case is the database/EF Core. When I'm testing my controller or service that consumes the database, I should be able to test ONLY my code, not unit test the database/EF Core. These three examples break that fundamental principle by introducing a database into the unit tests and then expressing that it is correct and appropriate to do so.

The documentation says not to attempt to mock the DbContext or IQueryable because it is difficult and fragile. Trying to manage the state of the database while running concurrent unit tests and having to return the database back to a known state between test runs is, in my experience, MORE difficult, fragile, and error prone, assuming it's possible at all in a CI/CD environment where the unit test run may not be able to access a database.


ajcvickers commented 3 years ago

@kevingy It sounds like you have some experience mocking async LINQ. Would you mind sharing some guidance on the approach you take to do this, preferably with some examples? If there is a reliable, clean solution, then we would certainly consider including it in the docs.

That aside, I agree that unit tests should not access the database. Feel free to test any code you want in unit tests without hitting the database, including defining an interface for your use of DbContext, or maybe a repository interface.

However, my experience has been that even doing all that unit testing does not remove the need to test your code with a real database. This is because abstracting the behavior of a database in a test double is very difficult, since database behaviors are complex and bugs often happen because the database is not behaving in the way you expect it will. The same is also true for abstracting EF Core. So feel free to write unit tests and if that's enough for your code, then that's good. For most people, I think that would lead to many bugs.

kevingy commented 3 years ago

@ajcvickers I typically wrap the DbContext using the unit of work pattern or repository pattern, depending on the project. In a project I'm currently working on, I'm experimenting with just implementing an IDbContext interface, based on a blog by Gunnar Peipman (https://gunnarpeipman.com/ef-core-repository-unit-of-work/ - the URL is poorly named, as the post actually is anti-repository and anti-UoW). I'm liking the IDbContext approach. I can provide some example code of any of the above approaches in the context of what is already on the documented page, if you believe those would be useful.

To be clear, I agree with what is documented: mocking DbContext is difficult, if not impossible. I wouldn't suggest or attempt that.

Additionally, I agree that testing with a database is vital. I wouldn't feel comfortable deploying code without that testing. The only difference is what that testing is called. Because an external dependency is required, testing with a database is integration testing rather than unit testing. Unit tests run on a developer desktop and/or a CI/CD pipeline without an integration environment. Integration tests require an integration environment where the database will exist.

What is documented on the page is odd, in that unit testing is mentioned almost as an after-thought at the bottom, when in reality, unit testing comes before integration testing. Almost the entire page is integration testing with a passing mention of unit testing.

ajcvickers commented 3 years ago

> What is documented on the page is odd, in that unit testing is mentioned almost as an after-thought at the bottom, when in reality, unit testing comes before integration testing. Almost the entire page is integration testing with a passing mention of unit testing.

That's a fair point, although I think that's primarily because unit testing doesn't usually involve EF Core other than to abstract it. We could show an example of creating an IDbContext interface and mocking that. However, the problem is always, "What about DbSets, which are IQueryable? How do I write a LINQ query and then mock it in my context?" We covered this for EF6 here: https://docs.microsoft.com/en-us/ef/ef6/fundamentals/testing/mocking, but it's ugly and fragile and hasn't been updated for EF Core.

If I were going to do this, I would not expose my queries as IQueryables, but rather write a real repository and expose conceptual queries on that as methods--for example:

IEnumerable<Customer> GetCustomersAndOrders();

Now because this returns an IEnumerable, not an IQueryable, it is easy to mock.

(I wouldn't actually do this because I believe this level of abstraction is overkill for most applications. But if I felt there was a need to do this kind of testing, then this is how I would do it.)
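A minimal sketch of what that could look like, for illustration only. Everything here apart from the `GetCustomersAndOrders` signature is an assumption: the `Customer` shape, the `ICustomerRepository` interface, the `CustomerReport` consumer, and the hand-written `FakeCustomerRepository` are hypothetical names, and a mocking library could stand in for the hand-written fake.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, for illustration only.
public class Customer
{
    public string Name { get; set; } = "";
    public int OrderCount { get; set; }
}

// Hypothetical repository exposing a conceptual query as a method.
public interface ICustomerRepository
{
    IEnumerable<Customer> GetCustomersAndOrders();
}

// Code under test: depends only on the interface, never on DbContext.
public class CustomerReport
{
    private readonly ICustomerRepository repository;

    public CustomerReport(ICustomerRepository repository) => this.repository = repository;

    public int TotalOrders() => this.repository.GetCustomersAndOrders().Sum(c => c.OrderCount);
}

// Hand-written test double: returns canned results and runs no query.
public class FakeCustomerRepository : ICustomerRepository
{
    public IEnumerable<Customer> GetCustomersAndOrders() => new[]
    {
        new Customer { Name = "Alice", OrderCount = 2 },
        new Customer { Name = "Bob", OrderCount = 3 },
    };
}
```

With these types, `new CustomerReport(new FakeCustomerRepository()).TotalOrders()` returns 5 without any database or EF Core involvement, because the fake supplies the query result directly.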

ajcvickers commented 3 years ago

@kevingy Did you make any progress finding simple patterns for mocking async IQueryable?

kevingy commented 3 years ago

@ajcvickers Sorry for taking so long to reply. I had a deadline this afternoon. Here is what I'm currently doing:

First, I wrap the DbContext concrete class in an interface following the Gunnar Peipman blog referenced above.

    /// <summary>
    /// Defines an interface for a telephone database context.
    /// </summary>
    public interface IDbContext
    {
        /// <summary>
        /// Gets or sets a database set of <see cref="ApiKey"/>.
        /// </summary>
        DbSet<ApiKey> ApiKeys { get; set; }
...

Second, in the class that consumes the context, rather than injecting a concrete DbContext instance, I depend on the IDbContext interface.

        public ApiKeyController(
            IDbContext context)
        {
            this.context = context;
        }

Third, in the unit tests of that class, I use NSubstitute to substitute the IDbContext interface, and I substitute each DbSet the test needs using the MockQueryable NuGet library. There are many MockQueryable packages that can be used, depending on the application. I use the one for NSubstitute. (See https://www.nuget.org/packages?q=MockQueryable .)

    [TestFixture]
    public class ApiKeyControllerTests
    {
        private readonly List<ApiKey> apiKeys = new List<ApiKey>()
        {
            new ApiKey { Id = Guid.Parse("52a86301-c037-40ac-ac7e-661d774a7140"), CreatedDate = DateTime.Now, UpdatedDate = DateTime.Now, Key = Guid.Parse("087d0412-18ad-4386-96c3-c31d7f83b961") },
            new ApiKey { Id = Guid.Parse("f1093b05-4020-46ea-b97e-ec135c049e40"), CreatedDate = DateTime.Now, UpdatedDate = DateTime.Now, Key = Guid.Parse("a398c638-8dc7-4a73-9d92-24a508dbaad5") },
            new ApiKey { Id = Guid.Parse("ad58f8b7-ee5c-4c12-9530-cb0e71dc611f"), CreatedDate = DateTime.Now, UpdatedDate = DateTime.Now, Key = Guid.Parse("949719bf-c0ab-4f26-a289-ad45c8f75bc9") },
        };

        [OneTimeSetUp]
        public void Setup()
        {
            this.context = Substitute.For<IDbContext>();
            this.context.ApiKeys = this.apiKeys.AsQueryable().BuildMockDbSet();
...

Lastly, in each test, I'm able to use the mocked context DbSet as needed. In this example, when the controller that has a reference to this.context calls IDbContext.ApiKeys.Add, it adds the ApiKey entity to the this.apiKeys collection that is a member of the ApiKeyControllerTests class rather than reaching out to EntityFramework and a database.

        [Test]
        public void GetApiKeys_Invoked_ReturnsApiKeys()
        {
            // Arrange
            // The lambda parameter is NSubstitute's CallInfo, not the ApiKey itself.
            this.context.ApiKeys.Add(Arg.Any<ApiKey>())
                .Returns(callInfo => null)
                .AndDoes(callInfo => this.apiKeys.Add(callInfo.Arg<ApiKey>()));

Again, while this approach allows unit testing of classes that consume a DbContext without directly accessing a database, this unit testing strategy should not replace or be considered a substitute for integration testing that does access a database.

roji commented 3 years ago

@kevingy I'm looking at improving our docs on testing, taking the above into account. Assuming you're still around and interested, do you have a strategy for dealing with cases where simple evaluation against an in-memory List<T> cannot work, since a server-only method is being used in the query (e.g. EF.Functions.Like), or FromSqlRaw is used? Or cases where the in-memory results differ from the database results (e.g. case-sensitivity in string comparisons)?

This sort of stuff is why we usually recommend a repository wrapper around queries, rather than mocking the DbSets themselves; since the wrapping methods always return IEnumerable (as @ajcvickers wrote above), a mock can return the desired end result directly, rather than relying on in-memory evaluation.
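As a small illustration of the case-sensitivity point, here is a sketch using plain LINQ to Objects. The `InMemoryComparisonDemo` class and its `CountMatches` method are hypothetical names introduced only for this example. The comparison below is the ordinal, case-sensitive one that .NET applies to `==` on strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class InMemoryComparisonDemo
{
    // Counts names equal to the search term, evaluated in .NET (LINQ to Objects).
    // String == in .NET is an ordinal, case-sensitive comparison.
    public static int CountMatches(IEnumerable<string> names, string term) =>
        names.Count(n => n == term);
}
```

`InMemoryComparisonDemo.CountMatches(new[] { "Alice", "alice" }, "alice")` returns 1 in memory. The same predicate translated to SQL and run against a database column with a case-insensitive collation would match both rows, so a unit test built on in-memory evaluation can pass while the real query behaves differently.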

kevingy commented 3 years ago

Hi @roji I'm happy to help.

My initial comment about the documentation was that it is almost entirely focused on integration testing, not unit testing. A fundamental requirement for unit testing is that all external dependencies, including the database, are isolated and abstracted away. Your specific examples (EF.Functions.Like and FromSqlRaw) require a functional database to function. That being the case, the return values from these methods should be mocked and injected. Otherwise, the test is including the database interaction, meaning that the test is an integration test, not a unit test. Code accessing the dependency (EF.Functions.Like and FromSqlRaw) must access an abstracted dependency to be a valid unit test.

Concerning List<T>, can you give me a code example of the scenarios you are talking about? It's difficult to envision the scenario without some context. Regarding in-memory results vs database results, this scenario is not a unit test concern. This is an integration test concern. If the unit test is written properly, no database is involved for there to be a difference between the results.

The Repository pattern may be a fine alternative, if it accomplishes the project's intended goals - including, but not necessarily limited to, testing. As I mentioned above, I often use either the Repository or Unit of Work patterns for testing. It was not my intention to say that the DbSet can be mocked. As in most things, a competent engineer should use the right tool for the job rather than hitting everything that looks like a nail with the hammer they have used for every other job.

roji commented 3 years ago

> My initial comment about the documentation was that it is almost entirely focused on integration testing, not unit testing. A fundamental requirement for unit testing is that all external dependencies, including the database, are isolated and abstracted away. Your specific examples (EF.Functions.Like and FromSqlRaw) require a functional database to function. That being the case, the return values from these methods should be mocked and injected. Otherwise, the test is including the database interaction, meaning that the test is an integration test, not a unit test. Code accessing the dependency (EF.Functions.Like and FromSqlRaw) must access an abstracted dependency to be a valid unit test.

I agree that our docs should include more information on unit testing, and part of my work will definitely include that.

The way I see it, generally speaking, unit testing approaches can be split into two:

  1. Mocking the query input data. This involves mocking the DbSets (roughly, the database tables) via in-memory data (e.g. List<T>), and evaluating LINQ queries against them, in .NET. This is what your sample above with NSubstitute/MockQueryable does (if I understood correctly), and this is what the EF Core InMemory provider does as well.
  2. Mocking the query outputs/results. This doesn't run any query operators in-memory. This can typically only be done via the repository approach, which introduces an architectural constraint on how you code; LINQ queries need to be wrapped by methods returning IEnumerable, and those methods are the ones which are mocked.

We're generally skeptical of approach 1; "mocking the database" doesn't mean just replacing raw tables with in-memory data, since databases are also responsible for evaluating and executing actual query code (expressed in SQL). For example, we regularly get requests to add support in the InMemory provider for this or that SQL Server method (e.g. anything in SqlServerDbFunctionsExtensions). I'm not even sure how one goes about mocking queries like that with NSubstitute/MockQueryable - the call to these provider-specific methods is embedded somewhere deep in the query tree, and simply cannot be evaluated in .NET. To be successful with approach 1, your code really must be restricted to the bare minimum of query operators/constructs which happen to work in .NET.
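To make the failure mode concrete, here is a hedged sketch that does not use EF Core at all. `ServerOnlyFunctions.Like` is a hypothetical stand-in for a provider-specific method: it is assumed to exist only so that a query provider can translate it to SQL, so it throws when actually invoked in .NET. `ApproachOneDemo` is likewise a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a provider-specific function. It is meant to be
// translated to SQL by a query provider, so invoking it in .NET throws.
public static class ServerOnlyFunctions
{
    public static bool Like(string value, string pattern) =>
        throw new InvalidOperationException("Like can only be evaluated by the database.");
}

public static class ApproachOneDemo
{
    // A query written against IQueryable<string>. With approach 1 the source is
    // an in-memory collection, so every operator in the tree runs in .NET.
    public static List<string> NamesStartingWithA(IQueryable<string> names) =>
        names.Where(n => ServerOnlyFunctions.Like(n, "A%")).ToList();
}
```

Calling `ApproachOneDemo.NamesStartingWithA(new List<string> { "Alice" }.AsQueryable())` fails as soon as the query is enumerated, because the in-memory evaluation has to invoke the stand-in. A repository method wrapping the same query could simply be mocked to return the expected list.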

(Ironically enough, approach 1 is only possible with a LINQ provider such as EF Core; if you write raw SQL, approach 2 is the only thing you can do anyway, since evaluating query operators client-side simply isn't an option).

Would be good to get your opinions on the above!

kevingy commented 3 years ago

I see value in both approaches - 1 and 2. While I suggested Mocking the query input data, there is also value in Mocking the query outputs/results. The two approaches are not mutually exclusive and both have value. I have used both, although not in the same project. Again, I use the right tool for the job.

That said, usage of the Repository or UoW pattern has side effects in that anything (e.g. controllers and services) that accesses the database will go through the repositories. That is, if the Repository or UoW pattern is used, the application being tested will have code for the Repository or UoW pattern in it. That's not a problem, but if the Repository or UoW pattern is implemented for the sole purpose of unit or integration testing, it's probably not the best choice because it is not preferable to have testing code in the tested project.

> We're generally skeptical of approach 1; "mocking the database" doesn't mean just replacing raw tables with in-memory data, since databases are also responsible for evaluating and executing actual query code (expressed in SQL).

I can understand how the Entity Framework team would be skeptical of approach 1. It's essentially saying "unit test your code that consumes Entity Framework without actually using Entity Framework". It's counter-intuitive, if one does not understand the difference between unit testing and integration testing. For better or worse, if the desire is to properly unit test, this is a requirement - and Entity Framework should encourage it, not just allow for it.

> For example, we regularly get requests to add support in the InMemory provider for this or that SQL Server method (e.g. anything in SqlServerDbFunctionsExtensions).

If the purpose for these requests is to unit test using the InMemory provider, I would suggest Entity Framework resist making (i.e., refuse to make) any change to the InMemory provider for that reason. The Entity Framework team should push back with "If you're asking for this to unit test, you're not properly unit testing." and, once you have written the docs, provide the link to the docs that show how to properly unit test application code using Entity Framework, assuming that the functionality really can be tested.

In my opinion, the InMemory provider shouldn't be used for unit testing, as the database dependency should be isolated, abstracted, and injected for unit testing. Additionally, it shouldn't be used for integration testing, as the InMemory provider isn't a production target. If the purpose of the InMemory provider is only to unit test or integration test, Microsoft can save lots of money by abandoning development of the InMemory provider.

I'm fairly certain that will be an unpopular opinion, especially after the InMemory provider has been a suggested method to test EF for some time.

> I'm not even sure how one goes about mocking queries like that with NSubstitute/MockQueryable - the call to these provider-specific methods is embedded somewhere deep in the query tree, and simply cannot be evaluated in .NET. To be successful with approach 1, your code really must be restricted to the bare minimum of query operators/constructs which happen to work in .NET.

Considering my current workload, give me a couple days to give it a try. I don't think I have used any of the SqlServerDbFunctionsExtensions, except maybe Contains(). I'll try to create an example unit test project for testing a few of these static functions and commit them to a public repository.

> (Ironically enough, approach 1 is only possible with a LINQ provider such as EF Core; if you write raw SQL, approach 2 is the only thing you can do anyway, since evaluating query operators client-side simply isn't an option).

I completely agree. That said, if you're writing raw SQL often, you're not really doing what Entity Framework is intended to do, are you? I would argue that if you're throwing SQL at the database often, you should use ADO.NET or its ilk instead of using Entity Framework. Again, use the right tool for the job. Just because Entity Framework can query with raw SQL doesn't mean that it should. Don't use an ORM when you should use a low-level database access library.

roji commented 3 years ago

> I see value in both approaches - 1 and 2. While I suggested Mocking the query input data, there is also value in Mocking the query outputs/results. The two approaches are not mutually exclusive and both have value. I have used both, although not in the same project.

Well, concretely speaking I'd recommend that any real project have thorough integration testing (in fact, I'd argue these should come first, with unit tests coming later, as integration tests are the only way to make sure the application actually works in production). Given that, proposing that a project also implement two types of unit tests seems a bit much - you end up with three types of testing.

What do you see as the advantage of approach 1 (mocking DbSet/InMemory) over approach 2 (repository)? The latter indeed forces a design that could be considered heavy, but if the goal is to do proper unit testing, with database mocking, then what does approach 1 allow that approach 2 doesn't?

> [...] if the Repository or UoW pattern is implemented for the sole purpose of unit or integration testing, it's probably not the best choice because it is not preferable to have testing code in the tested project.

I agree that this is the main problem with using repository for unit testing, and it's probably the reason why so many users turn to mocking DbSet/InMemory. However, I wouldn't say this is introducing testing code in the product (tested project), but rather architecting the product in a way which allows it to be tested successfully. The same principle is at work when many users choose dependency injection - it's an architecture that decouples components in a way which enables testing.

> > We're generally skeptical of approach 1; "mocking the database" doesn't mean just replacing raw tables with in-memory data, since databases are also responsible for evaluating and executing actual query code (expressed in SQL).

> I can understand how the Entity Framework team would be skeptical of approach 1. It's essentially saying "unit test your code that consumes Entity Framework without actually using Entity Framework". It's counter-intuitive, if one does not understand the difference between unit testing and integration testing. For better or worse, if the desire is to properly unit test, this is a requirement - and Entity Framework should encourage it, not just allow for it.

That wasn't quite the point I was trying to make - I do believe that proper unit testing should not involve EF Core (which is how repository-based unit testing works). The point here is simply that it's impossible to unit test queries which involve operators/methods which can't be evaluated client-side, regardless of whether they use EF Core (with InMemory) or not (by mocking DbSet with Moq/NSubstitute).

> > For example, we regularly get requests to add support in the InMemory provider for this or that SQL Server method (e.g. anything in SqlServerDbFunctionsExtensions).

> If the purpose for these requests is to unit test using the InMemory provider, I would suggest Entity Framework resist making (i.e., refuse to make) any change to the InMemory provider for that reason. The Entity Framework team should push back with "If you're asking for this to unit test, you're not properly unit testing." [...]

We do indeed refuse these changes. But the point here, once again, is what it means to unit test these queries. If you go with approach 2 (repository), you mock the query results (just as if the query were written in SQL instead of LINQ), and everything is fine. Approach 1 simply doesn't allow queries with non-client-evaluatable components - regardless of InMemory or DbSet mocking.

> In my opinion, the InMemory provider shouldn't be used for unit testing, as the database dependency should be isolated, abstracted, and injected for unit testing. Additionally, it shouldn't be used for integration testing, as the InMemory provider isn't a production target. If the purpose of the InMemory provider is only to unit test or integration test, Microsoft can save lots of money by abandoning development of the InMemory provider.

Believe me, the EF team isn't a fan of the InMemory provider (to say the least), and we've discussed obsoleting it in the past.

But I'm curious about one point... Obviously InMemory isn't usable for integration testing (and anyone going in this direction is misunderstanding what integration testing means). However, why do you feel InMemory shouldn't be used for unit testing, as opposed to DbSet mocking with e.g. NSubstitute/MockQueryable? Both techniques do very similar things - they replace DbSet with an in-memory collection, and then evaluate LINQ operators client-side over it; in that sense, both isolate and abstract away the database, and both produce a mocked DbContext/DbSet to be injected for unit testing (there are definitely some differences, but I'd like to hear your opinion about this first).

> Considering my current workload, give me a couple days to give it a try. I don't think I have used any of the SqlServerDbFunctionsExtensions, except maybe Contains(). I'll try to create an example unit test project for testing a few of these static functions and commit them to a public repository.

That would be great, thanks!

> > (Ironically enough, approach 1 is only possible with a LINQ provider such as EF Core; if you write raw SQL, approach 2 is the only thing you can do anyway, since evaluating query operators client-side simply isn't an option).

> I completely agree. That said, if you're writing raw SQL often, you're not really doing what Entity Framework is intended to do, are you? I would argue that if you're throwing SQL at the database often, you should use ADO.NET or its ilk instead of using Entity Framework. Again, use the right tool for the job. Just because Entity Framework can query with raw SQL doesn't mean that it should. Don't use an ORM when you should use a low-level database access library.

My point here was different... As I wrote above, we generally believe that approach 1 is problematic for unit testing EF Core applications, and approach 2 should usually be preferred. The interesting thing is that approach 1 is only possible with EF Core in the first place, since with raw SQL the option isn't even there. So in a way, I feel like I'm telling users "unit test your code as if EF Core weren't there".

kevingy commented 3 years ago

> Well, concretely speaking I'd recommend that any real project have thorough integration testing (in fact, I'd argue these should come first, with unit tests coming later, as integration tests are the only way to make sure the application actually works in production).

Integration testing is 100% required after unit testing. Performing integration testing first would contradict principles of test driven development (TDD). In unit testing, with or without TDD, in each individual test, I really only want to test my very small unit of code without dependencies. Unit testing really is "does my code - and ONLY my code - work"? Integration testing comes in later, after "my code and only my code" is proven by passing unit tests.

> Given that, proposing that a project also implement two types of unit tests seems a bit much - you end up with three types of testing.

Yep, that's the idea, for better or worse. Unit testing is done primarily for and by devs prior to committing any code, either with or without TDD. (Does my code work by itself?) Typically unit tests are run by an automated system on every commit to source control. Integration testing comes next, bringing in dependencies. (Does my code play well with others?) Because of dependencies, integration tests may not be able to be run by a pipeline. Then, manual testing comes last to make sure it all works as expected. (Does my code meet the acceptance criteria of the story?) Each type of testing has a clearly defined purpose. In my experience, if code is properly unit tested and properly integration tested, typically the only time there is a bug in the code is if/when the code doesn't meet the requirements of the story, which is found in manual testing.

All of this being "a bit much" is a common opinion, which is why lots of companies and teams take short cuts like using InMemory for unit or integration tests or skipping automated integration testing entirely, relying on manual testing to verify integration. The more short cuts taken, the more bugs there are in the final product.

As far as EF is concerned, I would think that you would want to, at best, encourage and, at worst, at least allow teams that want to do testing "the purist way" to do so. As it stands, EF makes consumers really work to properly unit test. EF doesn't support the fundamental principles of proper unit testing "out of the box", requiring additional code to wrap EF to allow injection, regardless of what approach is taken. The unit testing suggestions (specifically, InMemory) have been simply wrong, to be blunt. The docs you are working on improving are lacking, at best. I am a fan of EF and have used it on every project that has a database that I have worked on for more than 7 years. Personally, the only area I find to be one-star is "native" support for proper unit testing.

> What do you see as the advantage of approach 1 (mocking DbSet/InMemory) over approach 2 (repository)? The latter indeed forces a design that could be considered heavy, but if the goal is to do proper unit testing, with database mocking, then what does approach 1 allow that approach 2 doesn't?

Solely from a testing capability standpoint, they are equivalent in that they both allow for unit testing of code that consumes EF by eliminating the dependency on EF. I can't think of anything that can be done in one approach that can't be done in the other.

> That wasn't quite the point I was trying to make - I do believe that proper unit testing should not involve EF Core (which is how repository-based unit testing works). The point here is simply that it's impossible to unit test queries which involve operators/methods which can't be evaluated client-side, regardless of whether they use EF Core (with InMemory) or not (by mocking DbSet with Moq/NSubstitute).

Correct. In general, queries aren't tested in unit testing at all. At the unit testing stage, the interaction with EF should be injected. It doesn't matter what the query is, since the "real" EF does not exist in a unit test. Testing of queries is performed in integration testing, as integration testing allows and requires the EF and the database to be in place.

One could ask, "If EF really isn't there, what is the value of the unit test?" The answer is that the code using EF can be tested in its entirety, making the assumption that EF performed as expected. Rarely does any service or controller method contain only code that calls EF. I need to be able to test my code without EF, making the assumption that EF performed as expected in the test case.

> But I'm curious about one point... Obviously InMemory isn't usable for integration testing (and anyone going in this direction is misunderstanding what integration testing means). However, why do you feel InMemory shouldn't be used for unit testing, as opposed to DbSet mocking with e.g. NSubstitute/MockQueryable? Both techniques do very similar things - they replace DbSet with an in-memory collection, and then evaluate LINQ operators client-side over it; in that sense, both isolate and abstract away the database, and both produce a mocked DbContext/DbSet to be injected for unit testing (there are definitely some differences, but I'd like to hear your opinion about this first).

Usage of InMemory in unit testing violates one of the five fundamental principles of unit testing. From https://docs.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices: (Which, I believe, is verbatim from Roy Osherove's Art of Unit Testing.)

> Characteristics of a good unit test ... Isolated. Unit tests are standalone, can be run in isolation, and have no dependencies on any outside factors such as a file system or database. ...

If a unit test uses InMemory, it still has a dependency on "outside factors such as a file system or database". Even if it's an InMemory database designed specifically for testing, it's an "outside factor" by definition.

Doesn't Microsoft read their own docs? ;)

> So in a way, I feel like I'm telling users "unit test your code as if EF Core weren't there".

You are - or at least should be. That's the whole point of unit testing! Unit testing should only test the application's code without its dependencies, including EF. That's why EF is being somehow mocked - to test my code in my application without actually calling anything in EF. That's fundamentally what a unit test is - testing only my minute, sometimes trivial, unit of code. At this stage of the testing process, I don't want to test with EF yet. I'm completely focused on my code only.

roji commented 3 years ago

> Integration testing is 100% required after unit testing. Performing integration testing first would contradict principles of test driven development (TDD). In unit testing, with or without TDD, in each individual test, I really only want to test my very small unit of code without dependencies. Unit testing really is "does my code - and ONLY my code - work"? Integration testing comes in later, after "my code and only my code" is proven by passing unit tests.

I'm aware that there are a lot of principles and theory around these questions, but I think it's good to approach these questions with lots of context and avoid dogma. In EF Core, we generally tend to prefer covering features via integration testing; many internal components don't have dedicated unit tests, but are well-covered by exhaustive end-to-end tests that guarantee the features they participate in (a good example is the query pipeline); these integration tests run on every commit. We do practice TDD, but the test defining the behavior is frequently an integration test (for the entire feature) rather than a unit test isolating dependencies. In general, if I had to choose, I'd rather have a good, robust integration testing suite for my product - guaranteeing its end-to-end functioning - rather than a series of unit tests for each component it contains.

In the real world, it seems that people frequently turn to mocking not because of any principle/methodology/theory, but rather because setting up reliable and fast integration testing isn't trivial. This is specifically something I plan to address for databases in the upcoming docs.

EF doesn't support the fundamental principles of proper unit testing "out of the box" [...]

I'm interested - what, very concretely, could EF Core do better to support unit testing?

What do you see as the advantage of approach 1 (mocking DbSet/InMemory) over approach 2 (repository)? The latter indeed forces a design that could be considered heavy, but if the goal is to do proper unit testing, with database mocking, then what does approach 1 allow that approach 2 doesn't?

Solely from a testing capability standpoint, they are equivalent in that they both allow for unit testing of code that consumes EF by eliminating the dependency on EF. I can't think of anything that can be done in one approach that can't be done in the other.

I've already addressed this above - the moment anything is done in the query that cannot be evaluated client-side, approach 1 breaks down. This includes raw SQL, methods in the query tree which can't be evaluated client-side (e.g. anything in SqlServerDbFunctionsExtensions), etc. This alone seems quite a good reason to do approach 2 over 1.

Given that, proposing that a project also implement two types of unit tests seems a bit much - you end up with three types of testing.

Yep, that's the idea, for better or worse. Unit testing is done primarily for and by devs prior to committing any code, either with or without TDD. (Does my code work by itself?) Typically unit tests are run by an automated system on every commit to source control. Integration testing comes next, bringing in dependencies. [...]

My question wasn't really about unit vs. integration testing. To be very specific, what do you see as the advantages of query input mocking (approach 1 above) over query output mocking (approach 2 above)? The first doesn't require you to adopt a repository architecture (that's an advantage), but if that's not a concern, why would someone do approach 1 over approach 2?

Usage of InMemory in unit testing violates one of the five fundamental principles of unit testing. From https://docs.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices: (Which, I believe, is verbatim from Roy Osherove's Art of Unit Testing.)

Characteristics of a good unit test ... Isolated. Unit tests are standalone, can be run in isolation, and have no dependencies on any outside factors such as a file system or database. ...

If a unit test uses InMemory, it still has a dependency on "outside factors such as a file system or database". Even if it's an InMemory database designed specifically for testing, it's an "outside factor" by definition.

I disagree with your interpretation here. What is the outside factor you're seeing here with InMemory? The provider itself? If so, is NSubstitute also an outside factor? Because there's definitely no external filesystem or database at play when doing InMemory. From another angle: in your code sample above, you are running queries against an in-memory List<T> - is that an external dependency as well? Because that's quite similar conceptually to what the InMemory provider does.

To summarize, I view both the InMemory provider and the NSubstitute-based approach above as two techniques for mocking query inputs (as opposed to outputs, which is what an IEnumerable-based repository achieves). While there are certainly differences between the two, conceptually they are very similar: run LINQ operators in memory against a memory data structure (e.g. List<T>). Both these techniques mock query inputs (approach 1), and I think there's good reason to prefer mocking query outputs instead, via a repository (approach 2).
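To make the input/output distinction concrete, here is a minimal sketch of approach 2. It assumes a hypothetical `Blog` entity, `IBlogRepository` interface and `BlogService` class (none of these names come from EF Core or from this thread), and it uses a hand-written fake instead of NSubstitute to stay self-contained. It illustrates the idea only and is not a recommended implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The code under test sees only canned query *outputs*; no LINQ query
// is evaluated against an in-memory stand-in for the database.
var service = new BlogService(new FakeBlogRepository());
Console.WriteLine(service.DescribeTopRated(4)); // prints "EF News, LINQ Tips"

// Hypothetical entity, used for illustration only.
public record Blog(int Id, string Name, int Rating);

// Approach 2: the repository returns materialized results (IEnumerable),
// so the real query (possibly using raw SQL or database functions)
// lives behind this interface, in an EF-backed implementation.
public interface IBlogRepository
{
    IEnumerable<Blog> GetTopRated(int minimumRating);
}

// The unit under test depends only on the repository abstraction.
public class BlogService
{
    private readonly IBlogRepository _repository;

    public BlogService(IBlogRepository repository) => _repository = repository;

    public string DescribeTopRated(int minimumRating) =>
        string.Join(", ", _repository.GetTopRated(minimumRating).Select(b => b.Name));
}

// Hand-written test double: ignores its argument and returns fixed data.
public class FakeBlogRepository : IBlogRepository
{
    public IEnumerable<Blog> GetTopRated(int minimumRating) =>
        new[] { new Blog(1, "EF News", 5), new Blog(2, "LINQ Tips", 4) };
}
```

Since the fake never runs a query, the test cannot break on operators that only the database can evaluate. The EF-backed implementation of `IBlogRepository` would then be covered separately, against a real database.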

Doesn't Microsoft read their own docs? ;)

I definitely haven't read every piece of documentation published by Microsoft, and even for those I have, I also don't necessarily share your specific interpretation of them.

roji commented 3 years ago

I'd like to add one piece of info... If we look at the unit testing docs link you posted above (https://docs.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices), the major point made in favor of unit testing and against integration testing is running speed, e.g. the ability to run all unit tests on each commit. Another point is isolation, i.e. to avoid various state leaks and conflicts between integration tests running in parallel against the same database.

EF Core's functional (read: integration) test suite for SQL Server alone contains over 30000 integration tests which run against the database. These run in parallel, and are executed for each and every commit in our CI pipeline. These execute in a few minutes, give us absolute confidence that the product as a whole isn't broken (a thing unit tests do not provide), and do not break as we do various internal refactorings (since they only rely on EF public surface area - again unlike unit tests). This isn't trivial to do, but it is definitely possible to architect your integration test suite to achieve it (I plan to work on docs for this). We also have unit tests for aspects which are difficult to test in integration tests (e.g. failure scenarios), or for components which merit specific testing for some reason. Now, given the option of such a full-fledged integration test suite, do you still believe unit tests should come first?

And one final angle on this important subject... The general advice to mock external dependencies (as in the docs you linked to) is correct in many cases - I am not trying to make a universal claim here. In many cases, the external dependency is some web service which simply cannot be used all the time, or is unreliable or slow. When the external dependency is a database, we're actually quite fortunate - it's generally easy to run your production database system locally (e.g. SQL Server LocalDB on Windows), and simply test against it - once you ensure proper isolation between your tests.

To summarize, I don't believe it's productive to debate unit tests vs. integration tests in general, without taking into account the very specific external dependency in question; it's all a matter of context.

kevingy commented 3 years ago

Well, it looks like you have it all figured out. Good luck!

roji commented 3 years ago

@kevingy the idea in this conversation wasn't to prove myself right or anything - the discussion with you definitely helped me realize some things and advance my understanding of what testing EF Core applications means. I'd be more than happy to be contradicted on any of the points above (with concrete arguments/examples) - that's how it all moves forward.