FR: Have EntityManagerInterface::getRepository() return interface rather than implementation

doctrine / orm

Doctrine Object Relational Mapper (ORM)

https://www.doctrine-project.org/projects/orm.html

MIT License

9.94k stars 2.52k forks source link

FR: Have EntityManagerInterface::getRepository() return interface rather than implementation #11019

Open Firehed opened 1 year ago

Firehed commented 1 year ago

Feature Request

Q	A
New Feature	yes
RFC	no
BC Break	no[^1]

Summary

Adjust EntityManagerInterface::getRepository() to return an interface instead of a concrete EntityRepository object. Ensure that EntityRepository implements that same interface. The actual EntityManager remains unchanged.

Why?

ORM 3 is adding native return types to the class methods, instead of docblock annotations. This is great! However, it does make some unit testing more challenging, especially when using mocks or stubs. I ran into difficulties with my existing test suite due to this change. While not insurmountable, having test tools implement an interface rather than being forced to extend a concrete class would simplify updating to the upcoming release.

How?

EntityRepository already implements ObjectRepository and Selectable, which cover most of its public methods already.

Diffing things, I found four additional public methods not in those interfaces:

createQueryBuilder
createResultSetMappingBuilder
count (NOT compatible with \Countable)
__call

So this change could look something like:

interface EntityRepositoryInterface extends ObjectRepository, Selectable
{
  public function createQueryBuilder(...);
  public function createResultSetMappingBuilder(...);
  public function count(...);
  public function __call(...);
}

-class EntityRepository implements ObjectRepository, Selectable
+class EntityRepository implements EntityRepositoryInterface

interface EntityManagerInterface
{
-    public function getRepository(string $className): EntityRepository;
+    public function getRepository(string $className): EntityRepositoryInterface;
}

Requiring __call in a common interface feels a little funky to me, but not strictly problematic. There would be a little additional work on the docblock generics but at a glance it should mostly pass through.

If this is a change you're open to, I'm happy to do the implementation.

Note: I think a similar change could be beneficial for several other EntityManagerInterface methods, though I suspect they'd be less impactful. They could also be done iteratively if desired.

[^1]: From a pure LSP perspective, this should not be a BC break as described. I'm not sure if there are internals or tooling where this could be in practice.

greg0ire commented 1 year ago

Hey there :wave:

having test tools implement an interface rather than being forced to extend a concrete class would simplify updating to the upcoming release

Is this an issue you are facing with https://github.com/Firehed/mocktrine maybe? And exclusively with it? Or are you facing similar issues when using PHPUnit mocks, for instance?

Firehed commented 1 year ago

Hi, great question :)

Since 3.0.0-beta1 is so new, that's the only place I've tested so far. I'll have to get back to you in terms of other impact as I start to test the beta version upgrade process on actual shipping applications.

greg0ire commented 1 year ago

Ok. On our end, we are quite cautious with this kind of change, because the implication of this is that the EntityRepository could get decorated, and then there might be lots of places where users would expect doctrine to return / pass the decorate repository rather than the original one. That's what happened with EntityManagerInterface anyway.

Firehed commented 1 year ago

Makes sense - I'd figured there was a good reason this hadn't been done already. I would like to continue the discussion though, if you're amenable.

I'm curious if this is a theoretical concern, or something that's come up in practice? Instinctively it feels like the same class of problem of working with custom repositories, which could perform all sorts of behavioral modification. Digging a bit deeper into the source code, it seems that declaring an entity with a custom repo (via the default factory) and Config::setRepositoryFactory() are the only real places I'm aware of where a repository can be modified in any appreciable way (whether through decoration or some other technique). Since none of its methods are marked final, you could really break things if sufficiently determined to do so.

The comments on #9533 (and #9531) go into a bit more detail. It seems to me that the currently-non-interface methods (especially createQueryBuilder) were the primary reason that 9533 was selected over 9531; I believe my suggestion addresses this. Given the size and scope of Doctrine ORM I'm sure there are cases I'm not aware of that necessitate additional testing, but nothing comes to mind offhand (contrated to the previous behavior where e.g. Selectable wasn't part of the return type which caused me frequent static analysis issues)

greg0ire commented 1 year ago

Yeah sure, let's discuss! You may be correct about the issue being different here, since we allow custom repositories, that means we already handle returning something else than strictly EntityRepository. @beberlei @derrabus can you chime in on this?

@Firehed , can you please detail your mocktrine issues as well? Also, what does mocktrine bring over new EntityManager(DriverManager::getConnection(['url' => 'sqlite:///:memory:']))? Is it just about avoiding the schema creation, or is there more to it?

Firehed commented 11 months ago

Hi, sorry about the delayed response!

As far as specific issues that I've encountered, the EntityManagerInterface::getRepository() change is the most prominent unexpected issue. While trying to prototype an upgrade path by extending EntityRepository, I've also found that the matching() return type is probably unnecessarily specific: it's typing AbstractLazyCollection&Selectable when Collection&Selectable is likely sufficient (or, rather, would be on the interface I'm proposing - I don't see any calls to the couple non-interface methods that looks relevant).

A lot of the library's goals indeed stem from schema creation, but it does go beyond that:

Schema creation is frequently not portable across RDBMS systems. Tools that on paper support abstracted schema creation (including doctrine/migrations) often encourage, directly or otherwise, inlining raw SQL instead of using the abstraction (example). This makes using MySQL/Postgres in real environments and sqlite in tests unreliable at best.
Performing schema setup code once per test can massively slow down a large test suite, even with sqlite memory. Often multiple seconds per test, which adds up extremely fast on large projects. The library has optimized setup code using the same paths that Doctrine itself does to avoid these delays. As a concrete example, a codebase using the library runs over 3000 unit tests in about 20 seconds. Even 50ms of DB setup would more than 5x that runtime. A more reasonable 5s setup (which would still be fast) would add over four hours to the test suite.
Conversely, doing schema setup once during boostrapping often leads to really weird interactions across tests, commonly leading to setup/teardown code to wipe or seed the DB on each test - not only slowing things down, but often introducing flakiness to tests
Actual runtime performance is also significantly faster than even in-memory sqlite.
There's no need to have any specific PDO driver or PHP extension in your CI environment
Debugging is sometimes made easier since it can be done purely in PHP rather than needing to check the external state in an actual DB (which AFAIK isn't much of an option with sqlite memory mode)
There are hooks available that make certain assertions easier, such as whether flush() is being called at an appropriate time or whether errors it can produce are being handled

Overall, the primary goals are around test performance and reliability - both are heavy contributors to developer experience. Slow tests get run less, and flaky tests often lead to them getting disabled to get CI to pass (or, worse, ignoring CI failures and shipping anyway). As a secondary improvement, faster tests save money directly since CI services usually bill based on build minutes - not to mention developer time, feedback loops, and all that.

I will readily state that it's not a universal replacement for an actual database connection! When you need true integration tests, a real DB (preferably one configured the same as your actual production environment) is absolutely more suitable. But for unit testing that you don't want external service dependencies, mocks tend to be preferable.

derrabus commented 11 months ago

I believe, mocking repositories is fine. I do this all the time and creating mocks of the EntityRepository class works well for me. After all, the class is meant to be extended, so why shouldn't I be able to mock it.

Can you elaborate why not having an interface makes mocking repositories more difficult?

Firehed commented 11 months ago

For any single release (as of now), it's usually not substantially different or more problematic. The difficulty tends to arise when maintaining mocks across multiple versions, even with a single major version that on paper has no backwards-compatibility breaks.

For example: some methods could be declared final in a point release (typically not considered a BC break, though it depends on diligence), which could cause problems. Similarly, there could be new non-private methods added which break the mock in various ways (such a change to an interface would be considered an explicit BC break, and is therefore unlikely to occur in a minor or patch version), such as LSP checks.

I'd also suggest that "the class is meant to be extended" is true primarily for the use-case of adding additional data loading methods, rather than modifying the behavior of any internals. Indeed, #9533 and the motivation behind it would suggest that the internals methods are at least somewhat likely to eventually be made final making a breakage quite likely. This might be fine for PHPUnit's built-in mocking tools (or not, depending on how you're using them), but lead to undesired behavior for anything a bit more low-level.

So broadly, I'd say having interfaces to mock doesn't make things easier, but more stable.

derrabus commented 11 months ago

For example: some methods could be declared final in a point release (typically not considered a BC break, though it depends on diligence), which could cause problems.

Declaring methods final that haven't been final previously totally is a BC break. Especially a class that is meant to be extended by userland code.

Similarly, there could be new non-private methods added which break the mock in various ways (such a change to an interface would be considered an explicit BC break, and is therefore unlikely to occur in a minor or patch version), such as LSP checks.

We are aware and it's not like we're adding new methods to the base repository class all the time. But if we do, having a class rather than an interface lets us move a bit faster here.

I'd also suggest that "the class is meant to be extended" is true primarily for the use-case of adding additional data loading methods, rather than modifying the behavior of any internals.

I'm not really sure I understand what "internals" you're referring to.