doctrine / data-fixtures

Doctrine2 ORM Data Fixtures Extensions
http://www.doctrine-project.org
MIT License
2.76k stars 224 forks source link

Implement reference by class names #409

Closed VincentLanglet closed 1 year ago

VincentLanglet commented 1 year ago

Hi @greg0ire

Closes https://github.com/doctrine/data-fixtures/issues/408

I introduce a $class param in a BC way:

In next major, the idea would be to make the class param mandatory. This way, having two fixtures with the same name but different class would be allowed.

greg0ire commented 1 year ago

You should retarget to 1.6.x

VincentLanglet commented 1 year ago

You should retarget to 1.6.x

It should be better now @greg0ire

greg0ire commented 1 year ago

This should help with some of the failures: https://github.com/doctrine/data-fixtures/pull/412 (and means you need to rebase)

VincentLanglet commented 1 year ago

Rebased done

greg0ire commented 1 year ago

vendor/bin/phpcbf should help

VincentLanglet commented 1 year ago

vendor/bin/phpcbf

Done, also updated the baseline

greg0ire commented 1 year ago

There is an example call of getReference in the docs, you need to update it to show the correct usage.

VincentLanglet commented 1 year ago

There is an example call of getReference in the docs, you need to update it to show the correct usage.

Nice catch. I updated the doc.

I didn't added use App/.../Role since there was none for User.

VincentLanglet commented 1 year ago

Friendly ping @stof If you have time to take a new look :)

VincentLanglet commented 1 year ago

I get no news from stof, Do you think it can be merged @greg0ire ?

greg0ire commented 1 year ago

Sure! Thanks @VincentLanglet !

VincentLanglet commented 1 year ago

You're welcome. Do you mind releasing the 1.6.0 version, or another PR/issue is needed first ? @greg0ire

Also should I provide a PR on 2.x branch to solve the deprecation (into BC-break) I introduced ?

greg0ire commented 1 year ago

Please do provide a 2.x PR. I'm AFK, but will try releasing from my phone.

janklan commented 1 year ago

I'd like to report a shortfall in a scenario where class inheritance is at play. This is a problem when using Doctrine's inheritance mapping.

Consider the example inheritance mentioned on Doctrine's page:

class Person {}
class Employee extends Person {}
class Boss extends Person {}

When you generate a lot of Employees and Boss entities, you're later unable to fetch those entities without also knowing which exact class are you looking for. In other words, you can't do (pseudo-code!):

$this->setReference('person1', new Employee());
$this->getReference('person1', Person::class);

This is because the fixtures are stored in an array indexed by the object's direct class name in ReferenceRepository::$referencesByClass. I think this approach is wrong.

I created my own solution to achieve the intended objective some time ago. A snippet of an abstract class that other Fixture-generating classes extend is below:

<?php

use Doctrine\Bundle\FixturesBundle\Fixture;

abstract class AbstractFixture extends Fixture
{
  /**
     * @template T of object
     * @param string $handle
     * @param class-string<T> $expectedType
     * @return T
     */
    protected function getTypedReference(string $handle, string $expectedType): mixed
    {
        $object = $this->getReference($handle, $expectedType);
        if (!$object instanceof $expectedType) {
            throw new \InvalidArgumentException(sprintf('Handle %s is %s, not %s', $handle, get_debug_type($object), $expectedType));
        }

        return $object;
    }
}

Unless I'm missing something, using if (!$object instanceof $expectedType) { yields a comparable outcome with greater flexibility compared to the method proposed in this PR. It doesn't allow duplicate key usage, but I don't see that as a problem comparing to the negative outcome described above.

I would like to suggest the proposed change might need to be reconsidered.

VincentLanglet commented 1 year ago

When you generate a lot of Employees and Boss entities, you're later unable to fetch those entities without also knowing which exact class are you looking for. In other words, you can't do (pseudo-code!):

$this->setReference('person1', new Employee());
$this->getReference('person1', Person::class);

Unless I'm missing something, using if (!$object instanceof $expectedType) { yields a comparable outcome with greater flexibility compared to the method proposed in this PR. It doesn't allow duplicate key usage, but I don't see that as a problem comparing to the negative outcome described above.

Duplicate key usage is one useful feature of this change. I see no reason to lost this when you can easily do

$this->getReference('person1', Employee::class);

Which seems to be the right way to fetch the fixture since Employee and Boss won't have the same API.

But maybe some others solutions exists which allow both to cohabit, like allowing to call

$this->setReference('person1', new Employee(), Person::class);
janklan commented 1 year ago

Duplicate key usage is one useful feature of this change. I see no reason to lost this when you can easily do

That's a bold assumption probably specific to your particular use case

I see it exactly the other way:

  1. Make your keys unique so you always know what are you getting exactly what you set, without having to create a composite primary key made from the key and its class name
  2. Don't rely on a class name to fish out the correct entity - this problem only exists because duplicate keys are allowed, and introduces an artificial constraint I'm talking about.

Which seems to be the right way to fetch the fixture since Employee and Boss won't have the same API.

Not true. While there would be differences, part of the API can overlap. That's the whole point of class inheritance. An example would be a comment system that accepts a Person class to record who authored a Comment entity. The comment can be authored by both Boss and Employee or any other child of the Person entity. The overlap in the API could be, for example, a Person::getName(). Every person has a name and the comment system wants to show it next to the comment.

Your fixtures may hold 100 references to random objects of Person child classes, indexed by person1..100. You may randomly select one of those objects to generate a ton of Comment fixtures. When doing $this->getReference('person'.random_int(1, 100)), you can't know what class name to ask for, and frankly, at that time you don't care, because the Comment entity doesn't care.

The above is a simplified illustrative scenario explaining my point, but I have a comparable real-world use case. That's how I discovered the behaviour in the first place. I didn't make it up to be difficult in a random PR.

Come to think of it, I could think of a use case where you are asking for an interface, not even a specific class. The instanceof check just works way better than just recording the ultimate class name and calling it a name.

But maybe some others solutions exist which allow both to cohabit.

Yes: I'd suggest we could walk back the decision of making the $class attribute mandatory. If not provided, no class checks take place and things go back to the way they were.

VincentLanglet commented 1 year ago

Yes: I'd suggest we could walk back the decision of making the $class attribute mandatory. If not provided, no class checks take place and things go back to the way they were.

You're still not developing how to keep the duplicate key feature.

One way could be to change the storage from

$data[$class][$name]

to

$data[$name][$class]

If you call getReference('foo', Foo::class) it still works, and if you call getReference('foo'), it could have the following behavior:

WDYT ?

janklan commented 1 year ago

You're still not developing how to keep the duplicate key feature.

That's because I truly believe it's not a good idea to allow duplicate keys.

and if you call getReference('foo'), it could have the following behavior:

  • If there is only one fixture with the foo name, it returns this one
  • If there is multiple fixtures with the foo name, it throws an exception because the class is required

WDYT ?

I'd say that could work, depending on the implementation.

My thoughts are:

  1. I'd say public function setReferenceIdentity($name, $identity, ?string $class = null) should adopt the outcome of $identity::class. Failing to do so, the developer could use any arbitrary string as the $class value and the repository would accept it. Without any further type checking, the combination of $name and $class truly forms a composite primary key without adding any additional type safety checks - the real reason why we ask for a class name in the first place, isn't that right? That's why I did it anyway: to be able to fetch stuff from the repository in a way allowing PHPStorm, phpstan and others to correctly determine the returned object's type, while also actually asserting it in the getter's code.
  2. If $class was set when setting the reference, setReferenceIdentity should ensure the correctness of the user's input by testing either $identity instanceof $class or is_a($identity, $class, true) - this is to mitigate the problem described in the point above. I also think the manual $class attribute should be only used as a signal for the setter to check the type. I believe the $identity::class should be used as the key, as described above.
  3. If only one element exists under $data[$name][$class], the $class should be ignored for the purposes of looking up the element in $data[$name] and the value should be retrieved using $data[$name][array_key_first($data[$name])] or similar.
  4. If only one element exists $data[$name][$class], getReference('foo', Bar::class) should again assert the $data['foo'] instanceof Bar::class. Also, providing Bar::class allows for using the psalm templates to indicate what reference is being returned. That way the return type of the object is guaranteed and known to the IDE.
  5. If the class name is not provided and there is only one element in $data[$name][$class], getReference() wouldn't do any type checking.

This would allow one of the two scenarios:

1) fetching the only element in $data[$name] using getReference('person1', Person::class) even if the 'person1' was an Employee 2) allow using duplicate keys as long as the $identity::class would be different.

Please note none of the variants discussed allow using duplicate keys if the class doesn't differ:

$john = new Person();
$doe = new Person();
$registry->setReference('person', $john); // sets john
$registry->setReference('person', $doe); // overwrites john

I repeat, I believe allowing duplicate keys lacks merit.

VincentLanglet commented 1 year ago

You're still not developing how to keep the duplicate key feature.

That's because I truly believe it's not a good idea to allow duplicate keys.

That's your point of view and I disagree with it. I can say the same, it's not a good idea to allow fetching a reference without knowing the class you're fetching. When you're doing a database request, I wonder if you're writing

$this->entityManager->find(42);

or

$this->entityManager->find(Foo:class, 42);
$this->entityManager->getRepository(Foo::class)->find(42);

While I proposed solutions to allow both, you're just looking for solving your debatable use-case and remove another feature. This mindset doesn't encourage me to keep the discussion.

I'd say that could work, depending on the implementation.

Feel free to provide one.

  1. I'd say public function setReferenceIdentity($name, $identity, ?string $class = null) should adopt the outcome of $identity::class. Failing to do so, the developer could use any arbitrary string as the $class value and the repository would accept it. Without any further type checking, the combination of $name and $class truly forms a composite primary key without adding any additional type safety checks - the real reason why we ask for a class name in the first place, isn't that right? That's why I did it anyway: to be able to fetch stuff from the repository in a way allowing PHPStorm, phpstan and others to correctly determine the returned object's type, while also actually asserting it in the getter's code.
  2. If $class was set when setting the reference, setReferenceIdentity should ensure the correctness of the user's input by testing either $identity instanceof $class or is_a($identity, $class, true) - this is to mitigate the problem described in the point above. I also think the manual $class attribute should be only used as a signal for the setter to check the type. I believe the $identity::class should be used as the key, as described above.

Not sure to understand.

  1. If only one element exists under $data[$name][$class], the $class should be ignored for the purposes of looking up the element in $data[$name] and the value should be retrieved using $data[$name][array_key_first($data[$name])] or similar.

Yes, that's I proposed when I talked about getReference('foo').

  1. If only one element exists $data[$name][$class], getReference('foo', Bar::class) should again assert the $data['foo'] instanceof Bar::class. Also, providing Bar::class allows for using the psalm templates to indicate what reference is being returned. That way the return type of the object is guaranteed and known to the IDE.

There is no need for assertion if you're accessing

$data['foo'][Bar::class];

with the call getReference('foo', Bar::class).

  1. If the class name is not provided and there is only one element in $data[$name][$class], getReference() wouldn't do any type checking.

This would allow one of the two scenarios:

  1. fetching the only element in $data[$name] using getReference('person1', Person::class) even if the 'person1' was an Employee

This would be the weirdest behavior to have:

$john = new Employee();
$registry->setReference('john', $john)
$registry->getReference('john') // work
$registry->getReference('john', Person::class) // work
$registry->getReference('john', Employee::class) // work
$registry->getReference('john', Dog::class) // work since (5) you don't do type checking, this seems wrong
$john = new Dog();
$registry->setReference('john', $john)
$registry->getReference('john') // doesn't work
$registry->getReference('john', Person::class) // doesn't work either, this one doesn't seems natural since it worked earlier
$registry->getReference('john', Employee::class) // work
$registry->getReference('john', Dog::class) // work

Please note none of the variants discussed allow using duplicate keys if the class doesn't differ:

$john = new Person();
$doe = new Person();
$registry->setReference('person', $john); // sets john
$registry->setReference('person', $doe); // overwrites john

I repeat, I believe allowing duplicate keys lacks merit.

This behavior is already the existing one if you look at the method https://github.com/doctrine/data-fixtures/blob/dadd3c5a49ce762d17bec1f16276de3b53f4ab62/src/ReferenceRepository.php#L104-L106

And if you're using the addReference method https://github.com/doctrine/data-fixtures/blob/dadd3c5a49ce762d17bec1f16276de3b53f4ab62/src/ReferenceRepository.php#L178 You'll get

$john = new Person();
$doe = new Person();
$dog = new Dog();
$registry->addReference('foo', $john); // works
$registry->addReference('foo', $doe); // fail in 1.x and will fail in 2.x
$registry->addReference('foo', $dog); // fail in 1.x and won't fail in 2.x
janklan commented 1 year ago

That's your point of view and I disagree with it. I can say the same, it's not a good idea to allow fetching a reference without knowing the class you're fetching.

Scalar primary key made of unique string value seems a simpler identifier than a composite key made of a non-unique string value and a non-unique class name, but hey, we both probably have good reasons for our views. Let's agree to disagree, not a big deal!

When you're doing a database request, I wonder if you're writing

$this->entityManager->find(42);

or

$this->entityManager->find(Foo:class, 42);
$this->entityManager->getRepository(Foo::class)->find(42);

While the first option clearly isn't really an option, the other depends on the use case. Sometimes I know what instance I'm trying to get, sometimes I don't. I regularly let Symfony & Doctrine work out what object should be constructed when I use Doctrine's entity inheritance. $entityManagerInterface->getRepository(Person::class)->find(1) will return whichever child of Person matches ID 1, be it Employee, Boss your Dog, or anything else.

The more frequent use case is when converting route parameters:

#[Route(path: '/{person}')]
public function read(Person $person): Response
{
  // $person can again be Employee, Boss or whatever else, I honestly don't care in some cases.
}

While I proposed solutions to allow both, you're just looking for solving your debatable use-case and remove another feature.

My use case is a documented fairly standard use of Doctrine's entity inheritance.

I'd say that could work, depending on the implementation.

Feel free to provide one.

  1. I'd say public function setReferenceIdentity($name, $identity, ?string $class = null) should adopt the outcome of $identity::class ... [quote shortened]

Not sure to understand.

If I'm reading the code correctly, setReference('foo', $person, 'randomstring') will store $person under $data['person']['randomstring'], where the randomstring has got nothing to do with the actual class name of $person?

  1. If only one element exists under $data[$name][$class], the $class should be ignored for the purposes of looking up the element in $data[$name] and the value should be retrieved using $data[$name][array_key_first($data[$name])] or similar.

Yes, that's I proposed when I talked about getReference('foo').

Great. From your wording was not clear whether the $class value mattered or not. My point is to make sure that getReference('foo', Person::class) would yield the only object even if it was stored under $data['foo'][Employee::class]

  1. If only one element exists $data[$name][$class], getReference('foo', Bar::class) should again assert the $data['foo'] instanceof Bar::class. Also, providing Bar::class allows for using the psalm templates to indicate what reference is being returned. That way the return type of the object is guaranteed and known to the IDE.

There is no need for assertion if you're accessing $data['foo'][Bar::class]; with the call getReference('foo', Bar::class).

See my point about setReference('foo', $person, 'randomstring'). Unless you assert the type somewhere, it can't be guaranteed. Asserting it in setReference makes more sense, but the code isn't currently doing it.

This behavior is already the existing one if you look at the method

I missed that, apologies.

VincentLanglet commented 1 year ago

Not sure to understand.

If I'm reading the code correctly, setReference('foo', $person, 'randomstring') will store $person under $data['person']['randomstring'], where the randomstring has got nothing to do with the actual class name of $person?

Allowing to pass a third parameter to the setReference was one of my first solution, but then I proposed another one which doesn't need this change. The fact we don't need a third parameter for setReference would solve all the issues related to the fact we could pass anything.

  1. If only one element exists under $data[$name][$class], the $class should be ignored for the purposes of looking up the element in $data[$name] and the value should be retrieved using $data[$name][array_key_first($data[$name])] or similar.

Yes, that's I proposed when I talked about getReference('foo').

Great. From your wording was not clear whether the $class value mattered or not. My point is to make sure that getReference('foo', Person::class) would yield the only object even if it was stored under $data['foo'][Employee::class]

That's a debatable point.

The first thing I have in mind was to allow call like getReference('foo') but getReference('foo', Person::class) is more tricky. See the following example:

setReference('foo', new Employe()); // Works
getReference('foo'); // Works
getReference('foo', Employe::class); // Works
getReference('foo', Person::class); // Doesn't work with my implementation but you want this to work

setReference('foo', new Dog()); // Works
getReference('foo'); // Wont work
getReference('foo', Employe::class); // Works
getReference('foo', Person::class); // Is it supposed to work ?

setReference('foo', new Boss()); // Works
getReference('foo'); // Wont work
getReference('foo', Employe::class); // Works
getReference('foo', Person::class); // Won't work

setReference('foo', new Person()); // Works
getReference('foo'); // Wont work
getReference('foo', Employe::class); // Works
getReference('foo', Person::class); // Works with my implementation but is it supposed to works with yours ?

Maybe the solution would be to play with is_a method:

return $values[0];

janklan commented 1 year ago
$values = [];
foreach ($data[$name] as $value) {
     if (is_a($value, $class)) {
          $values[] = $value;
     }
}

I think iterating the whole array all the time could be slow with a large number of fixtures.

Overnight I realised another downside: if you rely on a class name, you make the implicit assumption a fixture is an object. I can't say I do it myself, but the current system lets you store anything, including scalars. I think we should not throw that away.

I came up with a slightly different approach: storing fixtures in a single array twice: under $key to maintain the current behaviour where things can get overwritten unless using unique keys, and then also under a "namespaced" $key.get_debug_tyope($value), where you can use duplicate $key value as long as the type of $value differs between identical keys.

Have a look at a working demo, https://3v4l.org/HuZD6#v8.2.2 - I added inline comments to explain what I thought needs it. The end of the code, labeled as Part 3, shows some interesting improvements that I don't think would work with what this PR proposes.

In the example, if you use duplicate $key on multiple $value objects of the same type, the key is overwritten. We could easily add more checks to prevent it. I just wanted to keep the PoC simple.

What do you think?

We could also take this chat to a more real-time medium. We're clearly both interested in finding a good solution.

VincentLanglet commented 1 year ago
$values = [];
foreach ($data[$name] as $value) {
     if (is_a($value, $class)) {
          $values[] = $value;
     }
}

I think iterating the whole array all the time could be slow with a large number of fixtures.

I was worried about this too, but since we're only iterating on the fixture with the name $name, I thought this could be ok.

Overnight I realised another downside: if you rely on a class name, you make the implicit assumption a fixture is an object. I can't say I do it myself, but the current system lets you store anything, including scalars. I think we should not throw that away.

The fixture is always an object according to the phpdoc https://github.com/doctrine/data-fixtures/blob/1.6.x/src/ReferenceRepository.php#L109 no ?

janklan commented 1 year ago

The fixture is always an object according to the phpdoc https://github.com/doctrine/data-fixtures/blob/1.6.x/src/ReferenceRepository.php#L109 no ?

Hmm, OK, it looks like that's the case. So scrap that "benefit" of being able to store scalars, it's probably a use case nobody needs. It just crossed my mind before I fell asleep.

The rest of the PoC remains valid, though.

Pixelshaped commented 1 year ago

We've got another issue with this PR.

When we're calling $this->addReference($referenceName, $entity); it now calls $class = $this->getRealClass(get_class($object)); which is problematic for us.

Indeed, we have two EntityManagers configured. And it seems there is no way to indicate which one to create the reference on.

Beforehand, it did not refer to the ObjectManager hence we could add references from different EntityManagers with no issue (maybe that was undesired but now we're missing a method to add a reference to a specific EntityManager).

Our load method:

public function load(ObjectManager $manager): void
{
        $this->entityManager = $this->managerRegistry->getManager($this->getTargetEntityManagerName());

        $this->loadData(); // Creates entities and persists them and calls addReference() 

        $this->entityManager->flush();
}

getTargetEntityManagerName() is a custom method that our fixtures can override and it returns default by default and another EntityManager name when needed.

VincentLanglet commented 1 year ago

I don't fully understand the issue. This PR is 5 months old, that's not the right place to discuss about issues. Please open an issue instead to discuss about it (and ping me @Pixelshaped)

derrabus commented 7 months ago

@VincentLanglet The 2.0.x branch is now open. Do you want to continue your work there?

VincentLanglet commented 7 months ago

Sure Ill take a look

VincentLanglet commented 7 months ago

@VincentLanglet The 2.0.x branch is now open. Do you want to continue your work there?

I made https://github.com/doctrine/data-fixtures/pull/464