fuel / orm

Fuel PHP Framework - Fuel v1.x ORM
http://fuelphp.com/docs/packages/orm/intro.html
152 stars 96 forks source link

from_cache() does not correctly populate relations #407

Closed Josh-G closed 3 years ago

Josh-G commented 7 years ago

Hi,

When building an ORM query using from_cache(false), it will not correctly fill has_many relations. For example:

$item = \Model_A::query()
            ->where('id', 13)
            ->related(['B', 'B.C', 'B.C.D'])
            ->from_cache(false)
            ->get();

Where A belongs to B, B has many C and C has many D. In this example, B will be returned correctly, but C will only contain a single model, even if multiple are related to B. If I remove the from_cache call, it will correctly return all entries of C that are related to B.

It generates the right query, as running the query directly returns all the correct entries, but it does not seem to hydrate correctly. Please let me know if I'm making a mistake here.

WanWizard commented 7 years ago

from_cache() should not have an effect if nothing is present in cache. If there is something in the cache, from_cache(false) will hydate instead of return the value from cache, so effectively return the same as when the cache is empty.

You can check what happens if you isolate this query from your application, and run it 3 times: first and second time without the from_cache() (to simulate a cold and a hot cache), and the third time with from_cache(false) to ignore the hot cache.

Also, make sure you test with 1.9/develop, so you have the latest ORM updates.

Josh-G commented 7 years ago

Thanks. I've spent a while looking at this problem today and have isolated it from our application and tried a combination of the different cache states you mention. I've also tried 1.9/develop and 1.8.*. I was testing out the new \Orm\Query::caching method (thanks!) when I came across this. The problem does not occur during a cold cache as hydrate() is called per row, the objects are cached during the hydration process: https://github.com/fuel/orm/blob/1.9/develop/classes/query.php#L1431 https://github.com/fuel/orm/blob/1.9/develop/classes/model.php#L897

However, with from_cache(false), nothing is saved to the cache, which causes objects to be discarded, as the return value is not used from hydrate() here (I think): https://github.com/fuel/orm/blob/1.9/develop/classes/query.php#L1514

The object and the current stage of relation hydration is lost as it is not stored in the cache for the next hydrate() call on the next row. Model_A would still be available in the row, but the relations in the previous row(s) are then lost. As the soft and hot cache would have added the relations to the now cached object, the next hydrate call would retrieve these relations and add on to them.

I've spent a long time today reading through the ORM logic, and I'm still a bit flaky, so please please bear with me if I've misunderstood anything.

WanWizard commented 7 years ago

The general problem with ORM caching, is that you have to code your queries carefully.

Say you query model A first on PK, without relations. That record is hydrated and added to the cache. Then query model A again, but now with a WHERE clause, and some relations. Say the resultset includes the object returned in the first query.

During the hydration process, the ORM will detect (using the combination of model name and PK value as index) that the object exists in cache, and will return it from cache, as it is stored.

In this example you will get an array with X records back with relations, and one record without relations, because that record came from the cache and was cached without.

You had the same issue with partial selects, which I fixed recently.

I think using partly cached and partly hydrated results is very complex, I'll have to dive into that when I have a bit more time. That part of the ORM is where the most execution time is, so any change here might have a huge performance hit if done incorrectly.

Josh-G commented 7 years ago

The general problem with ORM caching, is that you have to code your queries carefully.

The problem is that any query like the example will run into this problem when from_cache(false) is used, regardless of the state of the current cache. The hydration method would not correctly fill any non singular relation that spans multiple rows.

I think using partly cached and partly hydrated results is very complex, I'll have to dive into that when I have a bit more time. That part of the ORM is where the most execution time is, so any change here might have a huge performance hit if done incorrectly.

In this situation, it does not matter if there are partially cached objects as we are wishing to entirely skip the cache during this get() call. The partially hydrated object being discarded is the issue.

I've submitted a pull request #408 to ensure the hydrate() method does not discard the partially hydrated object (which would normally have been cached and would be available in hydrate() calls after the first call).

WanWizard commented 7 years ago

Thinking about this, I believe that this is the key issue:

However, with from_cache(false), nothing is saved to the cache,

This is wrong, as the method name implies, this is about retrieving a cached version. It should not say anything about whether or not the result should be cached. So the root cause is a single flag used for two different purposes.

So in https://github.com/fuel/orm/blob/1.9/develop/classes/query.php#L1431, $this->from_cache should not be used, but a second flag is needed here, to specifically indicate you want the result cached.

I think that, in your test, if you remove $this->from_cache from the Model::forge() call, the problem should be solved too?

WanWizard commented 7 years ago

I'll try to find some time later today to setup some tests myself.

Josh-G commented 7 years ago

I think that, in your test, if you remove $this->from_cache from the Model::forge() call, the problem should be solved too?

I tried this but it has no effect on the issue as the object is still discarded.

However, with from_cache(false), nothing is saved to the cache,

This is wrong, as the method name implies, this is about retrieving a cached version. It should not say anything about whether or not the result should be cached. So the root cause is a single flag used for two different purposes.

The problem is that the singular relation of A->B results in: $relation_result_wrapper->data = null or previous object in the inner hydrate() call and then the object created is never stored in $result->data as the conditionals that govern if to store the object or not do not store $obj if $result is a single relation that already contains some data. This causes the hydration to end with only the first hydrate call that created B, and any relations that were created B->C are lost as the updated relations/attributes for B are not related back to A.

The patch will check to see if a hydrate() call has already created the B object prior to trying to forge() it again, and then adds the C objects to the existing B object, instead of the current behaviour, which is to forge B again and then discard it.

Josh-G commented 7 years ago

So in https://github.com/fuel/orm/blob/1.9/develop/classes/query.php#L1431, $this->from_cache should not be used, but a second flag is needed here, to specifically indicate you want the result cached.

In this scenario, I am using ->from_cache(false), as I wish for the ORM to not use any cached objects when fulfilling my get() request, nor do I wish it to cache the final hydrated model. I want it to fetch everything straight from the DB, perform hydration and hand it straight to me. As in, completely skipping $_cached_objects.

This is my understanding of how from_cache(false) would work.

WanWizard commented 7 years ago

That is how it is currently implemented, yes.

WanWizard commented 3 years ago

This issue has been adressed with the rewrite in 1.9/dev.

When a model object is retrieved from cache, it will still recursively apply the rest of the retrieved data, creating missing related objects when needed.