OrchardCMS / Orchard

Orchard is a free, open source, community-focused Content Management System built on the ASP.NET MVC platform.
https://orchardproject.net
BSD 3-Clause "New" or "Revised" License
2.37k stars 1.12k forks source link

TaxonomyField update shouldn't use the database unless necessary #8704

Open BenedekFarkas opened 1 year ago

BenedekFarkas commented 1 year ago

When a content item is saved that has TaxonomyFields, TaxonomyFieldDriver will call TaxonomyFieldService.UpdateTerms (unless there's a validation error). In that function, all the TermContentItem entries associated with the field are dropped and then new ones are created based on the Terms selected in the editor. However, this happens even if the selected term(s) didn't change.

Suggested changes:

  1. TaxonomyField should store the list of its selected term IDs serialized in an infoset property. This would help, because when you already have the content item/part in memory, we don't need to go to the DB to find out which terms are selected.
  2. When UpdateTerms is called, the list of already selected terms IDs is compared to the would-be term IDs, then:
    1. Do nothing, because they contain the same IDs.
    2. Some were deselected, so remove those TermContentItem entries + remove them from the infoset list.
    3. Some new ones were selected, so add those TermContentItem entries + add them to the infoset list. (this can happen together with the previous one)

What do you think? CC: @MatteoPiovanelli-Laser @sebastienros

MatteoPiovanelli-Laser commented 1 year ago

I'll do a bit of a stream of consciousness as I try to remember how TaxonomyFields worked, by putting here information I'm gleaning by re-reading the code.

Those IDs (for the terms selected in a TaxonomyField) are NOT already serialized (for either published and latest versions of a ContentItem) in the StringFieldIndexRecord, that is used for queries/projections. OnLoading the terms in the field are populated (in its LazyField) off the records for the TermsPart for the ContentItem. Those records are also used for projections. This is unlike, for example, ContentPickerField.

ITaxonomyService.UpdateTerms: given the list of terms we selected, we want this method to make sure that the TermsPart for the ContentItem we are processing ends up having, for a specific field, only TermContentItems for those terms. Moreover, this method should take all the terms that were unselected to fire off a processing task. Currently, that doesn't work properly, to be entirely honest:

It's possible that one way to fix this would be to make the TermsPartRecord a ContentPartVersionRecord. That would also take care of the need to keep track of the changes to the list of terms in the UpdateTerms method, since we'd have new records for the new version. This wouldn't be adding more stuff to the DB than it is now on update, but it would not delete TermContentItem records attached to previous versions: fewer operations, but in the end a larger table. It would complicate the step done OnPublished to call the task that counts selected ContentItems for each term, but only slightly: we would compare the current version with the previous one to also account for unselected terms, rather than have a separate invocation of the task in the UpdateTerms method. That "new" versioned part could on principle also have the Ids in its infoset, but it would have to query for all corresponding records anyway, and we wouldn't need to compare the list of Ids to a previous version.

BenedekFarkas commented 1 year ago

I wasn't really considering versioning, but I agree that it's weird that saving a draft item with different terms selected is immediately reflected in the published version as well.

My immediate concern here is that DB changes happen even when they aren't needed: When you save a Taxonomy Field without changes, the DB operations play out as if you unselected all the terms, saved it and then re-selected the same terms and saved it again, which is wasteful.

MatteoPiovanelli-Laser commented 1 year ago

Yes. On the other hand, if we moved to a versioned record, that's sort of the default behaviour: the records for the new version, if no change has happened, end up being duplicates of the previous ones (except for the relevant Ids). But we would not be deleting a record just to add it back.