KevinJump / uSync

Database syncing tool for Umbraco CMS
https://jumoo.co.uk/usync/
Mozilla Public License 2.0

References are not stored during import. #667

Open a-t-k opened 2 months ago

a-t-k commented 2 months ago

Describe the bug: When importing for the first time, references are not stored; only a message is logged. To get the references stored, the import must be repeated until no more messages appear in the log.

This is the message in the log: The reference to umb://document/81a6be4fb73d448e9b14e6622e457f17 can not be saved as relation, because doesn't have a node ID.

ActionName: uSync.BackOffice.Assets.Controllers.uSyncDashboardApiController.ImportHandler (uSync.BackOffice.Assets) RequestPath: /umbraco/backoffice/usync/usyncdashboardapi/importhandler SourceContext: Umbraco.Cms.Infrastructure.Persistence.Repositories.Implement.DocumentRepository

To Reproduce Steps to reproduce the behavior:

  1. Do a complete export from an existing Umbraco instance.
  2. Start a new instance of Umbraco (create a new empty database and change the connection string).
  3. Import the exported content into the empty database.
  4. After the import, the log is full of these messages.
  5. The imported content has no references.

Expected behavior: References that cannot be created because the target content does not exist yet should be collected and only created at the end of the import. If a problem then occurs because something does not exist, write a message to the log and notify the user in some way.

Additional context nothing

KevinJump commented 2 months ago

Arrggg. This is probably going to be a WillNotFix, but i will give you context (and a bit of a rant) as to why.

uSync isn't storing or setting these references, it's done inside Umbraco when the content is saved. As such it makes it ~really hard~ possibly impossible to fix when you have a single pass import.

In the early uSync days we favoured practicality over performance, and we imported content in a two-pass process. This wasn't super-fast, but it meant that when you imported something that relied on something else, you didn't have to worry too much, because the second pass was a-comin' and it would resolve all of that for you.

Some time ago we got grown up and did a lot of work on dependency graphs, delayed notifications and all sorts of trickery, which means we can (and do) push content through in a single pass: a lot fewer database hits and a lot quicker importing (which is sort of uSync's thing, it doesn't, like, take days!).

One of the things Umbraco introduced that helped us here is delayed notifications: that is, with some clever code you can change how Umbraco's scope notifies other things when stuff happens, so we can import everything, hold back the notifications till the end, and then fire them all off (in the background if we want to, which makes it clean and fast(er)).

But! For reasons I don't know (and I will ask someone), the relations are not handled by a notification but are baked directly into the save / persist methods of the content service 🤯. I am not sure, but it really feels like this should be a notification. If it were, we could (and would) fire them at the end; the relations would therefore be created after all the content has been added, and no issues would arise.
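To illustrate what "fire them at the end" would buy us, here is a rough, language-neutral Python sketch. It is not Umbraco's or uSync's actual API: `DeferredScope`, `import_all` and the in-memory `store` are made-up names, and it assumes relation building could be queued like any other notification.

```python
from typing import Callable, List


class DeferredScope:
    """Hypothetical scope that queues notifications until it completes."""

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []

    def notify(self, handler: Callable[[], None]) -> None:
        # Queue the handler instead of running it immediately.
        self._pending.append(handler)

    def complete(self) -> None:
        # Fire everything that was held back, in the order it was queued.
        pending, self._pending = self._pending, []
        for handler in pending:
            handler()


def import_all(items, scope, store, relations):
    """Save every item first; build relations only when the scope completes."""

    def build_relations(key, refs):
        relations.extend((key, ref) for ref in refs if ref in store)

    for key, refs in items:
        store.add(key)  # the save itself happens straight away
        scope.notify(lambda k=key, r=refs: build_relations(k, r))
    scope.complete()


items = [("home", ["products", "blog"]), ("products", []), ("blog", [])]
store, relations = set(), []
import_all(items, DeferredScope(), store, relations)
```

In this sketch the homepage's references resolve even though it is saved before the pages it points at, because nothing is looked up until every item is in the store.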

So why we can't/probably won't fix: because this happens when the content is saved, the only realistic way to stop it is to make sure all the content that is referenced is added before the piece of content that references it.

Except that isn't always possible. For example, in the Umbraco starter kit the homepage references the products and the blogs, but the products and the blogs don't exist yet, and they can't be created until the homepage exists because it's their parent node. So even if you were to build a dependency graph that included references, you would get circular dependencies and it would all fall apart.
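The starter kit case can be reproduced with a toy dependency graph. This is only a sketch using Python's standard `graphlib`, not uSync's own graph code; the node names and the `import_order` helper are invented for the example.

```python
from graphlib import CycleError, TopologicalSorter


def import_order(depends_on):
    """Return a valid import order, or None if the dependencies are circular."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Each key maps to the set of nodes that must exist before it can be imported.
parents_only = {"home": set(), "products": {"home"}, "blog": {"home"}}

# Same tree, but the homepage's references are also treated as dependencies.
with_references = {
    "home": {"products", "blog"},
    "products": {"home"},
    "blog": {"home"},
}

print(import_order(parents_only))
print(import_order(with_references))
```

With parent links only, the homepage sorts first. Once references count as dependencies, the homepage and its children wait on each other and no order exists.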

The other option is to go back to a two-pass import. To be clear, this wouldn't remove the log entries; they would still happen on the first pass. But it would be as if you saved everything twice, and the second save should fix the references.

But because the relation code is inside the persist, you would have to save/publish the second time even if there are no changes. This would increase the number of content versions on every import, and it would also likely double the import time.
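A toy model of that trade-off, as a sketch only: `save` here is a hypothetical stand-in for a save that builds relations internally, and the version counter simply assumes one new version per save.

```python
def save(item, store, relations, log, versions):
    """Hypothetical save: relations are built inside the save itself."""
    key, refs = item
    store.add(key)
    versions[key] = versions.get(key, 0) + 1  # assume every save adds a version
    for ref in refs:
        if ref in store:
            relations.add((key, ref))
        else:
            log.append(f"reference to {ref} can not be saved as relation")


def two_pass_import(items):
    """Save everything twice so the second pass can resolve missed references."""
    store, relations, log, versions = set(), set(), [], {}
    for _ in range(2):
        for item in items:
            save(item, store, relations, log, versions)
    return relations, log, versions


items = [("home", ["products", "blog"]), ("products", []), ("blog", [])]
relations, log, versions = two_pass_import(items)
```

In this model the second pass fills in both relations, but the two log entries from the first pass remain and every item ends up with two versions.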

Also, as this code is buried in the repositories, there doesn't appear to be a way to just ask Umbraco to build these relationships, so that we could import and then say "OK do the relationships now". That might be the way out: maybe we could replicate the code from the core in our own thing and run it post import, but I would need to check and see if that does anything odd.

But at the moment, for v13 at least, I've not really got a fix :(

KevinJump commented 2 months ago

Umbraco discussion - just to see if there is a logic i am missing somewhere. https://github.com/umbraco/Umbraco-CMS/discussions/17021

a-t-k commented 1 month ago

Hello Kevin, thank you for the detailed information. As I understand it, the reference storage only occurs when both elements are available. The error message in the log is written by Umbraco itself if the reference cannot be created. Therefore, to ensure that all references have been captured, everything needs to be saved again. Okay, it can also be checked whether there are any imports without references and these can be skipped, but fundamentally, this could indeed mean double saving.

In our case, we need the references to determine how often and where content is referenced. Therefore, we need to perform the import multiple times so that Umbraco can create all the references. Umbraco itself also has a view that displays these references, and this view will be empty if only one import has been performed.

Now we have two options: 1) It is integrated into uSync, so that the references are created at the end of an import. 2) We create the references ourselves after the uSync import.

And of course, I would personally be even more pleased if this option is integrated into uSync.

a-t-k commented 1 month ago

Ah okay, I'll check your post on the Umbraco forum instead.

KevinJump commented 1 month ago

Hi,

> And of course, I would personally be even more pleased if this option is integrated into uSync.

Yes, me too :)

As it stands there are no public methods in Umbraco to run the reference generation method on a content item :( I think if that at the very least existed we could 'just' run it at the end, and all the references would be updated. But at the moment it's not there :( .

a-t-k commented 1 month ago

Hi Kevin, what if uSync could export existing references as a separate module? After the standard content import has completed, the references can be created subsequently when the corresponding IDs are available. Since the IDs are GUIDs, this shouldn't be an issue. The only drawback is the large number of additional files that need to be created, but this would solve the problem entirely. I've also seen a BULK Insert feature that allows you to populate the reference table in one go or in batches.
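A minimal sketch of that idea, assuming the references were exported as (parent GUID, child GUID) pairs and that a GUID-to-node-ID lookup is available after the import. All names here are hypothetical; this is neither uSync's file format nor Umbraco's bulk insert API.

```python
from itertools import islice


def resolve_relations(exported, id_by_guid):
    """Map exported GUID pairs to local node IDs; report pairs that can't resolve."""
    resolved, missing = [], []
    for parent, child in exported:
        if parent in id_by_guid and child in id_by_guid:
            resolved.append((id_by_guid[parent], id_by_guid[child]))
        else:
            missing.append((parent, child))
    return resolved, missing


def batches(rows, size):
    """Yield rows in chunks of at most `size`, e.g. for batched inserts."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


# Illustrative data: node IDs assigned by the target database after import.
id_by_guid = {"guid-home": 1001, "guid-products": 1002, "guid-blog": 1003}
exported = [
    ("guid-home", "guid-products"),
    ("guid-home", "guid-blog"),
    ("guid-home", "guid-deleted"),  # target no longer exists
]

resolved, missing = resolve_relations(exported, id_by_guid)
chunks = list(batches(resolved, 1))
```

Pairs that still cannot be resolved end up in `missing`, which is where a log message or a user notification would come from.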

KevinJump commented 1 month ago

You can already do that: the existing relation type handler will export relation types, but also the relation items if it's configured to.

At the moment it doesn't sync the inbuilt relations, for the exact reason that we don't want to duplicate / conflict with what Umbraco is doing.

but you can tell it to export the inbuilt relations too by clearing its exclude value 👍

"uSync" :{
    "Sets": {
        "Default": {
            "Handlers": {
                "RelationTypeHandler" : {
                    "Settings" : {
                        "Exclude": "",
                         "IncludeRelations": "True"
                    }
                }
            }
        }
    }
}

This won't result in thousands of extra files, because the relations will be stored within the relation type file, but it might be slower, and I can't guarantee you won't get duplicates etc.