OrchardCMS / Orchard

Orchard is a free, open source, community-focused Content Management System built on the ASP.NET MVC platform.
https://orchardproject.net
BSD 3-Clause "New" or "Revised" License

Importing of data unusable with any good amount of users / not always working #2606

Open orchardbot opened 12 years ago

orchardbot commented 12 years ago

AimOrchard created: https://orchard.codeplex.com/workitem/18779

There are two issues here.

I made a query, exported all queries, removed the query, imported the exported data, and the query didn't reappear.

But the biggest issue is that the import is not usable if you have plenty of users.

I have 1000 users in my test database, and with just this, more than 4000 (!) queries were executed before I had to pause the profiler, because the one I use cannot handle that many queries.

You can see a stack trace of one of the queries (which is executed per user for some reason) at:

https://dl.dropbox.com/u/23877279/Permanent/Screenshots/Bugs/import_fail_1.png https://dl.dropbox.com/u/23877279/Permanent/Screenshots/Bugs/import_fail_2.png

Both are also included in the attached zip (import_fail.zip)

orchardbot commented 12 years ago

AimOrchard commented:

In addition, I'd like to ask if you could tell me what I need to do (sql-wise) to remove all users except admin.

I want to get rid of our test users so we can (for now) use the import/export function until a valid solution / fix is found for this issue.

orchardbot commented 12 years ago

AimOrchard commented:

Ok, export is also terrible...

    var contentItems = _orchardServices.ContentManager.Query(options).List();

You fetch ALL content and THEN exclude the types you don't need... I have ~6k users in my test database, meaning ALL of them are fetched, including all their linked data (like roles).

To export ~4-5 valid items, it took 6102 queries!

orchardbot commented 12 years ago

AimOrchard commented:

OK, I tried to fix it myself and failed... Could you please look at this asap? I mean, with this bug this feature is pointless if you have any good amount of content (users or 'real' content)!

orchardbot commented 12 years ago

AimOrchard commented:

OK, I dug further and found a way to improve the query (so it 'only' queries ALL 'relevant' items, which is still bad if you have plenty of the requested content, imho).

In the ImportExportService ExportData method I now have this:

    private XElement ExportData(IEnumerable<string> contentTypes, VersionHistoryOptions versionHistoryOptions) {
        var data = new XElement("Data");
        var options = GetContentExportVersionOptions(versionHistoryOptions);
        var contentTypesArray = contentTypes.ToArray();
        // Pass the requested content types to Query so the filtering happens in the database.
        var contentItems = _orchardServices.ContentManager.Query(options, contentTypesArray).List();

        foreach (var contentType in contentTypesArray) {
            var type = contentType;
            var items = contentItems.Where(i => i.ContentType == type);
            foreach (var contentItem in items) {
                var contentItemElement = ExportContentItem(contentItem);
                if (contentItemElement != null)
                    data.Add(contentItemElement);
            }
        }

        return data;
    }

Notice that I now supply 'Query' with the list of requested content types so that it can do the filtering on the database side. It needs some cleaning up, but with this change the export was instant, without all those thousands of queries.

The issue with import still remains, though. I'm investigating a bit further, but no promises.

orchardbot commented 12 years ago

AimOrchard commented:

So yeah, the problem lies in ImportContentSession.Get, since it looks like it goes (can go?) through all content items.

orchardbot commented 12 years ago

AimOrchard commented:

So any feedback on this?

orchardbot commented 12 years ago

@bleroy commented:

If I understand it correctly the problem is that we need to compare identities with all content items when importing, to find if the item being imported exists already. Identity being a collaborative process that we cannot make assumptions on, there isn't a good solution to this problem that we know of at this point. Harder problem than it seems.

orchardbot commented 12 years ago

@bleroy commented:

Thanks for the suggestion though, I think we should reevaluate that and at least apply some strategic optimizations that do some partial filtering ahead of time. Re-opening for new triage.

orchardbot commented 12 years ago

AimOrchard commented:

Well, some database-side filtering would be nice to start with.

But in addition to that, if you cannot escape the fact that all the content HAS to be retrieved, it would be nice to split all required actions up into batches.

You could split up both import and export into batches and give the admin a visualization of the current progress and the ability to cancel (and, if doable, the ability to pause / resume a batch).

In addition to that, you could already use the improvement I mentioned with exporting content (only query for content items of the requested type).
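
As a rough illustration of the batching idea, here is a minimal, self-contained sketch. It is not Orchard code: `BatchRunner`, `ProcessInBatches`, and the `fetchBatch(skip, take)` delegate are hypothetical names standing in for whatever paged query the import/export service would actually use, and the in-memory list only simulates a data source.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;

    public static class BatchRunner {
        // Processes items in fixed-size batches, reports progress after each batch,
        // and stops early if cancellation is requested between batches.
        // fetchBatch(skip, take) is a hypothetical stand-in for a paged query.
        public static int ProcessInBatches<T>(
            Func<int, int, IReadOnlyList<T>> fetchBatch,
            Action<T> process,
            int batchSize,
            Action<int> reportProgress,
            CancellationToken cancellationToken) {
            if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

            var processed = 0;
            while (!cancellationToken.IsCancellationRequested) {
                var batch = fetchBatch(processed, batchSize);
                if (batch.Count == 0) break;

                foreach (var item in batch) process(item);

                processed += batch.Count;
                reportProgress(processed);

                // A short batch means the source is exhausted.
                if (batch.Count < batchSize) break;
            }
            return processed;
        }
    }

    public static class Program {
        public static void Main() {
            // Simulated data source: ten items, fetched four at a time.
            var source = Enumerable.Range(1, 10).ToList();
            var seen = new List<int>();

            var total = BatchRunner.ProcessInBatches<int>(
                (skip, take) => source.Skip(skip).Take(take).ToList(),
                seen.Add,
                4,
                count => Console.WriteLine("Processed " + count + " items"),
                CancellationToken.None);

            Console.WriteLine("Total: " + total);
        }
    }

The returned count is how many items were handled before the loop stopped, so a caller could in principle use it as the `skip` offset to resume later; whether that is safe depends on the underlying data not changing between runs.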

Another thing you could add is the ability to 'skip' the 'does content exist' check and just import as-is.

No point in checking all content items if the person doing the import 'knows' that none of the content that is being imported already exists.

orchardbot commented 12 years ago

@bleroy commented:

Sure would be nice (except for the part about skipping identity verification).