directus / v8-archive

Directus Database API β€” Wraps Custom SQL Databases with a REST/GraphQL API
https://docs.directus.io/api/reference.html
505 stars 204 forks source link

ItemsService extremely slow for large datasets (M2M / O2M) #998

Open pikzelz opened 5 years ago

pikzelz commented 5 years ago

Bug Report

When working with items that contain a large dataset in a M2M or O2M relation the api (and app) will be very slow in loading the post (if loading at all). We utilize Directus to feed certain apps we develop. As example we use Directus to record users' push token information and to schedule push notifications in a O2M fashion by using a webhook to attach all targeted users to the notification. However whenever we need to alter the push notification after the webhook attached all the users the item on the api will occasionally timeout and report back with an API error.

Steps to Reproduce

  1. Create two collections (Users / notifications)
  2. Insert more than 5000 rows in the users collection
  3. Create a notification, have a webhook attach all users on create
  4. Try to open the post on the API

Expected Behavior

The post would open fairly fast

Actual Behavior

Item might timeout and report an error.

Other Context & Screenshots

When I tried refactoring it by omitting the ItemsService and using old fashion queries I was able to load the items very fast (less than a second), however this didn't work with the App. I feel like the issue might be related to the fact the ItemsService creates multidimensional object out of the rows.

Technical Details

itsmerhp commented 5 years ago

Hello @pikzelz I have followed the steps you have mentioned:

Created two collections Related those with O2M relation Inserted more than 5000 records in main collection and 2 related records for each in related collection Insertion and fetching of those records were not slow, please check attached screenshot

I think the timeout issue is due to push notification code in webhook which trigger on insertion of each record. Can you please comment the push notification code in webhook and check again?

pikzelz commented 5 years ago

Hello @itsmerhp,

The webhook is attached to creation of a notification, not to the creation of all the links.

For linking all the users to the message I take the id from the data object within the webhook and omit the ItemService by using an old fashion mysqli object and running a prepare statement that resolves to the query: INSERT INTO users_notifications (user_id, message_id) SELECT id, 1 FROM users this takes less than a second.

After that I check the database to verify if all users are linked, then I try to open the post within the APP.

itsmerhp commented 5 years ago

@pikzelz Are you in slack? Let's discuss there in more details.

pikzelz commented 5 years ago

@itsmerhp, not at this moment but I could join the channel. So we can see if we could figure this out.

benhaynes commented 5 years ago

https://directus.chat

itsmerhp commented 5 years ago

@pikzelz Please join Directus slack channel with the link provided by @benhaynes and you will find me in it by "Rahul Patel".

pikzelz commented 5 years ago

I joined, sorry for the delay. Let's try to figure this out.

itsmerhp commented 5 years ago

I have discussed this issue in slack with @pikzelz and found that he is facing issue in fetching records and it is because of a large number of main and related records in APP(For each notification record thousands of records are in junction table relating with users). In API one can manage up to which level and which fields of each level(using ?fields=name,title,category.*) should be retrieved in response but in APP all the fields of all the related collections are returning. Therefore, this issue is not of API and it seems a new feature request in APP.

@benhaynes, @rijkvanzanten and @bjgajjar Please share your thoughts on it.

rijkvanzanten commented 5 years ago

The app needs the nested data to be able to display the correct data on the tabular views. We should figure out what the bottleneck is in the API.

When I tried refactoring it by omitting the ItemsService and using old fashion queries I was able to load the items very fast (less than a second).

It's not the fact that the dataset itself is that huge. Something in the API is slowing it down to a halt.

itsmerhp commented 5 years ago

I have discussed with @rijkvanzanten regarding this issue and to overcome this issue we do have to provide an option to limit sub-collections records as mentioned in #1105

Meanwhile, I have asked @pikzelz to provide SQL dump in slack private channel, so can debug and apply further optimization if needed in API.

benhaynes commented 5 years ago

This is not only for large datasets. I have a table with 30 items, each of which has several relational fields... and loading the Browse page is 8-10s! Fetching the same data using raw SQL select and joins takes about 50ms, so there must be something in the api causing a bottleneck.

@directus/api-team this needs to be a priority, as it currently makes some pages unusable in the App. I can provide a database dump (privately on Slack) where this can be seen/tested/optimized.

binal-7span commented 5 years ago

Hey @benhaynes

I have gone through the DB which you have been sent in the slack conversation to replicate this issue. On quick research, I would say that it's an issue regarding the huge amount of content.

I can provide you a brief here. Below are the statistics for *.*.* endpoint.

DB Size Server Response Time Content Display Time SS
Original DB 18 MB 2.05 Second 28.40 Second 69170959_881690995543212_6744149448533737472_n
200 Character Content DB 547 KB 728.87 MiliSecond 628.42 MiliSecond 69419298_479586489259857_6854773791600410624_n
10 Character Content DB 244 KB 862.07 MiliSecond 3.45 MiliSecond 69676011_673134766527681_4151612106682138624_n

So, Basically server responded within a proper time but it took time to display that much data. Directus is stable; it's the issue of user data.


If the user has these much data in their DB then below are some solution which they can implement to overcome this issue.

pikzelz commented 5 years ago

@bjgajjar the issue itself is not necessarily with a user fetching data. When I'm fetching data I request specific fields like you mention, however, the App counterpart always used *.*.* to fetch data, this causes a super long loading time when you have a lot of data.

When you directly fetch from the database and don't go through the ItemsService and just do a basic fetch_assoc (or fetch_array) it renders very fast. However when it travels through the ItemsService it tends to get slow on big heaps of data.

I've unfortunately still been unable to fully pinpoint the issue as I'm currently slammed with work, but whenever I find some more time I'll try to dig in further to give you an even more detailed example.

Edit: I just noticed the changes @rijkvanzanten did to the APP and see if it improves certain parts of the big data collection for me. I will still investigate and try to find any of the bottlenecks if they are still persistent.

binal-7span commented 5 years ago

@pikzelz - The issue is loading a large amount of data and not the server related. As mentioned above server sends the response on time but network took time to render it.

pikzelz commented 5 years ago

@bjgajjar I saw other statistics, for me the time to load the data from the server would take very long and eventually timeout. Once the data was actually loaded, it was rendered within miliseconds.

With the changes @rijkvanzanten made to the APP I see a better response time. Some still take some time but I haven't encountered a timeout.

binal-7span commented 5 years ago

@pikzelz - System took milliseconds to fetch the data(SQL) from the server as well as render the data. But network(Get the data from server to client) took time to pass on the data.

69574371_939599139707152_291485660714369024_n

benhaynes commented 5 years ago

This assessment makes sense, thank you @bjgajjar! Just brainstorming a few things below...

  1. We can't ask users to change their content size (how much data is stored) or schema naming (fields/collections).
  2. We can't limit data when a developer requests it through the API (since they need access to the full data requested).

Limit Data Response Based on Param

But the Directus Admin App Listing page most likely doesn't need HUGE amounts of data (like long WYSIWYG data with embedded images) for content previews. Would it make sense to have an optional parameter for API calls that limits the amount of data returned? Maybe there's a way to do this at the FIELD level instead of globally (is that better or worse?).

https://api.directus.io/_/items/projects?truncate=200 This would shorten all string (and maybe json) fields with substr to the provided char limit. Then we could have the Admin App use this param for all Layout requests to speed them up. So if you had a lot of long form text (like an article) you would get something like these 200 chars as a preview:

<p>Before you can experience all of the awesome features within Directus 7, you first need to install it. In this article, I’ll show you how to get started with Directus.</p><blockquote>This tutorial

The App could remove all HTML tags from this response to clean it up.

Based on your tests, this should get that example request from 18MB to 500KB... a huge improvement. The only issue I can think of is that I think the App stores this Layout data and uses it again in the Detail (edit) mode too... and we don't want to mess that process up. @rijkvanzanten πŸ””

Async Relational Data

Assuming most of this is relational data, could we have the Admin App load top-level (parent) data in the listing view, and then asynchronously load relational data?

Optimize Fields Requested

Could we update the API requests for the Admin App Listing page to be more specific than ?fields=*.*.*? Does it already limit by requested fields? Is there any room to optimize this?

benhaynes commented 5 years ago

Also, I want to clear up: this appears to NOT be an issue for normal API calls, since if you request a 18MB of data, it will take a while to get to you. And you have the ability to control which data you need to make the response smaller.

This ticket is mostly about how we can speed up the Admin App β€” since you shouldn't have to wait 15 seconds to see a list of blog posts.

rijkvanzanten commented 5 years ago

There's a lot to optimize on the app side for this. Right now it fetches everything 3 levels deep, which is highly unoptimized. I don't think this is an issue in the API necessarily

binal-7span commented 5 years ago

@benhaynes

you would get something like these 200 chars as a preview

We can achieve this dynamically.

We should add one option in field for preview_length for all the fields. Detail API will have whole data, but for the listing, the content will be truncated as preview_length.

rijkvanzanten commented 5 years ago

So like

/_/items/movies?fields=*.*&preview_length=200

??

That will return all the fields 2 levels deep, but will make sure that the value is never longer than 200 characters?

I like the idea, but I'm wondering how it'll work in practice. Certain values need to be parsed on the front-end (think relational / JSON) that will break with this potentially (if you truncate a JSON object to 200 chars, you can't parse it anymore cause it's invalid)

benhaynes commented 5 years ago

That's what I'm thinking, yeah.

Rijk, you're correct in that some values will "break" if they get truncated (eg: JSON). We can either truncate per field and have the App decide which to truncate (eg: only article_body), or choose which data types (eg: only text) will get limited.

These truncated values would only be used on the Item Browse page, not the detail page (for obvious reasons). So it might not be an issue for things like JSON if we changed how this tooltip works.

We'll need to think about what is important to see on the Browse pages. Do you need to see entire articles (probably not), do you need to see entire JSON objects (probably not), do you need all relational data (not all, but maybe some info on count or file previews)?

What I'm saying, is that we might want to rethink how we handle the Browse page β€” what we need to show. A tablular layout can use previews, but a chart layout might need more, un-truncated data. So perhaps this becomes an option within the layout β€” with a smart default (on/off/length) based on the layout type? That way the user can enable it if they're seeing a lot of latency... and we could even prompt the user to turn on the option after an exceedingly long Browse load.

binal-7span commented 5 years ago

@rijkvanzanten @benhaynes

image

This is what I meant. We can provide one option preview_length in field schema. If the user provides the value over there then API will truncate that field with that length. Otherwise, it will return the original data and this will not break the JSON too.

benhaynes commented 5 years ago

That could work, though we would definitely want to add it under "Advanced".

I assume the App would still need to have a flag when requesting data from the API for the listing?

Do we need this level of granularity in the App Settings, and in every field? Does it make more sense to have a global preview_length? Or maybe a preview_length_text and preview_length_json? Or should we just have a hardcoded number that will work 95% of the time (like 250 char, or something)?

I just want to avoid needlessly complex Settings and Field configuration if this will almost never change.

hemratna commented 5 years ago

Let's have preview_length as a config setting in api.php as this is easy to add and manage and works most of the use cases.

benhaynes commented 5 years ago

Thanks @hemratna β€” so this would be a single, global config value for all collections/fields that use the text (and maybe json) types?

The App would use a query string (eg: ?preview) to decide when it wants preview data for a page?

@rijkvanzanten β€” does this approach work for you?

hemratna commented 5 years ago

It will be single, global config used for only text.

API will not have control over length of json.

benhaynes commented 5 years ago

I wonder if we should have an option to "preview" json in case we have a similar size issue with that type in the future.

If the JSON data is huge, we could have a preview that reduces it to a count of how many root items were there (see below). Maybe this is a horrible idea, but I'm trying to think about how we could solve the issue of enormous JSON data in the future (which would have the same issue we're solving for TEXT right now). At least having a count would let the App show some info on the listing page.

{
    "count": 95
}

The JSON can be moved to a separate ticket, but maybe we would want to "scope" our preview flag to "TEXT": preview_length_text... that way we could have similar options for other types in the future?

rijkvanzanten commented 5 years ago

Why make this a global config flag? Let's add it to directus_settings, not the config file so it can be configured through the app.

I don't know how useful this feature is if it only truncates string type fields. I think that JSON values are generally speaking the heaviest.

benhaynes commented 5 years ago

I agree β€” we should keep as much as possible in the database (settings), and minimize the config.

Rijk, did you like my idea about scoping these to text and json?

benhaynes commented 5 years ago

Also, Rijk is thinking that we should use a query/flag named after truncate instead of preview since "preview" might be used for actual previews later. Might be confusing.

He also mentioned that we should have the App request this IN the query parameter, instead of an API config/setting:

?truncate_text=250

That way the App (and API user) has more control over what those reduced data responses are. I like that.

justrealmilk commented 4 years ago

I'm having the same issue but I've omitted any large text blocks. I just have a handful of relational tables

image

The response is only 55KBs (gzipped)

This is the endpoint itself, lightly obfuscated. Admittedly, I have had no idea what I'm doing as there's no docs for the ItemsService

<?php

use Directus\Application\Http\Request;
use Directus\Application\Http\Response;
use Directus\Services\ItemsService;

return [
    '' => [
        'method' => 'GET',
        'handler' => function (Request $request, Response $response) {

            $itemsService = new ItemsService($this);

            $a = $itemsService->findAll('collection',
              Array(
                'sort' => '-created_on',
                'status' => 'published',
                'fields' => Array(
                  'id',
                  'content',
                  'created_on',
                  'modified_on',
                  'slug',
                  'name',
                  'summary',
                  'tags'
                )
              )
            );

            $b = $itemsService->findAll('collection',
              Array(
                'sort' => '-created_on',
                'fields' => Array(
                  'id',
                  'content',
                  'created_on',
                  'modified_on',
                )
              )
            );

            $c = $itemsService->findAll('collection',
              Array(
                'fields' => Array(
                  'id',
                  'cover.id',
                  'cover.private_hash',
                  'cover.width',
                  'cover.height',
                ),
                'limit' => -1
              )
            );
            $d = $itemsService->findAll('collection', 
              Array(
                'fields' => Array(
                  'id',
                  'fid',
                  'field.id',
                  'field.private_hash',
                  'field.width',
                  'field.height',
                  'background.id',
                  'background.private_hash',
                  'background.width',
                  'background.height',
                  'related.*',
                  'things.thing_things_id',
                  'things.thing_things_id'
                ),
                'limit' => -1
              )
            );
            $e = $itemsService->findAll('collection', 
              Array(
                'fields' => Array(
                  'id',
                  'field.id',
                  'field.private_hash',
                  'field.width',
                  'field.height',
                ),
                'limit' => -1
              )
            );

            return $response->withJson([
                'a' => $a,
                'b' => $b,
                'z' => [
                  'c' => $c,
                  'd' => $d,
                  'e' => $e
                ]
            ]);
        }
    ]
];