blacksmithgu / obsidian-dataview

A data index and query language over Markdown files, for https://obsidian.md/.
https://blacksmithgu.github.io/obsidian-dataview/
MIT License
6.82k stars 404 forks

Serious Issue: High CPU Consumption #1280

Open tecnoborder opened 2 years ago

tecnoborder commented 2 years ago

What happened?

I ran several tests checking CPU consumption. I have a MacBook Air (Intel, late 2019). My vault has around 9k notes and 46 plugins. During load (without Dataview), CPU consumption goes up to 140%. About 30 seconds after Obsidian has loaded, it drops to 5%, even while I am using Obsidian (creating notes, navigating). With Dataview enabled it is a nightmare. CPU consumption goes as high as 199.6%, the fan on my Mac keeps running, Obsidian and my Mac are barely usable (lag), and CPU consumption stays at 100-120% or more. I also tried uninstalling and reinstalling the Dataview plugin, but the problem persists.

DQL

No response

JS

No response

Dataview Version

latest

Obsidian Version

latest

OS

MacOS

AB1908 commented 2 years ago

Can you export the performance profile from Obsidian? It has been described in a few other issues but let me know if you need guidance on that.

tecnoborder commented 2 years ago

Profile-20220723T224622.json.zip

tecnoborder commented 2 years ago

I started the recording with the plugin disabled. After a few seconds I enabled it and left it enabled for the rest of the recording.

Oblique82 commented 2 years ago

I am also experiencing this

blacksmithgu commented 2 years ago

It is possible that large vault size is the issue, but if Obsidian is persistently unusable that has historically pointed to bad plugin interactions.

If you copy paste your vault notes (or some sizable chunk of them) into a new vault with ONLY dataview installed, do you still get persistent and chronic lag?

tecnoborder commented 2 years ago

Hello, thanks for the reply. I repeated the test, copying all the notes into a new folder and then installing only the Dataview plugin. Unfortunately, the problem persists. CPU consumption went from 48% to 150% when I enabled the plugin in the new vault. I recorded the profile as before: for the first 1-3 seconds Dataview is disabled, then it is enabled for the rest of the recording.

Profile-20220726T083448.json.zip

tecnoborder commented 2 years ago

I repeated the test and the CPU usage went down. At this point, I guess there is a bad interaction with other plugins. The thing is, in my main vault, when I disable all the plugins, restart Obsidian, and enable Dataview, it immediately becomes unusable.

tecnoborder commented 2 years ago

The problem might not be the size of the vault but some plugin interaction. I will create a new vault, install Dataview first, and then try each of my plugins one by one. Could it also be related to some Obsidian caching problem, for example old plugins I uninstalled still causing issues?

Oblique82 commented 2 years ago

I tested this with a new Obsidian install on macOS - after nuking my users/library app support folder to make sure there were no residual settings, I did a fresh install of the current insider version.

I am seeing 100+% CPU usage with Dataview as well, with no other plugins running.

I have about 20 000 notes (e.g. Greek and Hebrew lexicons).

blacksmithgu commented 2 years ago

I took a look at the performance traces - thank you for taking them. It looks like the initial lag is due to Dataview doing the initial vault index. Dataview does a full-vault scan (load each markdown file, parse for metadata, store to a persistent cache) on first usage. If you have issues with first-time Obsidian setup (where Obsidian metadata cache first-time load takes 30 seconds) then I can see Dataview being even worse, since it is caching a fair bit more data.

I suspect if you just let Obsidian sit there for N minutes, it would eventually clear up and performance would return to normal, though this is not an ideal state of affairs.

The main question here is how to broadly improve performance and interactivity. There are a few options:

I think this also necessitates revisiting some cache invalidation logic so that you do not need to re-index on every Dataview update (as is currently the case), but rather only on every Dataview minor version bump. Some kind of debug metadata that tracks indexing performance would also probably be helpful for better understanding "just how fast/slow is indexing?"
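For illustration, the invalidation rule described here (re-index only when the minor version changes, not on every update) could look roughly like the sketch below. It is not taken from Dataview's source: the names `parseVersion` and `needsReindex` are hypothetical, and it assumes versions are plain `major.minor.patch` strings.

```typescript
// Hypothetical sketch, not Dataview's actual code.
// Assumption: versions are plain "major.minor.patch" strings.
interface Version {
  major: number;
  minor: number;
  patch: number;
}

function parseVersion(version: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error("Unrecognized version string: " + version);
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// Decide whether an index cached by `cachedVersion` can be reused by `currentVersion`.
function needsReindex(cachedVersion: string | undefined, currentVersion: string): boolean {
  // No cache yet: a full index is unavoidable.
  if (cachedVersion === undefined) return true;
  const cached = parseVersion(cachedVersion);
  const current = parseVersion(currentVersion);
  // Patch releases keep the cache; a major or minor change invalidates it.
  return cached.major !== current.major || cached.minor !== current.minor;
}
```

Under this rule a patch-only update would reuse the existing index, while a minor or major bump (which may change the cached data format) would still trigger a full re-index.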

tecnoborder commented 2 years ago

Thanks for taking the time to review this issue! I would probably propose options 1 and 2, combined with a suggested default CPU consumption cap (e.g. 40%). I would also add a pop-up message (on first start-up) saying that indexing large vaults can lead to high CPU consumption.

Additionally, I would like to understand whether the indexing requires a lot of CPU because it looks for both Dataview queries and tags (e.g. book in the frontmatter). I am asking this because I have very few notes with Dataview queries, and another option could be to index only a folder with these queries (e.g. as Templater lets you specify a folder where the templates are).

I was also wondering if you could allow people to manage the indexing process, e.g. a manual index or a scheduled one (every night after 22:00, weekly). So, instead of Dataview indexing the whole vault every time, it could rely on a previous index and be updated in a more controlled way.

AB1908 commented 2 years ago

> I am asking this because I have very few notes with Dataview queries, and another option could be to index only a folder with these queries (e.g. as Templater lets you specify a folder where the templates are).

If my understanding is correct, this shouldn't be a concern as queries render from the index. The query itself takes a few hundred milliseconds at worst and the render can be lengthy depending on the result set. These two are entirely separate from the indexing itself.

Oblique82 commented 2 years ago

My iPhone vault freezes and the Obsidian app crashes within 10 seconds.

Considering your comments above, could a workaround be to run indexing on my Mac for N seconds until it completes? And is this index synced to iOS, or does each device have to run its own index?

AB1908 commented 2 years ago

Indexes don't sync, I think.

blacksmithgu commented 2 years ago

I don't think scheduled indexing is that important - once the vault has been indexed, Dataview can keep up with incoming new changes pretty efficiently and it can be configured to throttle how quickly it refreshes file changes.
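As a rough illustration of throttling how quickly file changes are refreshed, the sketch below coalesces change events and hands them to the indexer at most once per interval. It is an assumption-laden example, not Dataview's implementation: `ChangeBatcher`, `notify`, `flushIfDue`, and the injected `now` timestamps are all hypothetical names chosen for this sketch.

```typescript
// Hypothetical sketch, not Dataview's implementation: coalesce file-change
// events and pass them to a re-index callback at most once per `intervalMs`.
class ChangeBatcher {
  private pending: { [path: string]: true } = {};
  private lastFlush = -Infinity;
  private readonly intervalMs: number;
  private readonly reindex: (paths: string[]) => void;

  constructor(intervalMs: number, reindex: (paths: string[]) => void) {
    this.intervalMs = intervalMs;
    this.reindex = reindex;
  }

  // Record a changed file; `now` is a millisecond timestamp (e.g. Date.now()).
  notify(path: string, now: number): void {
    this.pending[path] = true;
    this.flushIfDue(now);
  }

  // Also call this periodically (e.g. from a timer) so trailing changes are not lost.
  flushIfDue(now: number): void {
    const paths = Object.keys(this.pending).sort();
    if (paths.length === 0 || now - this.lastFlush < this.intervalMs) return;
    this.pending = {};
    this.lastFlush = now;
    this.reindex(paths);
  }
}
```

With a longer interval, repeated edits to the same file within one window collapse into a single re-index of that file, which trades freshness of query results for lower CPU use.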

I'll see how hard it is to add an option to configure % CPU load, as well as a little page for recording how long the index is likely to take.

Oblique82 commented 1 year ago

Just revisiting this issue - as someone with a large vault (20 000 notes due to lexicon/dictionary files) - can I please suggest implementing a mechanism to exclude certain directories/files from Dataview indexing at startup?

I really love your work, and I am just looking at the scalability of this, especially on iOS.

I don't need Dataview to index 20 000 dictionary files that I don't run queries on - but I would love to use Dataview for my other personal notes.

Will this help with startup CPU usage time? My CPU is maxed at 100% for nearly 10 mins (iMac Pro 2018).

blacksmithgu commented 1 year ago

I understand frustrations with index time performance - I'm working on a fix which allows you to configure how fast dataview indexes, and throttles it in the background so that it stops consuming 100% cpu (more like 20%) in exchange for initial indexing taking longer. Additionally, dataview will refresh the cache less often (it currently does it on every new version which is probably too frequent).
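The trade-off described here (lower CPU share in exchange for a longer initial index) can be made concrete with a simple duty-cycle calculation. The sketch below is only an illustration under stated assumptions: `pauseAfter` and `throttledDurationMs` are hypothetical names, and nothing here reflects how Dataview actually schedules its indexing.

```typescript
// Hypothetical sketch of duty-cycle throttling; none of these names come from Dataview.
// After a chunk of indexing work that took `workMs`, return how long to idle so that
// work / (work + idle) equals `targetUtilization` (e.g. 0.2 for roughly 20% CPU).
function pauseAfter(workMs: number, targetUtilization: number): number {
  if (!(targetUtilization > 0 && targetUtilization <= 1)) {
    throw new RangeError("targetUtilization must be in (0, 1]");
  }
  return (workMs * (1 - targetUtilization)) / targetUtilization;
}

// Wall-clock time for an index that needs `totalWorkMs` of CPU time when throttled.
function throttledDurationMs(totalWorkMs: number, targetUtilization: number): number {
  return totalWorkMs + pauseAfter(totalWorkMs, targetUtilization);
}
```

In this model, an indexer would process a small batch of files, measure how long the batch took, and then yield for `pauseAfter(elapsed, 0.2)` milliseconds before continuing. The cost is proportional: at a 20% target, indexing work takes about five times as long in wall-clock terms.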

Obsidian has the concept of "Excluded Files", which Dataview will respect for its own indexing in future versions; would this be sufficient, or should Dataview have its own separate concept of "excluded files"?

Oblique82 commented 1 year ago

Thanks for your reply.

I think for the purposes of indexing a large number of reference files (like dictionaries), it makes sense to allow Dataview to customise what the plug-in excludes.

Excluding files completely from Obsidian affects normal search results and other things. Having Dataview make this customisable would give the best performance benefits and allow us to keep other usability features.
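To make the request concrete, a plugin-level exclusion list, kept separate from Obsidian's own "Excluded Files", might work roughly as in the sketch below. This is only an illustration: `isExcludedFromIndex`, `filesToIndex`, and the folder-prefix format are assumptions, not existing Dataview settings or APIs.

```typescript
// Hypothetical sketch: a plugin-level exclusion list that is independent of
// Obsidian's "Excluded Files". Names and the folder-prefix format are assumptions.
function isExcludedFromIndex(path: string, excludedFolders: string[]): boolean {
  for (const folder of excludedFolders) {
    // Normalize so "Lexicons" and "Lexicons/" behave the same, and so that
    // "Lexicons" does not accidentally match "LexiconsNotes.md".
    const prefix = folder.replace(/\/+$/, "") + "/";
    if (path.indexOf(prefix) === 0) return true;
  }
  return false;
}

// Keep only the vault-relative paths that should be scanned at startup.
function filesToIndex(paths: string[], excludedFolders: string[]): string[] {
  return paths.filter((p) => !isExcludedFromIndex(p, excludedFolders));
}
```

Files filtered out this way would remain visible to Obsidian's search and other core features, since only the plugin's indexer skips them.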

This is similar to the preference @tecnoborder mentioned above

Hatweeo commented 8 months ago

I also experience performance issues when using Obsidian together with the Dataview and Tasks plugins. It lags seriously every time I try to add or change a task, which is very annoying and time-consuming, especially when I'm taking notes in a meeting. I also see an increase in CPU activity when this happens: see attached picture. When I restart Obsidian, things get better for a little while; after that the problem returns. ![Uploading IMG_3450.JPG…]()

Hatweeo commented 8 months ago

Perhaps this is a result of working with flat files? But the upside is that I can install it locally. My employer does not permit me to use, for example, a cloud-based tool with a different database structure. I would love to use a note-taking app together with the task management features of, for example, Remember The Milk.

Hatweeo commented 8 months ago

But I love the concept of Obsidian and both plugins!