blacksmithgu / obsidian-dataview

A data index and query language over Markdown files, for https://obsidian.md/.
https://blacksmithgu.github.io/obsidian-dataview/
MIT License

Dataview query with flatten fails on update to Obsidian 1.0.3 and Dataview 0.5.47 #1553

Open aubreyz opened 1 year ago

aubreyz commented 1 year ago

What happened?

I have a vault with hundreds of queries exactly like the one below - listing tasks in other notes with links to the current file. These worked perfectly well, but now fail after updating both Obsidian and Dataview. On going into read mode I simply get a pause of about 2 minutes followed by a blank Obsidian screen.

I can't see any reference to a query syntax change that would have broken this, and have no idea how to start debugging it.

To add:

a) I have tried in a completely clean vault with the same plugins and just a single task and a single query file - the query works fine.

b) Other Dataview queries work fine; it is just this specific query that always fails.

c) It definitely has something to do with the Obsidian/Dataview update, as that is exactly when it started failing.

d) I presume it is some sort of cache issue? But what sort of issue would cause only one type of Dataview query to fail? Is there some way to clear/reset the cache to check whether that might be the cause, short of copying all 1000 or so notes I have into a clean vault?

DQL

```dataview
list WITHOUT ID tasks.text + " (" + link(file.link, Title) + ")"
flatten file.tasks as tasks
where contains(tasks.text, "[[" + this.file.name + "]]")
SORT tasks.text ASC
```

JS

No response

Dataview Version

0.5.47

Obsidian Version

1.0.3

OS

Windows

s-blu commented 1 year ago

I tried to reproduce this out of curiosity, and while it works on the example vault (275 files) with a small set of matching tasks (seven), it takes multiple seconds until the query renders for me. It seems like performance dropped significantly for this use case, but I do not have any logs in the console. Might I ask roughly how many files your vault has - 1000? - and how many results you would expect, as a rough number?

I am still on Dataview 0.5.43 and Obsidian 1.0.3, so this could be an Obsidian-caused issue, maybe.

aubreyz commented 1 year ago

> I tried to reproduce this out of curiosity, and while it works on the example vault (275 files) with a small set of matching tasks (seven), it takes multiple seconds until the query renders for me. It seems like performance dropped significantly for this use case, but I do not have any logs in the console. Might I ask roughly how many files your vault has - 1000? - and how many results you would expect, as a rough number?

Yes it may be an extraordinarily slow query - but it used to be fast with exactly the same note dataset (albeit a few extra tasks might have been added).

I have perhaps 1000 notes each containing about 5 tasks on average. A typical query would pull up about 10 tasks at most.

But it is interesting that you are finding a slow render for this particular query even with a tiny dataset - so perhaps it is something that has been magnified by a recent code change?

aubreyz commented 1 year ago

To add: This simple query using the tasks plugin still works perfectly, and with almost instant rendering, so clearly the data is there to be queried...

```tasks
description includes [[Note name]]
```

aubreyz commented 1 year ago

I have now tried on a completely clean vault and on a different computer. The outcome is equally disappointing. Clearly there is a substantial new problem here.

- Time to carry out render with the Tasks plugin: 3 seconds
- Time to carry out render with Dataview using the above syntax: 4 hours and counting...

blacksmithgu commented 1 year ago

I haven't been able to get multi-second lag by messing around with this locally on beta 0.5.50 - could you capture a performance profile (Ctrl+Shift+I to open developer window, then record using the "Performance" tab while the query runs)?

aubreyz commented 1 year ago

I get absolutely nothing in the performance tab -- it just hangs if the flatten line is there before there is any output.

It is definitely something to do with the flatten after the update.

The below without the flatten renders fine (albeit with a different outcome)

```dataview
list WITHOUT ID tasks.text
flatten file.tasks as tasks
where contains(tasks.text, this.file.name)
SORT tasks.text ASC
```

aubreyz commented 1 year ago

Would it help if I did a video capture of various Dataview queries with the performance tab running (ones which work and ones which hang)?

aubreyz commented 1 year ago

This is what the performance profiler hangs on when the flatten line is there

[image: screenshot of the performance profiler]

aubreyz commented 1 year ago

I'm also not convinced it has anything to do with the tasks themselves either - or even the number of tasks - but rather the structure of the query. For example, even restricting the query to a folder where there are no results at all, as below, hangs in the updated plugin/Obsidian:

```dataview
list WITHOUT ID tasks.text + " (" + link(file.link, Title) + ")"
flatten file.tasks as tasks
where contains(tasks.text, "[[" + this.file.name + "]]")
FROM "Temp"
```

s-blu commented 1 year ago

I created a branch on the example vault with test data: https://github.com/s-blu/obsidian_dataview_example_vault/tree/dataview-1553 and created a performance profile: Profile-20221109T220150.zip. Rendering takes pretty much exactly 1 second. My previous try with different data (which I have scrapped by now) felt longer, but I did not create a profile then and might be mistaken. Maybe it's the FLATTEN over the complete vault after all. Still, it does feel unresponsive in comparison to other queries.

> I'm also not convinced it has anything to do with the tasks themselves either - or even the number of the tasks, but rather the structure of the query. For example even restricting the query to a folder where there are no results at all like below, hangs in the updated plugin/obsidian
>
> ```dataview
> list WITHOUT ID tasks.text + " (" + link(file.link, Title) + ")"
> flatten file.tasks as tasks
> where contains(tasks.text, "[[" + this.file.name + "]]")
> FROM "Temp"
> ```

I think you need to place the FROM as the first data command for it to take effect:

```dataview
list WITHOUT ID tasks.text + " (" + link(file.link, Title) + ")"
FROM "Temp"
flatten file.tasks as tasks
where contains(tasks.text, "[[" + this.file.name + "]]")
```

aubreyz commented 1 year ago

Thanks for your help. Shifting the FROM higher up does not help matters.

Have downloaded your sample vault (thanks) and will let you know what happens

aubreyz commented 1 year ago

OK, this is helpful maybe because I can demonstrate the problem on a modified version of the test vault you made: I carried out the following steps in order:

a) The test vault seems to be working (probably) OK. The first time the query took about 3 seconds, and after that was instant.

b) I then updated the plugin in the test vault from 0.5.43 to the current version 0.5.47 and with the test data it still worked OK as above.

c) I then expanded on the test data basically by replicating it perhaps 30 times in two of the existing notes - these are attached, and I get exactly the same problem as before..... never rendered and obsidian basically crashes after a few minutes (you will see that I also put in some very invalid markdown which may or may not be related, but may help to sort it out)

d) I then downgraded the plugin back to 0.5.43 (just by overwriting the plugin folder from the original test vault - but this may leave entrails and indexes elsewhere), and the problem persists with the two expanded notes.

e) I then reverted the two expanded test notes back to the versions in the test vault -- but the problem is NOT reversed, and it no longer renders (not sure if that is because indexing data is stored elsewhere).

Attached are the two expanded notes from the test vault

zipped bad files.zip

To add also - I flipped through all of the few dozen test queries in the sample database, and they all work rapidly and wonderfully. So it is just this particular query.

aubreyz commented 1 year ago

Further testing. This works fine pulling up every task in the altered demo vault as well as my own vault

```dataview
task
```

and this works fine too

```dataview
task
WHERE contains(text, "[[Query me]]")
```

So the crash is very specifically related to the flatten, and not to something else like some rogue malformed task or a corrupted index.

So this crashes:

```dataview
table tasks.text
flatten file.tasks as tasks
where contains(tasks.text, "Query")
```

aubreyz commented 1 year ago

I have done some more timing tests on this, and I think there is a very steep exponential effect of the NUMBER of tasks in the vault on the time taken to render a query when flatten is involved.

When there is no flatten, the relationship to data size is fairly linear, not exponential.

So for a simple query like

```dataview
task
WHERE contains(text, "[[Query me]]")
```

- 10 tasks in database: < 1 second
- 100 tasks in database: < 1 second
- 1000 tasks in database: 1 second
- 2000 tasks in database: ~2 seconds
- 3000 tasks in database: ~3 seconds

When flatten is used in any query like

```dataview
table tasks.text
flatten file.tasks as tasks
where contains(tasks.text, "Query")
```

- 10 tasks in database: < 1 second
- 100 tasks in database: < 1 second
- 1000 tasks in database: 5 seconds
- 1500 tasks in database: 25 seconds
- 2000 tasks in database: 50 seconds
- 3000 tasks in database: crashes

There must be something internal that stops rendering once it has to try for longer than some duration.

Try taking the test database you used @s-blu and progressively increase the number of tasks by copy/paste and see what happens. I tried this on two machines. And it only happens with the newer versions of obsidian/dataview.
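
One way to read the timings above is to estimate how steeply render time grows between consecutive measurements. The sketch below is plain JavaScript and not part of Dataview; it assumes time scales roughly as n^k and computes k for each pair of reported data points. The helper names `localExponent` and `exponents` are made up for this illustration.

```javascript
// Reported timings from this thread: [tasks in vault, seconds to render].
const withoutFlatten = [[1000, 1], [2000, 2], [3000, 3]];
const withFlatten = [[1000, 5], [1500, 25], [2000, 50]];

// Local scaling exponent k between two measurements, assuming t ~ n^k.
function localExponent([n1, t1], [n2, t2]) {
  return Math.log(t2 / t1) / Math.log(n2 / n1);
}

// Exponent for every pair of consecutive samples.
function exponents(samples) {
  const result = [];
  for (let i = 1; i < samples.length; i++) {
    result.push(localExponent(samples[i - 1], samples[i]));
  }
  return result;
}

console.log(exponents(withoutFlatten).map((k) => k.toFixed(2)));
console.log(exponents(withFlatten).map((k) => k.toFixed(2)));
```

With these numbers the query without flatten comes out at k = 1 (linear), while the flatten query comes out at roughly k = 4 and k = 2.4. With only three coarse, hand-timed points this is indicative at best, but it suggests growth much steeper than linear, possibly a steep polynomial rather than strictly exponential.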

aubreyz commented 1 year ago

Hi @s-blu - I wonder whether you managed to reproduce the problem I reproduced in your sample vault when using flatten with a modestly large number of items. I have stopped using flatten in the meantime, but in several instances there is no viable alternative for these queries. It seems a very clear bug to me.

s-blu commented 1 year ago

> Hi @s-blu - I wonder whether you managed to reproduce the problem I reproduced in your sample vault when using flatten with a modestly large number of items. I have stopped using flatten in the meantime, but in several instances there is no viable alternative for these queries. It seems a very clear bug to me.

I created some more test data on the branch for more reliable and foreseeable testing. The thing is: when you use FLATTEN, the number of results goes from the number of pages to the number of tasks. That means when you FLATTEN the "bad files", you get 2180 potential matches (instead of 2), and 1503 of these match. Dataview therefore tries to render a list of 1503 elements, plus the workload created by the string concatenation for the custom output. My Obsidian crashes on your test files, but this is no surprise, if I'm honest.

I also have no clue about all this caching magic going on, so I unfortunately cannot be of more help than providing some test data. Maybe you want to have another look at the branch - I created a bunch of file-number-task-number tests, to see whether these fail for you. While taking long, all of the tests except your provided example are working fine for me.
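
The row expansion described above can be sketched in plain JavaScript. This is only an illustrative model of what FLATTEN does to the result count, not Dataview's actual implementation; the `{ name, tasks }` page shape, the function names, and the 1000/1180 split of the 2180 tasks across the two files are all assumptions made for the example.

```javascript
// Illustrative model of FLATTEN: one row per page becomes one row per task.
// The page shape ({ name, tasks }) is a simplification, not Dataview's internal format.
function flattenTasks(pages) {
  const rows = [];
  for (const page of pages) {
    for (const task of page.tasks) {
      rows.push({ file: page.name, task });
    }
  }
  return rows;
}

// Build a hypothetical page with `taskCount` placeholder tasks.
function makePage(name, taskCount) {
  const tasks = Array.from({ length: taskCount }, (_, i) => ({
    text: `Task ${i + 1} in ${name}`,
  }));
  return { name, tasks };
}

// Hypothetical split of the 2180 tasks across the two "bad files".
const pages = [makePage("bad file 1", 1000), makePage("bad file 2", 1180)];
const rows = flattenTasks(pages);

// 2 pages go in, 2180 rows come out.
console.log(pages.length, rows.length);
```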

aubreyz commented 1 year ago

Thanks @s-blu

OK so there is an issue here, on which we can agree, albeit perhaps not surprising - but I'm not sure. 3000 or so database items is really not that extraordinary - flattening obviously creates a massive magnification effect leading to crashes.

I don't much understand the coding implications - but it seems to me that there are a lot of things that can only be achieved through a flatten - so I wonder whether there might be some coding inefficiency that is making this happen. If one wants to produce a list of tasks with the name of the containing file on the same line, some kind of flatten is required - leading to a crash. But if this was inevitable, why is it possible to carry out similar things with a similar outcome easily using the tasks plugin, or even possibly with dataviewjs with, say, something like:

```dataviewjs
// get tasks from specific folder
const tasks = dv.pages('"01 Documents"').file.tasks
// add a link to each task
for (let task of tasks) {
    task.visual = task.text + " " + dv.sectionLink(task.link.path, task.section.subpath, false, "△")
}
// render tasks
dv.taskList(tasks, false)
```

So what is special about the flatten code that causes crashing? I don't understand JavaScript well, but if the JavaScript can do something similar without crashing, why is something similar not coded within Dataview?
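
For comparison, here is a minimal plain-JavaScript sketch of what the original DQL query computes: every task that mentions a given note, with the containing file's name on the same line, sorted by text. It runs outside Obsidian on simplified data; the `{ name, tasks }` shape, the sample pages, and the function name `tasksMentioning` are assumptions for illustration. It visits each task once, so its work is proportional to the total number of tasks, but it says nothing about where Dataview itself spends its time.

```javascript
// Collect "task text (file name)" lines for every task mentioning [[noteName]].
// Pages are assumed to look like { name, tasks: [{ text }] } (a simplification).
function tasksMentioning(pages, noteName) {
  const needle = `[[${noteName}]]`;
  const lines = [];
  for (const page of pages) {
    for (const task of page.tasks) {
      if (task.text.includes(needle)) {
        lines.push(`${task.text} (${page.name})`);
      }
    }
  }
  // Default string sort, comparable to SORT tasks.text ASC in the query.
  return lines.sort();
}

// Hypothetical sample data.
const samplePages = [
  { name: "Project A", tasks: [{ text: "Call about [[Query me]]" }, { text: "Unrelated" }] },
  { name: "Project B", tasks: [{ text: "Archive [[Query me]] notes" }] },
];

console.log(tasksMentioning(samplePages, "Query me"));
```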

s-blu commented 1 year ago

You seem to be right here, in the sense that I can render a result set of 1500 files within 5 seconds without any trouble (which is quite amazing). When I flatten these 1500 files (which each contain only 1 todo), I again get 5 seconds. Flattening 750 files with 2 todos each takes around 7 seconds. Flattening 5 files with 300 todos each takes around 20 seconds and blocks input. So it indeed doesn't seem to be the number of result items but the number of tasks per file that causes the issue. Worth noting: the 20 seconds are only needed on my first call. After that, it again takes around 5 seconds (by feel, I did not measure) to render the result, I presume because the cache is ready.

I'll add the new test data to the branch here https://github.com/s-blu/obsidian_dataview_example_vault/tree/dataview-1553, if you want to check, @blacksmithgu. Nothing more I can do here, unfortunately.
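
For anyone who wants to build similar test data, the three shapes compared above (same total number of tasks, different tasks per file) could be generated with a small script. This is a sketch under assumptions: the `generated/note-N.md` paths, the task wording, and the function name `generateNotes` are hypothetical and are not taken from the branch.

```javascript
// Build in-memory Markdown notes: `fileCount` notes with `tasksPerFile` open tasks each.
// Paths and task text are hypothetical placeholders.
function generateNotes(fileCount, tasksPerFile, target = "Query me") {
  const notes = [];
  for (let f = 1; f <= fileCount; f++) {
    const lines = [];
    for (let t = 1; t <= tasksPerFile; t++) {
      lines.push(`- [ ] Task ${t} of note ${f} mentioning [[${target}]]`);
    }
    notes.push({ path: `generated/note-${f}.md`, content: lines.join("\n") + "\n" });
  }
  return notes;
}

// The three shapes compared above: [number of files, tasks per file].
const shapes = [[1500, 1], [750, 2], [5, 300]];
for (const [files, perFile] of shapes) {
  const notes = generateNotes(files, perFile);
  console.log(`${files} files x ${perFile} tasks = ${files * perFile} tasks in ${notes.length} notes`);
}
```

Each note's `content` would then be written to its `path` inside a throwaway test vault, one shape at a time, so that only the tasks-per-file ratio changes between runs.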

aubreyz commented 1 year ago

That is an interesting experiment @s-blu which takes the matter much further forward. Given the behavior of dataviewjs and the tasks plugin performing very similar feats with effective flattening (with many items in a few files) I suspect that it must be possible to optimize the flatten code to avoid this roadblock. I'm happy to test things, but the coding is outside of my skill-set :)

I note your comment about it working OK a second time (cache, we guess), but at n=2000 or 3000 you don't get a second chance because it approaches infinity.... even at 600 or so it becomes massively problematic.