cpyrgas closed this issue 2 years ago
I did a quick analysis and found the following:
The 1-field key results in an index with 4 levels. Branch pages can hold 27 keys; leaf pages average 96 keys. CDX levels: Level 1: 1 page, Level 2: 3 pages, Level 3: 78 pages, Level 4: 2082 pages.
The 2-field key results in an index with 5 levels. Branch pages can hold 17 keys; leaf pages average 32 keys. CDX levels: Level 1: 1 page, Level 2: 2 pages, Level 3: 22 pages, Level 4: 368 pages, Level 5: 6250 pages.
The 5-field key results in an index with 6 levels. Branch pages can hold 8 keys; leaf pages average 10 keys. CDX levels: Level 1: 1 page, Level 2: 5 pages, Level 3: 40 pages, Level 4: 313 pages, Level 5: 2500 pages, Level 6: 20000 pages.
The 10-field key results in an index with 9 levels. Branch pages can hold 4 keys; leaf pages average 5 keys. CDX levels: Level 1: 1 page, Level 2: 3 pages, Level 3: 10 pages, Level 4: 40 pages, Level 5: 157 pages, Level 6: 625 pages, Level 7: 2500 pages, Level 8: 10000 pages, Level 9: 40000 pages.
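The level counts above are consistent with a simple bottom-up fan-out calculation; the leaf figures imply roughly 200,000 records in the test table (e.g. 6250 leaf pages times 32 keys). A minimal sketch, assuming each level simply packs the level below it at the stated fan-out:

```python
import math

def cdx_levels(total_keys, leaf_keys, branch_keys):
    """Estimate pages per CDX level, bottom-up.

    total_keys:  number of records being indexed
    leaf_keys:   average keys per leaf page
    branch_keys: keys (child pointers) per branch page
    Returns page counts from the root (level 1) down to the leaves.
    """
    levels = [math.ceil(total_keys / leaf_keys)]   # leaf level
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / branch_keys))
    return list(reversed(levels))                  # root first

# 200,000 records with the 2-field key's fan-out (32 keys/leaf, 17 keys/branch)
print(cdx_levels(200_000, 32, 17))  # → [1, 2, 22, 368, 6250]
```

The 2-, 5- and 10-field cases match the measured page counts exactly, which supports the diagnosis that the fan-out collapse caused by longer keys is what makes the trees so much deeper.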
Of course this is also true for the VO and Vulcan drivers. I'll see what I can do to improve the speed.
You may want to check where the speed is lost:
What I have seen myself is that the number of pages written in the last phase is much larger for indexes with multi-field keys, because there are lots of 'embedded' spaces in the expression values that are not removed by the compression algorithms. This causes the number of leaf and branch pages to grow a lot.
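To illustrate the point (field names and widths here are hypothetical, not taken from the actual test): if a multi-field key is the concatenation of fixed-width, space-padded field values, then every short value leaves a run of spaces in the middle of the key, where tail-trimming compression cannot remove them:

```python
# Hypothetical sketch: a multi-field index key built by concatenating
# fixed-width (space-padded) field values. Spaces from short values end
# up embedded mid-key, where trailing-space compression cannot reach.
def make_key(*values, width=10):
    return "".join(v.ljust(width) for v in values)

single = make_key("SMITH")                          # trailing spaces only
multi  = make_key("SMITH", "JOHN", "NY", "A", "X")  # spaces embedded mid-key

# Trailing spaces can be trimmed away; embedded ones cannot.
print(len(single), len(single.rstrip()))  # 10 5
print(len(multi), len(multi.rstrip()))    # 50 41
```

In this toy model the 5-field key stays 41 bytes even after trimming, versus 5 bytes for the 1-field key, which is the kind of growth that shrinks the per-page fan-out so drastically.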
The problem should be fixed with my last commit. The time with 10 fields is now < 10 sec. The problem was that CheckVersions() was called in write() and did a loop across all pages in the list to make a Debug.Assert(), so it introduced an O(n^2) delay when writing all pages.
To fix it I put the version checks under a conditional compilation flag, so they can be enabled by defining CHECKVERSIONS.
(I think the delay is excessive to even keep them in a debug build).
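The cost of the removed check is easy to see in a toy model (a Python sketch, not the actual X# code): running a full-list consistency pass inside every page write makes the total work quadratic in the number of pages:

```python
# Illustration of the O(n^2) effect described above: a per-write check
# that loops over the whole page list multiplies the work by n.
def write_all(pages, check_versions=False):
    ops = 0
    for page in pages:
        if check_versions:
            ops += len(pages)   # CheckVersions(): visits every page per write
        ops += 1                # the write itself
    return ops

n = 10_000
print(write_all(range(n)))        # → 10000        (linear)
print(write_all(range(n), True))  # → 100010000    (n^2 + n)
```

At 40,000 leaf pages for the 10-field index, that quadratic factor alone explains the multi-minute build times.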
Please confirm that the issue is solved.
Nikos, I added that check when I was fighting with the corruption. It can be removed now. I am moving Rolf today, so I'll look at this tonight or tomorrow.
Yeap, I am seeing a very big increase in speed! Not only on this test, but also some other ones.
Nikos, This looks great.
There is another thing that you don't see in this example: some of our customers noticed that FoxPro is a LOT faster in shared mode. I profiled that and noticed that FoxPro 'reads ahead'. When it reads a record of, for example, 100 bytes, it does not read just those 100 bytes but, say, 200 or 400 bytes, so the next records are already in memory. It does the same with index pages: when it reads a page, it inspects the pointers to the sibling pages and reads the next page as well (most likely in a background thread). Of course there is the risk that these records or pages will be updated by others, so at the moment you try to write (i.e. when you lock the record), FoxPro re-reads the record and verifies that it has not been changed. This read-ahead cache has a limited lifetime (I think it defaults to 5 seconds). You can set that value to -1 (never buffer), 0 (do not refresh the buffer), or any value between 0.001 and 3,600 seconds. Is this something that you can build in too?
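A minimal sketch of the behaviour described above (the class and method names are hypothetical, and this is not how FoxPro is actually implemented): a record cache whose entries expire after a TTL, plus a verify-on-write re-read because another user may have changed the record in shared mode:

```python
import time

class ReadAheadCache:
    """Toy TTL record buffer in the spirit of FoxPro's refresh setting.

    ttl > 0: buffered records stay valid for ttl seconds
    ttl == 0: never refresh the buffer once filled
    ttl < 0: never buffer at all
    """
    def __init__(self, ttl=5.0):
        self.ttl = ttl
        self.buf = {}                  # recno -> (timestamp, data)

    def read(self, recno, read_from_disk):
        if self.ttl < 0:
            return read_from_disk(recno)       # buffering disabled
        hit = self.buf.get(recno)
        if hit and (self.ttl == 0 or time.monotonic() - hit[0] < self.ttl):
            return hit[1]                      # buffered copy still fresh
        data = read_from_disk(recno)
        self.buf[recno] = (time.monotonic(), data)
        return data

    def verify_before_write(self, recno, read_from_disk):
        # after locking the record, re-read it and confirm that the
        # buffered copy was not changed by another user in the meantime
        return self.buf.get(recno, (0, None))[1] == read_from_disk(recno)
```

A real implementation would also prefetch the following records and sibling index pages on each read, ideally on a background thread, as Robert describes.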
Robert, flushing is the main bottleneck. If it's not removed, everything else is peanuts. To prove that point I have pushed a commit that disables file buffer flushing on DbCommit(). Feel free to try it out, but I have to warn you that we must provide a way for users to enable it, so it should not stay as is!
Note that flushing was disabled only for DBF and DBFCDX. If you try something else you will see little or no benefit!
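"Provide a way to enable it" could look something like this (a Python sketch with a hypothetical XS_HARDCOMMIT switch, not the actual driver code): keep the cheap buffer flush on commit, and make the expensive force-to-disk step opt-in:

```python
import os
import tempfile

# Hypothetical setting: users who need durability over speed opt back in.
HARD_COMMIT = os.environ.get("XS_HARDCOMMIT", "0") == "1"

def db_commit(f):
    f.flush()                   # hand buffered writes to the OS: cheap
    if HARD_COMMIT:
        os.fsync(f.fileno())    # force the OS to write to disk: the slow part

with tempfile.NamedTemporaryFile("w+", delete=True) as f:
    f.write("record data")
    db_commit(f)                # data is now visible to other readers
    with open(f.name) as reader:
        print(reader.read())    # → record data
```

The trade-off is durability: without the fsync, a power loss can drop committed data that the OS had not yet written out, which is exactly why users must be able to turn it back on.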
Also, can we have a relative performance measurement of our current engine, Vulcan and FoxPro? It is good to have a goal ;)
Fixed. The speed degradation for multi-field keys is now what you would expect.
https://www.xsharp.eu/forum/german/2575-ist-der-xbase-datenbanktreiber-unter-xsharp-core-bereits-fertiggestellt#18927
For a small index expression, index creation speed in X# and Vulcan is close enough. But as you increase the number of fields in the index expression (increase INDEX_LENGTH to 2, 3, 5, 10), the performance impact in the Vulcan RDD is very small (the time only doubles when the index length grows 10x), while in X# the slowdown is very large: it starts by doubling with a 2nd field and quickly turns into a geometric progression of time spent.
Increasing FIELD_LENGTH instead does not have a big impact in either X# or Vulcan.