calvinmetcalf / SQLdown

sql backend for levelup
Other
72 stars 13 forks source link

Very bad performance iterating through a large number of entries #33

Open chrbala opened 6 years ago

chrbala commented 6 years ago

Hello – thank you for working on SQLDown!

I'm trying to run the iterator through a large number of entries (~100,000). It is quite slow – I believe it is loading up all the entries before doing anything because it iterates through them quickly once they start up after several (~6) minutes. When testing on a smaller dataset (~3000 entries), I can read them in about 10 seconds total, with the majority of the time spent before any data is returned.

I believe that iter-stream is the bottleneck. It should either iterate one-at-a-time or batch the reads in reasonably-sized batches.

calvinmetcalf commented 6 years ago

could I get some more details like

More generally we are using the built in cursor functionality of the database but it has to do a query in order to start the stream, so if the query is slow then it will take a while to get the first item because on the database it needs to complete the query before it can return anything, some possible issues

mysql doesn't support indexes on arbitrarily long fields so if you are using mysql then unless you set the keySize option there isn't an index on the keys which could be an issue. I also wrote this a while ago and have learned a lot about database performance since then so this could be something in the query that would be fixable.

chrbala commented 6 years ago

Sure!

platform: OSX Sierra 10.12.6 database: SQLite

It looks like the code branches don't make indexes for SQLite . Also, the readme specifies keyLength and valueLength, but the actual keys are keySize and valueSize.

Is there a reason why all the databases have three columns if (as the readme says) the column limitation is only with postgres?

calvinmetcalf commented 6 years ago

yeah the 3 columns are also useful for dealing with iterators while updates are made, we could use cursors or whatnot but that would be complicated to do for all the different db types, that being said, it does make indexes for SQLite, it makes indexes for every db that isn't mysql