citusdata / pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus
GNU Lesser General Public License v3.0
1.06k stars 63 forks source link

Use LIMIT to restrict rows from shards #70

Open jberkus opened 9 years ago

jberkus commented 9 years ago

Currently, LIMIT clauses are processed entirely on the head node. These should get pushed down where possible.

For example, with SELECT * FROM members ORDER BY join_date LIMIT 25, we ought to be able to push that ORDER BY and LIMIT down to each shard, return only 25 rows from each shard, then resort and limit them on the head.

jberkus commented 9 years ago

BTW, right now if you try a simple limit query like the above, you get a client termination because it's trying to drag the entire database from all shards into memory:

psocial=# select * from members limit 10;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 101571.258 ms
!> 
!> \q
jasonmp85 commented 9 years ago

This may have some odd interactions with other SELECT features (GROUP BY, HAVING, etc.), so we'll have to carefully consider how queries would be affected by pushing down the LIMIT clause.

I believe the intermediate table used for multi-shard SELECTs should be paging out if the result is too large. @sumedhpathak — is @jberkus' crash expected? I would have thought it would be very slow, but crashing seems off. If it's not expected we'll probably want to track that as a separate bug report.