datajoint / datajoint-matlab

Relational data pipelines for the science lab
MIT License
42 stars 38 forks source link

Support 'ORDER BY' #7

Closed peabody124 closed 11 years ago

peabody124 commented 11 years ago

It would be nice to be able to order by a field and return a limited number. This could accelerate some things over a slower network connection.

aecker commented 11 years ago

Yeah, I find myself implementing the following pattern quite often, which could be much easier if ORDER BY was used:

result = fetch(rel, '*');
result = dj.struct.order(result, 'foo');
a = [result.a];
b = [result.b];
c = [result.c];

instead of, for example, a simpler [a, b, c] = fetchsorted(rel, 'foo', 'a', 'b', 'c').

Sorting should be limited to happen at fetch time. Anything else would probably break (or at least complicate a lot) the relational model.

dimitri-yatsenko commented 11 years ago

We could use the fact that all attributes in DataJoint are lower case. Upper case strings will be interpreted as parameter names. So Alex's code above will look like:

[a,b,c] = rel.fetchn(rel, 'a', 'b', 'c', 'SORT BY', {'foo'})

Or we could use a structure to pass parameters into fetch:

opt.sortby = {'foo'}
[a,b,c] = rel.fetchn(rel, 'a', 'b', 'c', opt)

Preferences? I think the second way looks more graceful.

dimitri-yatsenko commented 11 years ago

okay, this will let us to implement limits more gracefully too:

clear opt
opt.sortby = {'foo'};
opt.start = 1000;
opt.limit = 100;
[a,b,c] = rel.fetchn('a','b','c', opt)
aecker commented 11 years ago

I prefer you last suggestion. If sortby could also be a string in case of a single attribute that would be even better :-) And please let's keep the old way of doing limits for a while before throwing it out.

peabody124 commented 11 years ago

Wouldn't the first suggestion be better? Aas the string we specify is appened at the end were passed we could easily write our own:

[a,b,c] = rel.fetchn(rel, 'a', 'b', 'c', 'SORT BY "a" LIMIT 500')
dimitri-yatsenko commented 11 years ago

All fetch operators now handle the last argument differently if it begins with "ORDER BY" or "LIMIT ". In both cases, it is simply appended to the SELECT statement.

Examples:

[a,b,c] = rel.fetchn('a','b','c', 'ORDER BY a*b DESC LIMIT 3 OFFSET 5')
[a,b,c] = rel.fetchn('a','b','c', 'LIMIT 3 OFFSET 5')

I will update the documentation after a bit of testing.

aecker commented 11 years ago

The old fetch*(..., 1) syntax for limits is broken now...

peabody124 commented 11 years ago

I wasn't familiar with that syntax. That just limits the length by one without ordering? That seems like it's essentially "return a random key" given we aren't meant to make assumptions about ordering.

aecker commented 11 years ago

Yes, but often a random key is just fine (e.g. if you know that a bunch of rows all have the same value) and I'm using it at several places and don't feel like refactoring all my code again. That's why I specifically asked to not break that syntax in the comment above :(

dimitri-yatsenko commented 11 years ago

I just enabled the old syntax for backward compatibility. I only did it for limits, not offsets. So fetch(..., 10, 10) will not work, but fetch (...., 10) will work.

I would argue that it makes more sense to go back and replace all instances of fetch(....., 1) with fetch(....., 'LIMIT 1') for better readability anyway.

Relevant http://xkcd.com/1172/