Revamp how we filter

qasim commented 8 years ago

The filter code is probably some of the oldest living code in the entire project; it was one of the first things Ivan and I worked on back in 2014. It worked well with one or two endpoints, but we're almost at 10 now and it's about time to revisit this.

I'm going to start working on a query tokenizer module along with a token parser, laying out the groundwork for all future APIs to follow (and we will slowly move old filter code to the new one).

Here's what I'm thinking so far:

Query tokenizer
- Takes the raw user query and splits it into pieces (e.g. date:>"2016-04-28", code:-"CSC108")
- The return value will be a multi-dimensional array: it splits first on AND, then splits each of those pieces on OR (since AND takes precedence)
Token parser
- Takes a piece from what the tokenizer outputs and converts it into an object that has data on what the token is trying to accomplish
- The return value of the token parser will tell us whether errors occurred during parsing and whether a further mapreduce step is required for sub-documents.
- It will also give the operator from the original token, and the raw MongoDB query that is needed to fulfill the filter itself
- Should support the following:
  - Numbers, arrays of numbers
  - Strings, arrays of strings
  - Dates in the format of YYYY-MM-DD
  - Times in the format of HH:MM or simply just seconds until midnight
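
To make the two stages above concrete, here is a minimal sketch of how a tokenizer and token parser along these lines might look. It is not the code on the branch: the function names (`tokenize`, `parseToken`), the `subDocumentKeys` parameter, the shape of the returned object, and the operator-to-MongoDB mapping are all assumptions made for illustration. It only handles scalar strings and numbers, not arrays, dates, or times.

```javascript
// Hypothetical sketch: names and return shapes are assumptions, not the project's API.

// Split on AND first, then split each AND group on OR.
// Naive on purpose: it does not protect the words AND/OR inside quoted values.
function tokenize(query) {
  return query
    .trim()
    .split(/\s+AND\s+/)
    .map((group) => group.split(/\s+OR\s+/).map((piece) => piece.trim()));
}

// Assumed mapping from filter operators to MongoDB query operators.
const MONGO_OPERATORS = { '>': '$gt', '>=': '$gte', '<': '$lt', '<=': '$lte', '-': '$ne' };

// Turn one token (e.g. date:>"2016-04-28") into a description of the filter.
function parseToken(token, subDocumentKeys = []) {
  const match = /^(\w+):(>=|<=|>|<|-)?(.+)$/.exec(token);
  if (!match) {
    return { error: `Malformed token: ${token}` };
  }
  const [, key, operator = '', rawValue] = match;

  let value;
  if (/^".*"$/.test(rawValue)) {
    value = rawValue.slice(1, -1); // quoted string
  } else if (!Number.isNaN(Number(rawValue))) {
    value = Number(rawValue); // bare number
  } else {
    return { error: `Unrecognized value in token: ${token}` };
  }

  // No operator means plain equality in this sketch.
  const query = operator === ''
    ? { [key]: value }
    : { [key]: { [MONGO_OPERATORS[operator]]: value } };

  return {
    key,
    operator,
    value,
    query,
    mapReduce: subDocumentKeys.includes(key), // flag keys that live in sub-documents
    error: null,
  };
}
```

Under these assumptions, `tokenize('date:>"2016-04-28" AND code:-"CSC108"')` yields one inner array per AND group, and each piece can be handed to `parseToken` to get the raw MongoDB query fragment plus the error and mapreduce flags.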

qasim commented 8 years ago

The goal is to house all the things we do in each filter file and abstract it. It will be similar to how an actual language interprets syntax.

qasim commented 8 years ago

Happening at cobalt/filter-revamp.

qasim commented 8 years ago

cobalt/filter-revamp/src/api/utils/query-parser/index.js

qasim commented 8 years ago

Here's what I've got for a preliminary filter endpoint function under the new QueryParser model: https://github.com/cobalt-uoft/cobalt/blob/filter-revamp/src/api/buildings/routes/filter.js

On average, filter requests in the new model fare slightly faster than the current stable release (0.4.3), tested using Node.js 6.0 for both (~100ms difference testing 100 sequential requests, averaged over 10 attempts). Not so significant, but it's good to know it's not slower.

I still haven't addressed things that require MapReduce. I'm looking into using MongoDB's new aggregate functions and whether they are speedier. Will report back as soon as I get something conclusive.
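
For reference, one possible shape of the aggregation approach to sub-document filtering is sketched below. This is an assumption about how it could be done, not what the branch does, and it makes no claim about speed relative to MapReduce. The helper name `buildSubDocumentPipeline` and the example field names are hypothetical; the stages used (`$match`, `$unwind`, `$group` with `$push`) are standard MongoDB aggregation stages.

```javascript
// Hypothetical sketch: builds an aggregation pipeline as plain data.
// parentMatch: filter on top-level fields
// arrayField:  name of the sub-document array to filter within
// subMatch:    filter on sub-document fields, keyed as "<arrayField>.<field>"
function buildSubDocumentPipeline(parentMatch, arrayField, subMatch) {
  return [
    { $match: parentMatch },        // narrow the parent documents first
    { $unwind: `$${arrayField}` },  // one output document per sub-document
    { $match: subMatch },           // keep only the matching sub-documents
    {
      // Regroup by parent, collecting the surviving sub-documents.
      // Note: other parent fields are dropped in this simplified version.
      $group: { _id: '$_id', [arrayField]: { $push: `$${arrayField}` } },
    },
  ];
}

// Example with made-up field names; the result would be passed to the
// driver's aggregate() call on the relevant collection.
const pipeline = buildSubDocumentPipeline(
  { department: 'CSC' },
  'sections',
  { 'sections.size': { $gte: 100 } }
);
```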

qasim commented 8 years ago

QueryParser https://github.com/cobalt-uoft/cobalt/blob/filter-revamp/src/api/utils/query-parser/index.js

courses/filter https://github.com/cobalt-uoft/cobalt/blob/filter-revamp/src/api/courses/routes/filter.js https://github.com/cobalt-uoft/cobalt/blob/filter-revamp/src/api/courses/routes/filterMapReduce.js

@kshvmdn this is what query parsing + a mapreduce looks like under the new model. What do you think? I'm exhausted from looking at this so please help me dig around and see if we can simplify this at all ._.

kashav commented 8 years ago

Only got to take a brief look at this (will test in depth when I get the chance), but it looks good so far.

I'm assuming date/time parsing is not complete yet (unless the plan is to ignore invalid input, in which case, this value needs to be returned).

I think the mapreduce looks fine; we might be able to wrap the filter comparisons into a function so we don't have to repeat code for arrays and non-arrays. Other than that, I don't think there's much else we can do.

If I think of anything, I'll let you know.
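
A rough sketch of what such a wrapper could look like, assuming the comparisons are plain operator checks. The names `COMPARATORS` and `matches` and the operator symbols are hypothetical, and whether a helper like this can be shared with the map function depends on how that function is shipped to MongoDB.

```javascript
// Hypothetical sketch: one comparison helper for both arrays and single values.
const COMPARATORS = {
  '=': (a, b) => a === b,
  '!=': (a, b) => a !== b,
  '>': (a, b) => a > b,
  '>=': (a, b) => a >= b,
  '<': (a, b) => a < b,
  '<=': (a, b) => a <= b,
};

// Returns true if the field value (or any element, when it is an array)
// satisfies `operator operand`.
function matches(fieldValue, operator, operand) {
  const compare = COMPARATORS[operator];
  if (!compare) {
    throw new Error(`Unknown operator: ${operator}`);
  }
  // Treat a single value as a one-element array so both cases share one path.
  const values = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
  return values.some((value) => compare(value, operand));
}
```

Here an array matches when any element satisfies the comparison; if the filter semantics need every element to match, `some` would become `every`.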

qasim commented 8 years ago

@kshvmdn some good news on the date parsing: I found out that we can use the numeric comparison operators on strings in the case of dates, and MongoDB will handle it for us as long as both the comparators are strings (which in this case they are). That means I retired date_num, and we don't have to deal with that mess in tests anymore either.
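
The reason this works is that zero-padded YYYY-MM-DD strings sort lexicographically in the same order as the dates they represent. A small sketch of that property follows; the `dateFilter` helper and its field name are hypothetical and only illustrate building a string-based comparison query.

```javascript
// Zero-padded YYYY-MM-DD strings compare in chronological order.
const dates = ['2016-05-01', '2015-12-31', '2016-04-28'];
const sorted = [...dates].sort(); // plain string sort

// Without zero padding the property breaks: this is false as a string
// comparison even though October 1 is after September 30.
const unpaddedComparison = '2016-10-01' > '2016-9-30';

// Hypothetical helper: build a MongoDB query comparing date strings directly.
function dateFilter(field, mongoOperator, isoDate) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(isoDate)) {
    throw new Error(`Expected a date in YYYY-MM-DD format, got: ${isoDate}`);
  }
  return { [field]: { [mongoOperator]: isoDate } };
}
```

The format check matters because the string comparison is only reliable when every stored and queried date uses the same zero-padded format.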

qasim commented 8 years ago

I've also added appropriate error throwing; better to tell the user something went wrong, I'd think.

cobalt-uoft / cobalt

Revamp how we filter #69