Closed by abendebury 10 years ago
How do you feel about separate databases for each collection? It's the simplest and easiest solution.
Actually there are still issues with creating a database on the fly, users, etc., but it's a viable option.
I don't know, that sounds like a lot of trouble... didn't you mention something about limiting queries to a project?
Yeah, so I'm thinking something like this: we add an additional project_id to things like word_in_sentence, dependency_in_sentence, sequence_in_sentence, etc., then either use alternate joins or just write methods to scope the queries using an active_project variable that is accessible across requests (probably in the Project model) and stores the active project ID.
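To make the proposal concrete, here is a minimal sketch of a project-scoped lookup using Python's built-in sqlite3 module. The schema is deliberately simplified and the helper name sentences_for_word is hypothetical; only project_id and word_in_sentence come from the discussion, and the real models may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentence (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE word_in_sentence (
        word_id INTEGER,
        sentence_id INTEGER REFERENCES sentence(id),
        project_id INTEGER  -- the extra column proposed above
    );
""")
conn.executemany("INSERT INTO sentence VALUES (?, ?)",
                 [(1, "The whale dove."), (2, "A whale sang.")])
# The same word (id 1) appears in one sentence per project (10 and 20).
conn.executemany("INSERT INTO word_in_sentence VALUES (?, ?, ?)",
                 [(1, 1, 10), (1, 2, 20)])


def sentences_for_word(conn, word_id, project_id):
    """Hypothetical scoping method: only rows tagged with project_id match."""
    rows = conn.execute(
        """SELECT s.text FROM sentence AS s
           JOIN word_in_sentence AS wis ON wis.sentence_id = s.id
           WHERE wis.word_id = ? AND wis.project_id = ?
           ORDER BY s.id""",
        (word_id, project_id))
    return [text for (text,) in rows]


print(sentences_for_word(conn, 1, 10))
```

Here the project ID is an explicit argument; an active_project variable would just be one way of supplying that argument.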
active_project doesn't seem like a good idea if we end up processing multiple projects concurrently. I think the alternate joins are a much better solution.
Well, the alternate joins still need an active_project variable to scope with. What do you mean by processing multiple projects simultaneously? Like doing comparisons across projects? Because that could turn ugly very quickly.
No, I mean if we're preprocessing two projects at once, wouldn't it be an issue that there are two "active" projects?
I'm not exactly sure how we'd even preprocess two projects at once. I haven't implemented any kind of threading to do anything like that.
I did, from the front end.
Oh I see. Well, as long as we note it somewhere, it shouldn't be a problem because we don't make any queries that concern this issue in the preprocessor. I just need to change the sentence.add_[stuff] methods to also take in a project, and pass around the project ID wherever necessary.
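A rough sketch of why passing the project ID explicitly sidesteps the "two active projects" worry: each preprocessing job carries its own ID, so nothing shared has to say which project is active. The names add_word, preprocess, and links are stand-ins for illustration, not the actual preprocessor API.

```python
import threading

links = []                      # stands in for the association table
links_lock = threading.Lock()   # protects the shared list


def add_word(sentence, word, project_id):
    """Stand-in for a sentence.add_[stuff] method that takes a project."""
    with links_lock:
        links.append((project_id, sentence, word))


def preprocess(project_id, sentences):
    # The project ID travels with the call; no global "active" project.
    for sentence in sentences:
        for word in sentence.split():
            add_word(sentence, word, project_id)


jobs = {1: ["a whale"], 2: ["a ship"]}
threads = [threading.Thread(target=preprocess, args=(pid, sents))
           for pid, sents in jobs.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(links))
```

Every row ends up tagged with the project that produced it, regardless of how the two jobs interleave.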
What do you mean when you say that we don't make queries that concern this in the preprocessor? Aren't we discussing the queries in the preprocessor?
As far as I know, we never use word.sentence, word.sequences and the like because we mostly only do writes in the preprocessor. It's more of a concern in the main application.
Also, alternate joins aren't working out too well, so I might just write them as methods if I can't get them to work soon.
Yeah, that sounds fine.
Fixed in f4d7de29aa2a1b1dc7881c411b03a4764487daf0, but unit tests are failing because StringProcessor and SequenceProcessor now require a project in their constructors. I tried and failed to update the unit tests to properly register the change, so @PlasmaSheep if you could it'd be appreciated.
On a side note, since the relevant calls (word.sentences, word.sequences) are no longer relationships, we can't treat them like lists anymore; I've updated testmodels to reflect this change (in any case, because the association objects have additional fields, we should never have been using the object.relationship = [items] and object.relationship.append(item) syntax anyway).
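For illustration, a plain-Python sketch of the association-object point: once the link carries extra fields, it has to be built explicitly, and a scoped lookup becomes a method call rather than a list attribute. The class layout, the position field, and the method names are all hypothetical, not the actual models.

```python
from dataclasses import dataclass


@dataclass
class WordInSentence:
    """Association object: the link itself carries data."""
    word: str
    sentence: str
    project_id: int
    position: int  # hypothetical extra field


class Word:
    def __init__(self, text):
        self.text = text
        self.links = []

    def add_sentence(self, sentence, project_id, position):
        # A bare append(sentence) would have nowhere to put these fields.
        link = WordInSentence(self.text, sentence, project_id, position)
        self.links.append(link)
        return link

    def sentences(self, project_id):
        # A method, not a relationship, so it can be scoped per project.
        return [l.sentence for l in self.links if l.project_id == project_id]


whale = Word("whale")
whale.add_sentence("The whale dove.", project_id=10, position=1)
whale.add_sentence("A whale sang.", project_id=20, position=1)
print(whale.sentences(10))
```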
Fixed everything except for the CollectionProcessor error. You removed the line that called counter, but presumably we still need to do that so I'll leave the unit test failing as a reminder.
I changed the way counts are computed; sentence counts are now done on the fly (while Aditi did say that it was faster for her to count afterwards, because of our implementations it's actually much faster to count on the fly). I've yet to figure out how to do document counts more quickly, though.
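As a toy illustration of the two counting strategies (not the actual implementation), the sketch below increments a counter while rows are written and compares it with a recount over the stored rows afterwards. It assumes "sentence count" means the number of sentences a word appears in; the function names are made up for the example, and it says nothing about which approach is faster in the real code.

```python
from collections import Counter


def preprocess(sentences):
    """Store (word, sentence index) rows and count on the fly."""
    rows = []
    on_the_fly = Counter()
    for idx, sentence in enumerate(sentences):
        # set() so a word repeated within one sentence counts once.
        for word in set(sentence.lower().split()):
            rows.append((word, idx))
            on_the_fly[word] += 1
    return rows, on_the_fly


def count_afterwards(rows):
    """Recount by making a second pass over the stored rows."""
    return Counter(word for word, _ in rows)


rows, live = preprocess(["the whale dove", "the ship sank"])
print(live == count_afterwards(rows))
```

Both strategies produce the same counts; the difference is only whether a second pass over the stored data is needed.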
Basically, every query should be limited to the correct project.