eregs / regulations-core

An engine that supplies the API that allows users to read regulations and their various layers.
Creative Commons Zero v1.0 Universal

Consider consolidating backends #39

Open jmcarp opened 8 years ago

jmcarp commented 8 years ago

We currently support writing data to SQL via Django, and to Elastic. We also support a second Elastic / Solr index via haystack. I also see that we're talking about an additional search backend in #10. I'm guessing I'm missing some context here, but why is it useful to have all these backend options? Do we have sufficiently different use cases that some users would want Postgres full-text search, others Postgres + haystack, and others Elastic?

cmc333333 commented 8 years ago

For some more context, the application started off using only Elasticsearch (no ORM at all); it was also written in Flask :). Due to technical limitations in our production environment, we had to switch over to Django's ORM + haystack (MySQL + Solr). It was unclear then whether we'd be able to switch back to Elasticsearch once the appropriate people deemed it "safe".

Now (2+ years later), I don't think the Elasticsearch backend is used by anyone. I think it'd be worthwhile to rip it out and just assume everyone's using the Django ORM + haystack. I know @grapesmoker is working on replacing everything with MongoDB (which, for our needs, is almost identical to Elastic), but he's doing that in a completely different code base.

Re search: we know of two distinct configs (MySQL + Solr and Postgres + Elastic). I think full-text search (#10) would be worth investigating, as it'd allow us to run without needing a dedicated search index. However, even if it works out, we can't drop support for haystack just yet -- we need its flexibility. Ideally, we'd have django-haystack/django-haystack#1320 and get support for free.

grapesmoker commented 8 years ago

Just a small clarification: nothing that I'm doing is particularly tied to Mongo, it's just that I picked Mongo because I know it well and it's easy to use. All the same work can be done with Elastic, which I'll probably be working on in the next few weeks.

jmcarp commented 8 years ago

Definitely in favor of tearing out code that we're not using, especially if we don't expect to use it in the future. Does the work that @grapesmoker is doing have bearing on the data models in the 18F fork? Should we adopt that once it's ready and hold off on changes here in the meantime, or start deleting code sooner?

cmc333333 commented 8 years ago

@jmcarp I don't think that'll have immediate impact here (though the use case is worthwhile to consider). We can start deleting now. That said, I don't think this is a big priority. There's lots of eRegs code I'd love to rewrite :p

grapesmoker commented 8 years ago

Nothing that I'm doing would have any impact on 18F's work in the near future.

jmcarp commented 8 years ago

@cmc333333: seems like priority depends on how much is going to happen to the data models in the short term. For example, I'm guessing #36 and #37 would've been quicker to write without having to apply parallel changes to the two backends. I'm inferring from your comment that we're not expecting to change models soon.

@grapesmoker: what I'm asking is whether 18F would want to use your rewrite when it's ready. Do CFPB and 18F have sufficiently different use cases that we should use different backends?

cmc333333 commented 8 years ago

@jmcarp I think it's very likely we'd want to do this when FEC's extra legal docs come in. I defer to @tadhg-ohiggins, @anthonygarvan et al. there. I know they're looking a lot at search now.

grapesmoker commented 8 years ago

@jmcarp: I don't see why not. The use case is the same for both organizations. The idea behind the backend that I'm working on is to do away with a lot of the complexity surrounding things like layers and simplify the logic of rendering the frontend. I've already done a lot of this work, which is based on the XML schema that we've developed.

In principle, none of this is backend-dependent. It seems to me that as far as the backend is concerned, the main desideratum is the ability to search. Whether you're using Django's ORM or some other ORM or just pulling raw JSON from Mongo/Elastic is mostly a matter of convenience. I've spoken to people at CFPB about what they would prefer, and the preference is for Elastic because we have that in production and it's supported (as opposed to Mongo, which would be a new service). But the decision about what backend to use is actually not nearly as important as refactoring the rendering logic, which is the main focus of the work that I'm doing right now.
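To make the "none of this is backend-dependent" point concrete, here is a minimal sketch, not code from regulations-core. It assumes a hypothetical `SearchBackend` interface that calling code depends on, with a toy in-memory store standing in for Django ORM + haystack, Elastic, or Mongo. The names `Section`, `SearchBackend`, `InMemoryBackend`, and `find_labels` are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Section:
    """Hypothetical regulation section; not the real regulations-core model."""
    label: str
    text: str


class SearchBackend(Protocol):
    """Hypothetical interface: anything that can index and search sections."""

    def index(self, section: Section) -> None: ...

    def search(self, query: str) -> List[Section]: ...


class InMemoryBackend:
    """Toy stand-in for a real store (ORM + haystack, Elastic, Mongo)."""

    def __init__(self) -> None:
        self._sections: Dict[str, Section] = {}

    def index(self, section: Section) -> None:
        # Re-indexing the same label overwrites the earlier copy.
        self._sections[section.label] = section

    def search(self, query: str) -> List[Section]:
        # Naive case-insensitive substring match, sorted by label.
        needle = query.lower()
        hits = [s for s in self._sections.values() if needle in s.text.lower()]
        return sorted(hits, key=lambda s: s.label)


def find_labels(backend: SearchBackend, query: str) -> List[str]:
    """Caller code depends only on the interface, not on a specific store."""
    return [section.label for section in backend.search(query)]
```

With something along these lines, swapping stores would mean writing another class with the same two methods, while `find_labels` and the rendering logic stay untouched. A real backend would of course need ranking, pagination, and versioning that this sketch leaves out.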

I don't know how long this work will take, although we do have a working demo that we'll be showing CFPB leadership next week. Of course this work is open source and 18F is more than welcome to adopt it for any purpose. I'd like to see it expand and eventually replace the existing core/site framework.