ducoquelicot / observ_admin

All admin and back-end for Observ

Indexing file content in ElasticSearch #11

Closed ducoquelicot closed 5 years ago

ducoquelicot commented 5 years ago

Issue

The search portion works with mock data. What I haven't built yet is a way to index the .txt files (or PDFs) that make up the public records I want to be able to search.

Thoughts

Ideas

I'm guessing the above error is caused by a character in the document that throws the string formatting out of whack. I can't check every file manually for those characters, so either I write a script that ensures the characters are processed correctly, or I figure out something else.
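The error itself isn't shown here, so the following is only a guess at one way to sidestep it. If the document body is being assembled with string formatting, quotes, braces, or undecodable bytes in a file could break it. A minimal sketch, assuming UTF-8 files and hypothetical helper names (`read_text`, `build_document`), lets `json.dumps` do the escaping so no file has to be checked by hand:

```python
import json
from pathlib import Path


def read_text(path):
    """Read a file as text, replacing undecodable bytes instead of raising.

    Assumption: the files are mostly UTF-8. Bytes that are not valid UTF-8
    become the U+FFFD replacement character.
    """
    return Path(path).read_text(encoding="utf-8", errors="replace")


def build_document(path):
    """Return a JSON string for one file (hypothetical document shape).

    json.dumps escapes quotes, backslashes, and newlines, so the file's
    text is never spliced into a hand-written format string.
    """
    path = Path(path)
    return json.dumps({"filename": path.name, "body": read_text(path)})
```

The same idea applies if the text goes through a client library: pass the file content as a value in a dict rather than formatting it into a string.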

Let me know!

zstumgoren commented 5 years ago

Hey, First, HAPPY BIRTHDAY!

And some thoughts on the questions:

ducoquelicot commented 5 years ago

Thanks!

I think the problem is not so much indexing in ES (I figured that out) as writing a script that will automatically insert all the necessary information into the SQLite database. That part isn't covered by the Flask tutorial because it only indexes posts, which are written on the website and so go through a form of sorts. We talked about that last week.

So far so good, though: I was able to index the .txt files into ES. Getting them into the SQLite database is the next step.

zstumgoren commented 5 years ago

Aha, ok, I get it now. So the secret ingredient is the SQLAlchemy ORM layer. Once you've integrated the CRUD operations for Elastic into the database model(s), you can use those models in any context. In a Flask web request context, they'd likely be used in one of your routes (perhaps when a user uploads a document manually, if you decide to support that feature). In a similar fashion, you can also write a command-line script that imports those same models and inserts documents. The code logic will likely be highly similar if not identical, so you'll be setting the stage for Flask app integration by implementing the command-line client.
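To make the shape of that idea concrete, here is a rough sketch, not the project's actual code. To stay dependency-free it swaps in the standard library's `sqlite3` and a plain class where the SQLAlchemy model would go, and an in-memory `FakeIndex` where the Elasticsearch client would go. All names (`FakeIndex`, `RecordStore`, `import_directory`, the `record` table) are hypothetical. The point is that one `add` method writes to both stores, and a command-line script simply calls it in a loop:

```python
import sqlite3
import sys
from pathlib import Path


class FakeIndex:
    """Hypothetical stand-in for a search client: keeps documents in a dict."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body):
        self.docs[doc_id] = body


class RecordStore:
    """Stand-in for the model layer: the one place that writes to both stores."""

    def __init__(self, conn, search_index):
        self.conn = conn
        self.search_index = search_index
        conn.execute(
            "CREATE TABLE IF NOT EXISTS record ("
            "id INTEGER PRIMARY KEY, filename TEXT, body TEXT)"
        )

    def add(self, filename, body):
        # Parameterized query: the text is passed as data, not formatted into SQL.
        cur = self.conn.execute(
            "INSERT INTO record (filename, body) VALUES (?, ?)", (filename, body)
        )
        self.conn.commit()
        # Update the search index only after the database commit succeeded.
        self.search_index.index(cur.lastrowid, {"filename": filename, "body": body})
        return cur.lastrowid


def import_directory(store, directory):
    """Insert every .txt file in `directory`; return the number imported."""
    count = 0
    for path in sorted(Path(directory).glob("*.txt")):
        store.add(path.name, path.read_text(encoding="utf-8", errors="replace"))
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Command-line use: python import_records.py <directory>
    store = RecordStore(sqlite3.connect("records.db"), FakeIndex())
    print(import_directory(store, sys.argv[1]))
```

A Flask upload route could call the same `store.add(...)` with the uploaded file's name and text, which is why building the command-line client first sets up the web integration.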