manthey closed 9 years ago
Nice. What types of queries are currently possible on elasticsearch data?
Quite a bit. Elasticsearch uses different terminology from other databases, and divides what would be a SQL WHERE clause into 'filters' and 'queries'. We can search on ranges (dates, distances, etc.) and on equality. We can get a consistent random sample (ES supports this natively). Performance seems very good on the Baltimore dataset.
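As a rough illustration of the kinds of searches described above, here is a minimal sketch that only builds a request body and does not contact a cluster. The field names `created_time` and `user_name`, and the helper `build_search_body`, are hypothetical. The body uses the `bool`/`filter` form of later Elasticsearch releases; older 1.x clusters expressed the same idea with a `filtered` query. The seeded `random_score` is what makes the sample repeatable, and newer releases may also want a `field` next to the seed.

```python
import json


def build_search_body(start, end, user, seed, size=100):
    """Build a Query DSL body: range + equality filters, seeded random order."""
    filters = [
        # Range filter on a (hypothetical) date field.
        {"range": {"created_time": {"gte": start, "lt": end}}},
        # Equality filter on a (hypothetical) exact-match field.
        {"term": {"user_name": user}},
    ]
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"bool": {"filter": filters}},
                # The same seed gives the same pseudo-random ordering.
                "random_score": {"seed": seed},
                # Filter-only queries score 0, so replace the score
                # with the random value instead of multiplying.
                "boost_mode": "replace",
            }
        },
    }


body = build_search_body("2015-01-01", "2015-02-01", "example_user", seed=42)
print(json.dumps(body, indent=2))
```

Taking the first `size` hits of this ordering gives a sample that stays the same across requests as long as the seed and the underlying data do not change.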
The one issue I don't know how to resolve easily is full text search. You can turn it on, and then any new rows ('documents' in ES terms) are searchable, but existing rows are not. To make existing rows searchable, you have to create a new schema (a 'reindex' in ES terms) with the indices ('mapping') already specified and copy over all of the tables ('types') from the existing schema.
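To make the copy step more concrete, here is a hedged sketch of the two request bodies involved, written as plain dictionaries. It assumes a much newer Elasticsearch than the cluster discussed here: the typeless `mappings` layout and the `_reindex` endpoint do not exist in 1.x, where the copy would have to be done with a scroll over the old index and bulk writes into the new one. The index names, the field name and the helper `build_reindex_plan` are all hypothetical.

```python
def build_reindex_plan(source_index, dest_index, text_field):
    """Return (create-index body, reindex body) for a full-text-enabled copy."""
    # Body for creating the destination index with the mapping
    # specified up front, so the field is analyzed as full text.
    create_body = {
        "mappings": {
            "properties": {
                text_field: {"type": "text"},
            }
        }
    }
    # Body for the _reindex endpoint: copy every document across.
    reindex_body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return create_body, reindex_body


create_body, reindex_body = build_reindex_plan(
    "messages_v1", "messages_v2", "caption"
)
```

The order matters: the destination index has to exist with its mapping before any documents are copied, otherwise the field would be mapped automatically from the first document written.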
I don't know what hardware backs the ES instance I am accessing, but it is definitely a cluster of several machines. The Baltimore data set is actually a subset of the data within the schema I was accessing. Each 'document' is assigned a type. I pulled all of one schema, rather than a specific table.
I think it is possible to make the queries I need for real-time update with the existing mapping, but I haven't worked out how to do it.
On further investigation, I don't see any method for getting data ingested since a particular time or _id. It might be possible if the _timestamp mapping is turned on (with store = true) for an index and _type. Otherwise, I'll have to fake it by doing the work on the server side (or by loading the client excessively).
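If a sortable ingest time were available, the "everything since the last fetch" request could look roughly like the sketch below. This is an assumption, not something verified against the cluster: it presumes `_timestamp` is enabled on the mapping and can be used in range queries and sorts. Later Elasticsearch releases dropped `_timestamp`, in which case an ordinary date field would play the same role. The helpers `build_since_query` and `next_cursor` are hypothetical, and the hits at the end are made-up stand-ins for a search response.

```python
def build_since_query(last_seen, field="_timestamp", size=1000):
    """Build a body asking for documents newer than last_seen, oldest first."""
    return {
        "size": size,
        "query": {"range": {field: {"gt": last_seen}}},
        "sort": [{field: {"order": "asc"}}],
    }


def next_cursor(hits, last_seen):
    """Advance the cursor using the sort values returned with each hit."""
    # Sorted searches return each hit's sort key in hit["sort"].
    values = [hit["sort"][0] for hit in hits]
    return max(values, default=last_seen)


# Made-up hits standing in for response["hits"]["hits"].
fake_hits = [
    {"_id": "a", "sort": [1000]},
    {"_id": "b", "sort": [1500]},
]
cursor = next_cursor(fake_hits, last_seen=900)
body = build_since_query(cursor)
```

One caveat with a strict `gt` comparison: documents that share the cursor's exact timestamp but were not returned in the previous page would be skipped, so a real implementation would need to handle ties.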
More work will be needed to (a) support 'realtime' data feeds, (b) handle anything other than Instagram in a particular format, and (c) properly deal with text searches, unless we can set up the Elasticsearch mapping.
Update to the latest version of Girder.
Fix a bug in the 'realtime' postgres data access where data that is added to the system between fetching the first part of the initial set of rows and the subsequent part of the initial set of rows might not be retrieved.