kadhonn opened this issue 3 months ago
I think there are multiple things to split up here:
And then there are variants like:
`description:en`, `description:de`
which might be nested, like `description:markdown:en`.
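My suggestion (not sure if it's a good one) would be to extend the EntryDTO to support the data types as follows:
```kotlin
// Existing one: a flat map of property name -> string value.
typealias Entry = Map<String, String>

// Proposed extension: one map per data type. Values stay strings; the map a key
// lives in tells the search service how to interpret that value.
data class ComplexEntry(
    val keyValues: Map<String, String>, // plain key/value properties
    val lists: Map<String, String>,     // list-valued properties
    val blobs: Map<String, String>,     // blob values
    val dates: Map<String, String>,     // date values
    val times: Map<String, String>      // time values
)
```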
That way the search service would know each field's data type without anyone having to specify which field has which data type.
The downside would be that event collectors would have to put each value into the correct data type and support everything correctly. But we could provide a nice API/utilities to hide that from event collector programmers.
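As a rough illustration of the "nice API/utilities" idea, here is a minimal Kotlin sketch of a builder that event collectors could call instead of filling the maps by hand. `ComplexEntryBuilder`, `exampleEntry` and `LIST_SEPARATOR` are hypothetical names. Because the proposal keeps lists as plain strings, the sketch *assumes* list items are joined with a separator character. It also *assumes* dates and times are stored in their ISO-8601 string form. Neither choice is part of the original proposal.

```kotlin
import java.time.LocalDate
import java.time.LocalTime

// Same shape as the proposed ComplexEntry; all values are still strings.
data class ComplexEntry(
    val keyValues: Map<String, String>,
    val lists: Map<String, String>,
    val blobs: Map<String, String>,
    val dates: Map<String, String>,
    val times: Map<String, String>
)

// Hypothetical helper: typed setters put each value into the right bucket,
// so collector code never touches the raw maps directly.
class ComplexEntryBuilder {
    private val keyValues = mutableMapOf<String, String>()
    private val lists = mutableMapOf<String, String>()
    private val blobs = mutableMapOf<String, String>()
    private val dates = mutableMapOf<String, String>()
    private val times = mutableMapOf<String, String>()

    fun text(key: String, value: String) = apply { keyValues[key] = value }

    // Assumption: list items are joined with LIST_SEPARATOR, because the
    // proposal stores a list as a single String.
    fun list(key: String, items: List<String>) = apply {
        lists[key] = items.joinToString(LIST_SEPARATOR)
    }

    fun blob(key: String, value: String) = apply { blobs[key] = value }

    // LocalDate.toString() yields ISO-8601, e.g. "2020-07-01".
    fun date(key: String, value: LocalDate) = apply { dates[key] = value.toString() }

    // LocalTime.toString() yields ISO-8601, e.g. "20:30".
    fun time(key: String, value: LocalTime) = apply { times[key] = value.toString() }

    fun build() = ComplexEntry(
        keyValues.toMap(), lists.toMap(), blobs.toMap(), dates.toMap(), times.toMap()
    )

    companion object {
        // Hypothetical separator: ASCII "unit separator", unlikely to appear in text.
        const val LIST_SEPARATOR = "\u001F"
    }
}

// Example usage; the values are made up for illustration.
fun exampleEntry(): ComplexEntry =
    ComplexEntryBuilder()
        .text("name", "Summer Festival")
        .list("concert.bandlist", listOf("Band A", "Band B"))
        .date("startDate", LocalDate.of(2020, 7, 1))
        .build()
```

The builder is the only code that needs to know how each type is encoded. If the encoding later changes (for example to a real `List<String>`), collectors that go through it would not need to change.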
My suggestion for variant handling would be that the search service simply drops everything after the first colon in the key and, when doing operations, applies them to all variants of that key.
That way, e.g. if you search for description contains "xyz", all languages of the description would be searched automatically.
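The variant rule above could look roughly like the following Kotlin sketch. `baseKey` and `fieldContains` are hypothetical helper names, and the sketch runs on the existing flat `Entry` map. A key's base name is everything before the first colon, and a "contains" filter matches if any variant of that base name matches.

```kotlin
// The existing flat format: property name -> string value.
typealias Entry = Map<String, String>

// Hypothetical helper: drop everything after the first colon, so both
// "description:markdown:en" and "description:de" map to "description".
// Keys without a colon are returned unchanged.
fun baseKey(key: String): String = key.substringBefore(':')

// Hypothetical "contains" operation: true if any variant of the field
// (any language, any nested variant) contains the search text.
fun fieldContains(entry: Entry, field: String, needle: String): Boolean =
    entry.any { (key, value) -> baseKey(key) == field && value.contains(needle) }
```

For example, with `description:en` and `description:markdown:de` in an entry, a search for description contains "xyz" would check both values without the query naming any language.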
In the last few weeks we had multiple discussions about different features that need more structure than our simple string->string map, mainly discussions about lists, i18n features and embedding more structure like JSON for combining lists and objects or similar. But we never arrived at a really satisfactory answer on how to do it well, so to start this discussion, here are my thoughts and reasoning so far.
Structured Data
Motivation
So, why/where do we need to care about the different data formats/structures?
Our architecture is structured in three parts: Collectors, Core and Publishers. Collectors and Publishers will mostly be specialized and, to some degree, have to agree on the same property names/formats anyway to work together, so they are not the primary concern here. But our Core, especially the Search Service, is conceptually as generic as possible, meaning it should work with pretty much any property names/formats. And for things like searching, sorting and faceting to work correctly, it needs to know how to handle the different data structures.
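To make the sorting point concrete, here is a small Kotlin sketch of what goes wrong when the Search Service treats dates as plain text. The `d.M.yyyy` format and the function names are hypothetical, chosen only for illustration, since the issue does not specify a date format. A plain string sort compares character by character. Only a type-aware sort that parses the values first gives calendar order.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical collector-side date format; not specified in the issue.
val collectorFormat: DateTimeFormatter = DateTimeFormatter.ofPattern("d.M.yyyy")

// Treating dates as text: lexicographic order, not chronological order.
fun sortAsStrings(dates: List<String>): List<String> = dates.sorted()

// Knowing the field is a date: parse first, then compare by day number.
fun sortAsDates(dates: List<String>): List<String> =
    dates.sortedBy { LocalDate.parse(it, collectorFormat).toEpochDay() }
```

With `3.1.2020`, `10.12.2019` and `25.6.2019`, the string sort puts `10.12.2019` first, while the date sort correctly starts with `25.6.2019`. Zero-padded ISO-8601 dates (`yyyy-MM-dd`) happen to sort correctly as strings. The Search Service can only rely on that if it knows the field is a date in that format.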
Some features we have need more structure to be fully supported. I have identified these for now:
- the `concert.bandlist` property
- `startDate` and `endDate` for now, but there may be others as well. This is important for sorting.

Possible solutions
I can think of three ways to convey structure:
- A meta-property: for a property `name`, have a meta-property named `?name` or `name$schema` which contains the structure of the data, maybe simply `date` or `list`.
- Metadata in the property name itself: OpenStreetMap does this for its i18n support, with keys like `name:cz` and `name:en` for different locales. You could do this for lists as well.
- Metadata in the value of the property itself: we could of course do this, but it makes everything a bit more complicated to parse, especially because you would need special handling for simple text as well.

So, that is how far I am with this topic. What are your thoughts?