Apicurio / apicurio-registry

An API/Schema registry - stores APIs and Schemas.
https://www.apicur.io/registry/
Apache License 2.0

Content based search #2719

Open ozarkblue opened 2 years ago

ozarkblue commented 2 years ago

Content-based search would speed up adoption of this registry within large organizations that have thousands of schemas. Teams often search first to find out which existing events already carry certain attributes or fields before asking to create another event. Without content search this is very difficult.

Yes, an indexing mechanism is required to make this happen. One option I've seen in other tools is batch indexing, where an admin configures a cron expression for the batch run so that indexing does not slow the tool down all the time. The ideal would be real-time index updates, as long as they do not compromise the responsiveness of the UI and API.

Even though some schema formats are difficult to add to search, starting with just JSON Schema and a few other formats in the search index (even if only the latest version's content is searchable, with text-based exact match to start with) would be a huge uplift to the usability of this schema registry.
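
To make the "exact match on fields, latest version only" starting point concrete, here is a minimal sketch in Python. It is not anything the registry provides: `field_names`, `build_index`, `search`, and the sample artifact IDs are all hypothetical, and it assumes the latest version of each JSON Schema artifact is already available as text.

```python
import json
from collections import defaultdict


def field_names(schema):
    """Yield every property name declared anywhere in a JSON Schema document."""
    if isinstance(schema, dict):
        props = schema.get("properties")
        if isinstance(props, dict):
            for name in props:
                yield name
        # Recurse so nested objects, arrays, and definitions are covered too.
        for value in schema.values():
            yield from field_names(value)
    elif isinstance(schema, list):
        for item in schema:
            yield from field_names(item)


def build_index(latest_versions):
    """Map each lower-cased field name to the set of artifact IDs declaring it.

    `latest_versions` is a hypothetical dict of artifact ID -> JSON Schema text.
    """
    index = defaultdict(set)
    for artifact_id, text in latest_versions.items():
        for name in field_names(json.loads(text)):
            index[name.lower()].add(artifact_id)
    return index


def search(index, field):
    """Exact-match (case-insensitive) lookup; returns a sorted list of artifact IDs."""
    return sorted(index.get(field.lower(), ()))


# Invented sample data, for illustration only.
schemas = {
    "order-created": json.dumps({
        "type": "object",
        "properties": {
            "orderId": {"type": "string"},
            "customer": {
                "type": "object",
                "properties": {"email": {"type": "string"}},
            },
        },
    }),
    "user-signed-up": json.dumps({
        "type": "object",
        "properties": {"email": {"type": "string"}},
    }),
}
index = build_index(schemas)
print(search(index, "email"))  # ['order-created', 'user-signed-up']
```

Indexing property names rather than raw text keeps the first version small and answers the question teams actually ask: "which existing events already have this field?"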

apicurio-bot[bot] commented 2 years ago

Thank you for reporting an issue!

Pinging @jsenko to respond or triage.

EricWittmann commented 2 years ago

Is this not a duplicate of https://github.com/Apicurio/apicurio-registry/issues/1260 ?

ozarkblue commented 1 year ago

@EricWittmann any movement on this content search feature, or any other possible solution that you might have come across for this ask?

EricWittmann commented 1 year ago

This topic came up again a few weeks ago. I'm a little concerned about scope creep - I'm not convinced that content based search is an appropriate feature for the registry to have. I can see the argument for it, don't get me wrong. But the registry was not designed to be a catalog - it was designed to be a runtime registry. Fast, efficient access to versioned content - typically accessed by some form of unique ID. The UI makes it look a little like a catalog if you squint and don't look directly at it. :)

Content based searching really moves the needle in the direction of catalog functionality, in my opinion. Perhaps that's what we want, but I'm not convinced.

That said, to your point about other possible solutions. A couple of things come to mind. Orthogonal tooling could be created using the Export functionality. Basically an indexing/publishing approach, where there is a cron job that exports the entire contents of the registry, indexes them, and publishes the index somewhere. Typically something like Solr or Elasticsearch I guess.
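
A rough sketch of the indexing half of that cron job, in Python. It assumes the export has already been downloaded as a ZIP archive held in memory. The internal layout of a real registry export is not described in this thread, so the entry names and contents below are invented, and `index_export` and `make_sample_export` are hypothetical helpers. Publishing the result to Solr or Elasticsearch is left out.

```python
import io
import re
import zipfile
from collections import defaultdict

TOKEN = re.compile(r"[A-Za-z0-9_]+")


def index_export(zip_bytes):
    """Build a token -> set(entry name) index from a ZIP archive held in memory."""
    index = defaultdict(set)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            text = archive.read(info).decode("utf-8", errors="replace")
            for token in TOKEN.findall(text):
                index[token.lower()].add(info.filename)
    return index


def make_sample_export():
    """Stand-in for a real export: two entries with invented names and content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content/a.json", '{"properties": {"orderId": {}}}')
        archive.writestr("content/b.json", '{"properties": {"email": {}}}')
    return buffer.getvalue()


index = index_export(make_sample_export())
print(sorted(index["orderid"]))  # ['content/a.json']
```

Because the whole index is rebuilt on every run, deleted artifacts drop out without any extra bookkeeping. The cost is that search results lag behind the registry by up to one cron interval.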

Another approach would be to use the Event Sourcing features of registry to implement a more real-time approach to indexing. Perhaps use something like Kafka to publish an event whenever an artifact is created/updated/deleted - then have a Kafka consumer update an external index. This doesn't allow for easy integration into the UI, of course - but perhaps that is something that could be implemented in Registry (a content search API that could be provided externally, and a special search syntax to utilize it?).
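
The consumer side of that idea could look roughly like the Python sketch below. The event shape (`type`, `artifactId`, `content`) is an assumption made for illustration and is not the registry's actual event format. The Kafka consumer loop is omitted, and an in-memory `ContentIndex` (a hypothetical class) stands in for the external index.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[A-Za-z0-9_]+")


class ContentIndex:
    """In-memory stand-in for an external search index."""

    def __init__(self):
        self._tokens_by_artifact = {}
        self._artifacts_by_token = defaultdict(set)

    def apply(self, event):
        """Apply one created/updated/deleted event (hypothetical event shape)."""
        artifact_id = event["artifactId"]
        # Remove whatever was indexed for this artifact before.
        for token in self._tokens_by_artifact.pop(artifact_id, set()):
            postings = self._artifacts_by_token[token]
            postings.discard(artifact_id)
            if not postings:
                del self._artifacts_by_token[token]
        # Re-index on create/update; a delete stops after the removal above.
        if event["type"] in ("created", "updated"):
            tokens = {t.lower() for t in TOKEN.findall(event["content"])}
            self._tokens_by_artifact[artifact_id] = tokens
            for token in tokens:
                self._artifacts_by_token[token].add(artifact_id)

    def search(self, term):
        """Exact-match (case-insensitive) lookup; returns sorted artifact IDs."""
        return sorted(self._artifacts_by_token.get(term.lower(), ()))


live_index = ContentIndex()
live_index.apply({"type": "created", "artifactId": "order-created",
                  "content": '{"properties": {"orderId": {}}}'})
live_index.apply({"type": "updated", "artifactId": "order-created",
                  "content": '{"properties": {"orderNumber": {}}}'})
print(live_index.search("orderNumber"))  # ['order-created']
print(live_index.search("orderId"))      # []
```

Treating an update as "remove, then re-add" makes the handler idempotent, which matters if the consumer replays events after a restart.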