elastic / connectors

Source code for all Elastic connectors, developed by the Search team at Elastic, and home of our Python connector development framework
https://www.elastic.co/guide/en/enterprise-search/master/index.html
Other
74 stars 126 forks source link

No documents synced if source table does not have Primary Keys #467

Closed lio-p closed 2 months ago

lio-p commented 1 year ago

Bug Description

Using MySQL Connector, if the source table does not have primary keys, no documents are being synced. No messages explain what happened.

To Reproduce

CREATE TABLE faq_no_pk (
    title TEXT,
    content TEXT
)

Expected behavior

I understand that it's not working probably because Elasticsearch relies on the primary keys to track the synchronization state. But I think it would be interesting to have a message that explains how many documents have been omitted.

Environment

danajuratoni commented 1 year ago

A primary key is mandatory, but missing user feedback seems to be a bug. According to the design we should log an warning/error mentioning which table we're skipping, ideally also a suggestion for a possible fix e.g. consider adding an auto-increment key. The warning / error should be looped back to Kibana. cc: @moxarth-elastic

parth-crest commented 1 year ago

According to the design we should log an warning/error mentioning which table we're skipping,

@danajuratoni We have that logger message where we inform user that we have skipped that table because it has no primary key.

possible fix e.g. consider adding an auto-increment key.

You mean do you want us to add a random unique id to the elastic document when primary key is not attached, in this case all the document will be reindex in each sync. OR You mean it would done at the database level by end user.

danajuratoni commented 1 year ago

Cool, for connector clients, that log message you linked is exactly what we need. For the native connector we'd need a similar warning fed back to Kibana.

sphilipse commented 1 year ago

This probably calls for adding a 'warnings' or 'messages' field to the connector protocol to communicate these kinds of things, maybe with a severity (log/warn/error)? Would definitely be useful in the jobs index, I'm not sure whether that would be as useful in the connector document itself.

Adding a possible 'warning' status to the connector and job would help, too.

parth-crest commented 1 year ago

Agree, 'warning' status needs to be added at the framework level and the connector should be able to pass the messages via any method that appends the message to the 'warning' field.

@tarekziade what are your thoughts on this ?

tarekziade commented 1 year ago

Excellent idea - I would use the standard levels though for the severity field, so info, warning error critical etc.

sphilipse commented 1 year ago

We'd probably get a structure like this:

messages: [
  { severity: 'warning', message: 'Skipped table due to missing primary key'},
  { severity: 'info', message: 'Synced five tables with 120 entries}
]
danajuratoni commented 1 year ago

Marking this issue as blocked by the work needed to support additional severity warnings. Postponed to 8.8.

danajuratoni commented 1 year ago

Moving this to 8.9, as work for additional severity warning levels is postponed.

khushbu-elastic commented 1 year ago

@danajuratoni Is there any update on this considering the 8.9 release?

khushbu-elastic commented 1 year ago

@danajuratoni Please let us know if there is any update on this?

chantzlarge commented 9 months ago

Moving this to 8.9, as work for additional severity warning levels is postponed.

Is the intention to add support for additional indexes or simply improve the warning message? It would be helpful for us if we were able to specify alternative indexes as we are currently unable to index materialize views due to this constraint.

seanstory commented 8 months ago

I agree, it would be good for us to revisit these constraints, and find a way to allow the user to provide a column/function to be used as _id so that views and even arbitrary joins could be indexed. This relates to https://github.com/elastic/connectors/issues/2002