elastic / connectors

Official Elastic connectors for third-party data sources
https://www.elastic.co/guide/en/elasticsearch/reference/master/es-connectors.html

[Github] Add support for `mdx` files #2635

Open spong opened 5 months ago

spong commented 5 months ago

Problem Description

Currently, the GitHub connector only syncs documents with the .markdown, .md, and .rst file extensions, as per the docs. I've been working on exposing Search Connector indices as Knowledge Base content to the Security Assistant within Kibana, and was hoping to make the Kibana documentation available for reference. However, since we've moved our docs to the mdx format, it is not possible to ingest and embed these documents using the GitHub connector.

Proposed Solution

Add support for syncing and parsing mdx files. In conversations, it seems there might be compatibility issues at the Apache Tika layer, so this might be more involved than just adding the file extension within the GitHub connector.
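
As a rough illustration of the extension-filter half of this proposal, here is a minimal sketch. The names `CURRENT_EXTENSIONS`, `PROPOSED_EXTENSIONS`, and `is_syncable` are hypothetical and are not taken from the connector's code. Only the three existing extensions come from the description above. The sketch does not address the possible Apache Tika compatibility issue.

```python
from pathlib import PurePosixPath

# Extensions the connector is described as supporting today.
CURRENT_EXTENSIONS = {".markdown", ".md", ".rst"}
# Hypothetical extended set with mdx added.
PROPOSED_EXTENSIONS = CURRENT_EXTENSIONS | {".mdx"}


def is_syncable(path, extensions=PROPOSED_EXTENSIONS):
    """Return True if the repository path ends with a supported extension."""
    return PurePosixPath(path).suffix.lower() in extensions
```

With the current set, `is_syncable("docs/intro.mdx", CURRENT_EXTENSIONS)` is false. With the proposed set it is true, but the file content would still need to be extractable downstream.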

Alternatives

Right now the workaround is to create a new branch on the repo and bulk rename mdx -> md, which is not ideal for widespread adoption.
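
For reference, the bulk rename could be scripted roughly as follows. This is a sketch that assumes a local checkout of the docs branch. The function name `rename_mdx_to_md` is made up for illustration, and committing and pushing the branch are left out.

```python
from pathlib import Path


def rename_mdx_to_md(root):
    """Rename every *.mdx file under root to *.md; return (old, new) pairs."""
    renamed = []
    # Materialize the listing first so renames do not disturb the traversal.
    for src in sorted(Path(root).rglob("*.mdx")):
        if not src.is_file():
            continue
        dst = src.with_suffix(".md")
        if dst.exists():
            # Skip rather than overwrite an existing .md file.
            continue
        src.rename(dst)
        renamed.append((src, dst))
    return renamed
```

Calling `rename_mdx_to_md("path/to/checkout")` changes only the file extensions. The file contents stay mdx, so any JSX in them is ingested as plain text.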

moxarth-rathod commented 4 months ago

@danajuratoni is anything blocked on our end, or are we waiting for this issue https://issues.apache.org/jira/browse/TIKA-4269 to be resolved?

khushbu-elastic commented 4 months ago

@DianaJourdan Could you please check this & update?