elastic / connectors

Source code for all Elastic connectors, developed by the Search team at Elastic, and home of our Python connector development framework
https://www.elastic.co/guide/en/enterprise-search/master/index.html

[Github] Add support for `mdx` files #2635

Open spong opened 3 months ago

spong commented 3 months ago

Problem Description

Currently the GitHub connector only supports syncing documents with the .markdown, .md, and .rst file extensions, as per the docs. I've been working to add support for exposing Search Connector indices as Knowledge Base content to the Security Assistant within Kibana, and was hoping to make the Kibana documentation available for reference. However, since we've moved to the mdx format for our docs, it is not possible to ingest and embed these documents using the GitHub connector.

Proposed Solution

Add support for syncing and parsing mdx files. From conversations so far, there may be compatibility issues at the Apache Tika layer, so this might be more involved than just adding the file extension within the GitHub connector.
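
For illustration only, the connector-side half of this could be as small as an extension allowlist check like the sketch below. The names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical and not taken from the connector source, and this does not address whatever needs to happen at the Tika layer.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: the first three entries are the extensions the docs
# list today, and ".mdx" is the proposed addition.
SUPPORTED_EXTENSIONS = {".markdown", ".md", ".rst", ".mdx"}


def is_supported(path: str) -> bool:
    """Return True if the file's extension is in the allowlist (case-insensitive)."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

With a check like this, `is_supported("docs/intro.mdx")` would pass while files such as `src/main.py` would still be skipped.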

Alternatives

Right now the workaround is to create a new branch in the repo and bulk-rename the mdx files to md, which is not ideal for widespread adoption.
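
For anyone who needs the workaround in the meantime, here is a minimal sketch of the bulk rename, assuming it is run against a local checkout of the new branch. The function name `rename_mdx_to_md` and the `dry_run` flag are my own and not part of any existing tooling.

```python
from pathlib import Path


def rename_mdx_to_md(root: str, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Rename every *.mdx file under root to *.md and return the (old, new) pairs."""
    renames = []
    for src in sorted(Path(root).rglob("*.mdx")):
        if not src.is_file():
            continue
        dst = src.with_suffix(".md")
        if dst.exists():
            # Skip rather than overwrite an existing .md file with the same stem.
            continue
        renames.append((src, dst))
        if not dry_run:
            src.rename(dst)
    return renames
```

Calling `rename_mdx_to_md("path/to/checkout")` first only lists what would change; passing `dry_run=False` performs the renames, after which the branch can be committed and pointed at by the connector. Note that this only changes the extension, so any JSX or import statements inside the files are left as-is.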

moxarth-elastic commented 2 months ago

@danajuratoni is anything blocked on our end, or are we waiting for this issue https://issues.apache.org/jira/browse/TIKA-4269 to be resolved?

khushbu-elastic commented 2 months ago

@DianaJourdan Could you please check this and provide an update?