We currently have pre-processed wikis saved as csv files. These contain: `url`, `id`, `title`, `is_public`, `body`, `summary`, `author`, `keywords`.
It would be good to load wikis directly from the .md files so that when wikis are updated the changes can be more easily incorporated into our database. This would also resolve privacy issues, since we cannot have private data in our repo if we want our code to be public.
It might be possible to use GitPython to clone the wikis 'in situ'; llama-index then has a SimpleDirectoryReader which can read markdown files directly.
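A minimal sketch of that flow, assuming a recent llama-index import path and a hypothetical wiki URL:

```python
# Sketch: clone a wiki with GitPython, then load its markdown pages
# with llama-index's SimpleDirectoryReader. WIKI_URL is a hypothetical
# placeholder, not our actual wiki.
import tempfile

from git import Repo
from llama_index.core import SimpleDirectoryReader

WIKI_URL = "https://github.com/example-org/example-repo.wiki.git"  # hypothetical

# Clone the wiki 'in situ' into a temporary directory...
with tempfile.TemporaryDirectory() as tmp_dir:
    Repo.clone_from(WIKI_URL, tmp_dir)

    # ...then read every markdown file into llama-index Documents.
    documents = SimpleDirectoryReader(
        input_dir=tmp_dir,
        required_exts=[".md"],
        recursive=True,
    ).load_data()

print(f"Loaded {len(documents)} wiki pages")
```

Using a temporary directory means the cloned wiki never lands in the repo itself, which keeps the private data out of version control.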
Things to address are:
[x] Use simple directory reader to load wikis
[x] Try using GitPython to clone wikis 'in situ'
[x] Work out how to add wiki URLs (and potentially keywords) to the metadata of documents (see the sketch below)
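For the metadata step, SimpleDirectoryReader accepts a `file_metadata` callback that maps each file path to a metadata dict. A minimal sketch, assuming the GitHub convention that a wiki page file `<Page-Name>.md` maps to `<repo url>/wiki/<Page-Name>`; `REPO_URL` and the input directory are hypothetical placeholders:

```python
# Sketch: attach the wiki page URL to each Document's metadata via
# SimpleDirectoryReader's file_metadata callback.
from pathlib import Path

from llama_index.core import SimpleDirectoryReader

REPO_URL = "https://github.com/example-org/example-repo"  # hypothetical

def wiki_metadata(file_path: str) -> dict:
    """Derive the wiki page URL from the markdown filename."""
    page = Path(file_path).stem  # e.g. "Getting-Started" from "Getting-Started.md"
    return {
        "url": f"{REPO_URL}/wiki/{page}",
        # keywords could be merged in here (e.g. from the existing csv) if needed
    }

documents = SimpleDirectoryReader(
    input_dir="path/to/cloned/wiki",  # hypothetical path
    required_exts=[".md"],
    file_metadata=wiki_metadata,  # called once per file; result becomes Document metadata
).load_data()
```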