alan-turing-institute / reginald

Reginald repository for REG Hack Week 23

Use llama-hub reader to read data directly from GitHub wikis #77

Closed: rwood-97 closed this issue 1 year ago

rwood-97 commented 1 year ago

We currently have pre-processed wikis saved as csv files. These contain:

It would be good to load wikis directly from the .md files so that when wikis are updated, the changes can be more easily incorporated into our database. This would also resolve privacy issues, since we cannot have private data in our repo if we want our code to be public.

It might be possible to use GitPython to clone the wikis 'in situ'; llama-index has a SimpleDirectoryReader which can then read the markdown files directly.
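A rough sketch of that idea, not the implementation that ended up on any branch. It assumes GitPython and llama-index are installed. It also relies on GitHub serving a repository's wiki as a separate git repository at `<repo>.wiki.git`. The helper names `wiki_clone_url` and `load_wiki_documents` and the example URL are hypothetical. Authentication for private wikis is not handled here.

```python
from pathlib import Path


def wiki_clone_url(repo_url: str) -> str:
    """Build the clone URL of a GitHub repo's wiki (hypothetical helper).

    GitHub exposes each wiki as its own git repository at <repo>.wiki.git.
    """
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return f"{base}.wiki.git"


def load_wiki_documents(repo_url: str, target_dir: str):
    """Clone (or update) a wiki locally and read its markdown files.

    Hypothetical helper: third-party imports are done lazily so the URL
    helper above can be used without GitPython or llama-index installed.
    """
    from git import Repo  # GitPython

    try:
        # Import path in newer llama-index releases
        from llama_index.core import SimpleDirectoryReader
    except ImportError:
        # Import path in older llama-index releases
        from llama_index import SimpleDirectoryReader

    target = Path(target_dir)
    if (target / ".git").exists():
        # Wiki already cloned: pull so updated pages are picked up
        Repo(target).remotes.origin.pull()
    else:
        Repo.clone_from(wiki_clone_url(repo_url), target)

    # Only read markdown files from the cloned wiki directory
    reader = SimpleDirectoryReader(
        input_dir=str(target),
        required_exts=[".md"],
        recursive=True,
    )
    return reader.load_data()


if __name__ == "__main__":
    # Placeholder URL for illustration only; needs network access to run
    docs = load_wiki_documents(
        "https://github.com/example-org/example-repo", "data/example-wiki"
    )
    print(f"Loaded {len(docs)} documents")
```

Cloning into a directory that is git-ignored would keep the wiki contents out of the public repo, which is the privacy point raised above.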

Things to address are:

rwood-97 commented 1 year ago

These changes are implemented in llama2_wikis branch. See PR #69 and notebooks here and here.