artificialwisdomai / origin

Artificial Wisdom™ Cloud Platform
Apache License 2.0

Produce a knowledge base index #105

Closed sdake closed 11 months ago

sdake commented 1 year ago

The purpose of this PR is to provide a workload that can be built into a container. The container can then be run to build the knowledge base.

The knowledge base indexing is modeled as a workflow. The workflow contains serial steps numbered 0000..0005. This PR isn't quite complete; specifically, the paths are not set properly in all cases. Additionally, the volume mounts needed to mount the host filesystem (to access the knowledge base jsonl file) are not complete.
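For illustration only, here is a minimal sketch of how serially numbered steps like these could be driven from Python. It is not the code in this PR: the file naming (a four-digit prefix followed by `_` or `-`), the use of `sh` to run each step, and the function names `ordered_steps` and `run_workflow` are all assumptions made for the example.

```python
import re
import subprocess
from pathlib import Path

# Assumption: step scripts are named with a four-digit prefix, e.g. "0000_fetch.sh".
STEP_PATTERN = re.compile(r"^(\d{4})[_-].+")


def ordered_steps(names):
    """Return the names that carry a four-digit prefix, sorted by that prefix."""
    numbered = [name for name in names if STEP_PATTERN.match(name)]
    return sorted(numbered, key=lambda name: int(STEP_PATTERN.match(name).group(1)))


def run_workflow(step_dir, runner=subprocess.run):
    """Run each step script serially and stop at the first failing step.

    `runner` defaults to subprocess.run; it is a parameter so the loop can be
    exercised without executing real scripts.
    """
    step_dir = Path(step_dir)
    completed = []
    for name in ordered_steps(path.name for path in step_dir.iterdir()):
        result = runner(["sh", str(step_dir / name)], check=False)
        if result.returncode != 0:
            raise RuntimeError(f"step {name} failed with exit code {result.returncode}")
        completed.append(name)
    return completed
```

Stopping at the first non-zero exit code keeps a later step from running against output that an earlier step never produced.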

sdake commented 1 year ago

Thanks @MostAwesomeDude. I do need to do some additional work, but I needed sleep and wanted to get this in front of you! I will try to get the index server implemented tonight.

sdake commented 11 months ago

Thanks @MostAwesomeDude. There is still much work to do. I guess I will fork that fellow's repo and start from his work. It seems pretty comprehensive and a great start. I don't want to fork his tree, but I think it will be necessary given the changes I have in mind.

One of the problems with this PR is that while it is ordered, it lacks a lot of context and could use tighter integration with, for example, Arrow or LMDB. I am still learning about these frameworks, and at this point I am simply trying to train a RETRO model with a 120 GB jsonl file. It takes a lot of time to produce embeddings!
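As a rough sketch of why the embedding step is slow and how it might be kept memory-bounded, the snippet below streams a jsonl file line by line and hands fixed-size batches to an embedding function. Everything here is an assumption for illustration: the `"text"` field name, the batch size, and `embed_fn`, which stands in for whatever model call actually produces embeddings.

```python
import json
from itertools import islice


def iter_jsonl(path, text_key="text"):
    """Yield one text field per non-empty line, never loading the whole file."""
    # Assumption: each record stores its passage under `text_key`.
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)[text_key]


def batched(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def embed_corpus(path, embed_fn, batch_size=256):
    """Yield one embedding per record, calling `embed_fn` once per batch.

    `embed_fn` is a placeholder: it takes a list of strings and returns
    one embedding per string, in the same order.
    """
    for batch in batched(iter_jsonl(path), batch_size):
        yield from embed_fn(batch)
```

Because every stage is a generator, peak memory depends on the batch size rather than on the size of the jsonl file, which matters for an input in the 120 GB range.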

sdake commented 11 months ago

@rstarmer I think I should have left this in draft... I will have to do a little better.