Open alexanderjophus opened 1 month ago
I think being able to provide our own filter, which we could plug in here, would be ideal: https://github.com/Abraxas-365/langchain-rust/blob/main/src/document_loaders/git_commit_loader/git_commit_loader.rs#L45
Also letting the user do their own mapping might be beneficial. I'm working on a PR that does both.
I've updated the PR (it's still a work in progress as I struggle my way through Rust's compiler).
The PR/ticket now is to;
The current issue I'm facing (along with many similar ones, all stemming from Repo fields being neither `Send` nor `Sync`) is:
`RefCell<Vec<Vec<u8>>>` cannot be shared between threads safely
within `gix::Repository`, the trait `Sync` is not implemented for `RefCell<Vec<Vec<u8>>>`, which is required by `gix::revision::walk::Info<'_>: std::marker::Send`
note: if you want to do aliasing and mutation between multiple threads, use `std::sync::RwLock` instead
note: required for `&gix::Repository` to implement `std::marker::Send`
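For anyone hitting the same error, here is a minimal std-only sketch of why it happens and one possible way around it. `FakeRepo` and `load_in_dedicated_thread` are hypothetical stand-ins, not gix or langchain-rust APIs. The idea is that a type holding a `RefCell` is not `Sync`, so a reference to it cannot cross threads, but the value can still be created and used entirely inside one thread, with only owned data sent back.

```rust
use std::cell::RefCell;
use std::thread;

/// Hypothetical stand-in for a repository handle. The `RefCell` field makes
/// this type `!Sync`, so `&FakeRepo` is `!Send`, mirroring the error above.
pub struct FakeRepo {
    buffers: RefCell<Vec<Vec<u8>>>,
    commits: Vec<String>,
}

impl FakeRepo {
    /// Assumed constructor; a real loader would open the repository from a path.
    pub fn open(commits: Vec<String>) -> Self {
        FakeRepo {
            buffers: RefCell::new(Vec::new()),
            commits,
        }
    }

    /// Reads commit messages while mutating an internal cache through `&self`.
    pub fn commit_messages(&self) -> Vec<String> {
        self.buffers.borrow_mut().push(Vec::new());
        self.commits.clone()
    }
}

/// Opens and uses the repo inside a single thread. Only the owned
/// `Vec<String>` (which is `Send`) crosses the thread boundary.
pub fn load_in_dedicated_thread(commits: Vec<String>) -> Vec<String> {
    thread::spawn(move || {
        let repo = FakeRepo::open(commits);
        repo.commit_messages()
    })
    .join()
    .expect("loader thread panicked")
}
```

gix also documents a `ThreadSafeRepository` type (reachable via `into_sync()` and converted back with `to_thread_local()`), which may be a more direct fix. Whether it fits this loader is untested here.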
Is your feature request related to a problem? Please describe.
My use case is a cron job that trains on a git repo. I'm using the git commit loader rather than the file loader (I'm not entirely sure this is the best fit for me).
The main thing for me is iteratively adding documents to a vector store, rather than once and done.
Describe the solution you'd like
I'd love to filter which commits are loaded, similar to how I can filter for only files with certain extensions. In my specific scenario, I'd only like to load commits I've not seen before.
Describe alternatives you've considered
A generic filter/lambda-type function that lets developers plug in their own conditions for whether a commit should be loaded.
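To make the request concrete, here is a rough sketch of what a pluggable filter plus user-defined mapping could look like. Every name in it (`CommitInfo`, `Document`, `load_commits`, `load_unseen`) is hypothetical and simplified for illustration. None of it is the current langchain-rust API or the shape of the PR.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of the data the loader reads per commit.
#[derive(Debug, Clone, PartialEq)]
pub struct CommitInfo {
    pub id: String,
    pub message: String,
}

/// Hypothetical stand-in for the document type handed to a vector store.
#[derive(Debug, Clone, PartialEq)]
pub struct Document {
    pub page_content: String,
}

/// Keeps only commits accepted by the caller-supplied `filter`, then converts
/// each one with the caller-supplied `to_document` mapping.
pub fn load_commits<F, M>(commits: &[CommitInfo], filter: F, to_document: M) -> Vec<Document>
where
    F: Fn(&CommitInfo) -> bool,
    M: Fn(&CommitInfo) -> Document,
{
    commits
        .iter()
        .filter(|c| filter(*c))
        .map(|c| to_document(c))
        .collect()
}

/// The cron-job scenario: skip every commit whose id is already in `seen`.
pub fn load_unseen(commits: &[CommitInfo], seen: &HashSet<String>) -> Vec<Document> {
    load_commits(
        commits,
        |c| !seen.contains(&c.id),
        |c| Document {
            page_content: format!("{}: {}", c.id, c.message),
        },
    )
}
```

With something like this, the set of seen commit ids could be persisted between cron runs, so each run only adds new documents to the vector store.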