danielmiessler / fabric

fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
https://danielmiessler.com/p/fabric-origin-story
MIT License
25.57k stars · 2.72k forks

[Feature request]: Handle Splitting? #1055

Open rehandaphedar opened 1 month ago

rehandaphedar commented 1 month ago

What do you need?

It would be great if fabric could automatically handle splitting/chunking for text that is too large for a given model.

From what I understand, this would need:
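At its simplest, the splitting step could be a fixed-size chunker with overlap. Below is a minimal sketch, assuming a plain character budget as a stand-in for real token counting (a production version would use the model's tokenizer); `chunk_text`, `max_chars`, and `overlap` are hypothetical names:

```python
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so content cut at a
    boundary appears in both chunks. Character count is only a rough
    proxy for token count.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks
```

Splitting on sentence or paragraph boundaries instead of raw offsets would be a natural refinement, but the overlap already softens the damage of a mid-sentence cut.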

mattjoyce commented 4 weeks ago

I understand the concept of chunking, but how would it work in this environment? I have a large file and pipe it to fabric -p summarize; it splits the file in some way, summarizes each chunk, and joins the results?

I'm sceptical about the efficacy and utility, but interested to hear your thoughts.
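The split-each-join flow described here could be sketched as a small driver around the fabric CLI. This is an assumption-laden sketch, not fabric's actual behaviour: it assumes `fabric -p <pattern>` reads input on stdin and writes the result to stdout, and the helper names (`run_pattern`, `map_and_join`) are hypothetical:

```python
import subprocess

def run_pattern(chunk: str, pattern: str = "summarize") -> str:
    """Pipe one chunk through fabric (assumes it reads stdin, writes stdout)."""
    result = subprocess.run(
        ["fabric", "-p", pattern],
        input=chunk, capture_output=True, text=True, check=True,
    )
    return result.stdout

def map_and_join(chunks: list[str], run=run_pattern) -> str:
    """Run each chunk through the pattern independently and concatenate."""
    return "\n\n".join(run(chunk) for chunk in chunks)
```

The `run` parameter is injected so the joining logic can be exercised without invoking an actual model; the joining step here is plain concatenation, as discussed below.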

rehandaphedar commented 3 weeks ago

> it splits the file in various ways and summarizes each chunk and joins the results?

Yes, though not just for summarising. I was thinking the patterns could be modified to inform the LLM that the given input is part of a larger input, and possibly to include the outputs of previous chunks (e.g. in sliding-window approaches).

I'm not sure how difficult the advanced splitting options would be to implement, for example those that feed a previous chunk's output as input to the next chunk. However, simple splitting should hopefully be easy to implement, and would be very helpful even with its inefficiencies.

Regarding combining the outputs, I think even just concatenating them would still be very helpful. I'm not aware of how other programs do it, though.

Regarding utility, handling chunking would be extremely useful: one could, for example, run pdftotext book.pdf - | fabric -p extract_wisdom and similar commands without worrying about token limits.