Closed by johnr14 3 weeks ago
While YAML "scripting" the pipeline may still be a goal, I think I figured out how to do it. I will reopen this issue if needed, or submit a pull request if I get it working.
Also, using JSON is much better: I can have a JSON template as well as a JSON file with values for fail or pass, and support multiple languages by changing the JSON file or translating it on the fly.
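For example, something like this is what I mean. This is only a sketch under my own assumptions: the file name, keys and keywords are all made up, just to show the shape.

```python
import json

# Hypothetical per-language judgment file, e.g. judgments.en.json:
#   {"pass": ["Relevant", "Accurate"], "fail": ["Irrelevant", "Inaccurate"]}
# A French run would just swap in judgments.fr.json with translated keywords.

def load_judgments(lang: str = "en") -> dict:
    with open(f"judgments.{lang}.json", encoding="utf-8") as f:
        return json.load(f)

def judge(llm_output: str, judgments: dict) -> bool | None:
    """True on a pass keyword, False on a fail keyword, None if undecided."""
    text = llm_output.lower()
    # Check fail words first so "Irrelevant" isn't caught by the "Relevant" substring.
    for word in judgments["fail"]:
        if word.lower() in text:
            return False
    for word in judgments["pass"]:
        if word.lower() in text:
            return True
    return None
```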
Hi, I was going to write an app from scratch to do what you already did, so I am trying out your great app. EDIT: Sorry for the wall of text; it's mostly ideas and where I am going with your app, which may be helpful for you or others.
I find it cumbersome that code in `steps.py` needs to be modified for a specific use case. Also, I've seen many projects use JSON structured output, which could simplify the prompting, save tokens, and let directives in the .yaml describe how to process the resulting JSON without having to touch the Python code. I'm not sure how widely JSON output is supported by LLMs, though...
I was thinking that a single processing function that takes the prompts and the validation rules from a config file would be much better.
So, for the YAML file: include the regex to process the answer, or another prompt to validate it, directly in the YAML. That way, all prompts could be processed by a single pipeline, and a pipeline along the lines of the sketch below becomes possible:
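This is only a sketch under my own assumptions: none of these YAML keys (`steps`, `validation`, `pass_value`, ...) exist in the current code, and the file paths and regex are made up.

```python
import re
import yaml

# Hypothetical pipeline config -- none of these keys exist in Augmentoolkit today,
# this is just the shape I have in mind.
PIPELINE_YAML = r"""
steps:
  - name: check_relevancy
    prompt_file: prompts/check_relevancy.txt
    validation:
      type: regex
      pattern: '(?i)\b(relevant|irrelevant)\b'
      pass_value: relevant
  - name: generate_question
    prompt_file: prompts/generate_question.txt
    validation:
      type: llm_judge            # a second, cheaper model re-checks the output
      judge_prompt_file: prompts/judge_question.txt
"""

def validate(step: dict, llm_output: str) -> bool:
    """One generic validator: behaviour is picked by the YAML, not by editing steps.py."""
    v = step["validation"]
    if v["type"] == "regex":
        match = re.search(v["pattern"], llm_output)
        return bool(match) and match.group(0).lower() == v["pass_value"]
    if v["type"] == "llm_judge":
        raise NotImplementedError("call the judge model and parse its verdict here")
    raise ValueError(f"unknown validation type: {v['type']}")

config = yaml.safe_load(PIPELINE_YAML)
# For each step: render prompt_file, call the LLM, then validate(step, output).
```

The point is that adding a new check would only mean adding a block to the YAML, not touching the Python code.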
I was looking at your prompts, and they are quite long. My ollama stops responding once in a while and I must kill `run_augmentoolkit.py` to restart the run. -> EDIT: Seems better after raising ctx from 8k to 16k.

I have had success with short prompts, and that could save lots of tokens. Maybe some would miss, but if we get a high rejection rate, we could switch to question-generation prompts that are more advanced and cost more. Also, I am thinking of constraining the knowledge domain of the questions to a list, so I don't get questions like `Which French book focuses on rethinking the concept of ...` that are just a waste of tokens.

Sidetrack notes: Also thinking of some preprocessing/post-processing using a MICRO LLM (2-3b) for a few checks, like validating that it's knowledge I want to extract. A LARGE LLM could also be used to generate a summary about a specific idea in the text, which would be trained on with questions like `Give me a general explanation about ...` and `What does "this concept" relate to and how could you explain it to me in a {simple|expert} way?`. Then mix some (most? random?) related concepts together and have a 405b try to make sense of them in a big, coherent way, to explain expert knowledge so a non-expert can learn it or ask questions about what they don't understand... like good teaching material. It would require some way to accumulate related knowledge; add a field to the dataset? (not there yet). End sidetrack notes.

Anyway, what got me here is that I got a false reject using dolphin 8b:
EDIT: Found that it's in `def parse_answer_relevancy_validation_step(thought_process):`. I am processing some data, grepping the `Explanation of Judgment` section and looking for keywords to add. Will change from dolphin to hermes 8b. EDIT: hermes does a much better job and appends `Relevant` or `Irrelevant` like 90%+ of the time, while dolphin was more like 25%!

I think that instead of looking for all sorts of keywords, a MICRO LLM could parse `Relevance Assessment` and `Explanation of Judgment` and return a JSON with `"Assessment": "True"` or `"Assessment": "False"` to prevent bad identification. That would be cheap: few tokens and a low price on such a small LLM, and it could even run locally with 4 GB of VRAM... I was looking to fix this, then started thinking about where the best place would be; having it directly in the YAML seems the best way.
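Roughly what I picture for that check. Just a sketch, assuming a local Ollama instance; the model name is a placeholder for whatever 2-3b model is on hand, and none of this is Augmentoolkit code.

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"   # local Ollama endpoint

def micro_assess(relevance_assessment: str, explanation: str,
                 model: str = "qwen2.5:3b") -> bool | None:
    """Ask a small local model to turn the judge's prose into a strict JSON verdict.
    Returns True/False, or None if the reply can't be parsed."""
    prompt = (
        "Below is a relevance judgment written by another model.\n\n"
        f"Relevance Assessment:\n{relevance_assessment}\n\n"
        f"Explanation of Judgment:\n{explanation}\n\n"
        'Reply with JSON only, exactly {"Assessment": true} or {"Assessment": false}.'
    )
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "format": "json",   # Ollama option that constrains the reply to valid JSON
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    try:
        verdict = json.loads(resp.json()["response"])
        value = verdict["Assessment"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    if isinstance(value, str):            # tolerate "True"/"False" as strings too
        return value.strip().lower() == "true"
    return bool(value)
```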
EDIT: These failures are related to the LLM. Currently using hermes/dolphin/mistral/qwen in different quants to play around with a few PDFs. Having the `assessment` validated by a micro LLM seems a good approach that could mitigate LLMs that don't follow the directive to append `Relevant` or `Irrelevant`.

So, while thinking about it, some brainstorming:
SOME MAJOR SIDETRACK

- Prompts could be ordered by a number prefix: 01_check..., 02_get...
- Prompts could be sequential, meaning a prompt requires a certain earlier prompt to have passed, so prompts should be able to set variables...
- ...and another prompt could then reuse those variables.
Some quick examples that would need more work:
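Purely illustrative; the file names and the `sets`/`requires`/`uses` keys are mine, nothing from the repo:

```python
import yaml

# Hypothetical sequential prompts: the numeric prefix gives the order, a step can
# require an earlier step to have passed, and can reuse variables that step set.
SEQUENCE_YAML = """
01_check_domain:
  prompt_file: prompts/01_check_domain.txt
  sets: [domain]
02_generate_question:
  prompt_file: prompts/02_generate_question.txt
  requires: [01_check_domain]
  uses: [domain]
"""

steps = yaml.safe_load(SEQUENCE_YAML)
state: dict[str, str] = {}     # variables accumulated across steps
passed: set[str] = set()

for name in sorted(steps):     # sorting on the 01_/02_ prefix gives the order for free
    step = steps[name]
    if not set(step.get("requires", [])) <= passed:
        continue               # a prerequisite failed or was skipped
    # ...render prompt_file with {k: state[k] for k in step.get("uses", [])},
    # call the LLM, validate the output, and only then:
    passed.add(name)
    # state.update(whatever_variables_the_step_extracted)
```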
This kind of thing would make it more flexible. Should YAML be used that way? I'm not sure it's the best solution.
I hope this helps.
P.S. My idea is to take data from a similar knowledge domain, gather related concepts across multiple publications (using vector RAG?), and try to generate coherent, valid, deeper knowledge on a subject from publications/notes/books that may not be available online... That knowledge must be explainable, so it must be able to cite the source material, the author, etc.
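Very hand-wavy, but the gathering part could start out as simple as embedding chunks and keeping their provenance next to them. A minimal sketch, assuming sentence-transformers; the model name and threshold are arbitrary:

```python
from sentence_transformers import SentenceTransformer, util

# Each chunk keeps its provenance so anything generated from it can cite source/author.
chunks = [
    {"text": "...", "source": "Publication A", "author": "Author A"},  # placeholders
    {"text": "...", "source": "Notes B", "author": "Author B"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # any small embedding model works
embeddings = model.encode([c["text"] for c in chunks], convert_to_tensor=True)

def related(idx: int, threshold: float = 0.6) -> list[dict]:
    """Chunks from other documents that discuss a similar concept, citations attached."""
    scores = util.cos_sim(embeddings[idx], embeddings)[0]
    return [c for i, c in enumerate(chunks)
            if i != idx and float(scores[i]) >= threshold]
```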
Ok, back to debugging to fix my `Answer relevancy validation failed!` tests.