Josephrp commented 11 months ago

🤔How To

Check our References

TruLens GitHub + notebooks : https://github.com/truera/trulens/tree/main/trulens_eval/examples

RAG Evaluation : https://lablab.ai/t/trulens-google-vertex-ai-tutorial-building-rag-applications
Comparing models with TruLens : https://github.com/truera/trulens/blob/main/trulens_eval/examples/expositional/use_cases/model_comparison.ipynb
Llama Index / Chain of Reasoning Eval with TruLens : https://github.com/truera/trulens/blob/main/trulens_eval/examples/expositional/frameworks/llama_index/llama_index_complex_evals.ipynb
Llama Index Retrieval Quality : https://github.com/truera/trulens/blob/main/trulens_eval/examples/expositional/frameworks/llama_index/llama_index_retrievalquality.ipynb

Ideas for Evaluation

RAG
System Prompt
Data Processing Pipeline
Image Inputs

Work

What it takes : literally just running a notebook. Each notebook needs:

a Chroma store, or another set of embeddings, to test
a list of prompts to test
combinations of those prompts to test
multimodal evaluations
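
For the "test combinations of prompts" item, here is a minimal sketch of how a notebook could enumerate every system prompt / user prompt pair before handing them to an evaluation. The prompt strings and the `build_test_cases` helper are hypothetical placeholders, not taken from the DataTonic repo, and nothing here calls TruLens or Gemini.

```python
from itertools import product

# Hypothetical placeholder prompts; swap in the real baseline prompts.
system_prompts = [
    "You are a concise data analyst.",
    "You are a careful assistant that cites its sources.",
]
user_prompts = [
    "Summarize the attached sales report.",
    "List three risks mentioned in the document.",
]


def build_test_cases(system_prompts, user_prompts):
    """Return one test case per (system prompt, user prompt) pair."""
    return [
        {"case_id": i, "system": s, "user": u}
        for i, (s, u) in enumerate(product(system_prompts, user_prompts))
    ]


test_cases = build_test_cases(system_prompts, user_prompts)
for case in test_cases:
    print(case["case_id"], "|", case["system"], "|", case["user"])
```

Keeping a stable `case_id` per pair should make it easier to line up results across models later.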

We will include the notebooks in the submission and the write-up.

Josephrp commented 11 months ago

Hey there @mie-h and @Zochory : https://github.com/Tonic-AI/DataTonic/tree/main/evaluation is the folder where we will start working on the TruLens evaluations, which are a hackathon requirement and good practice while building an app 🫡

Josephrp commented 11 months ago

Hey there @mie-h & @Zochory : I added default prompts to the baseline prompts folder. We can use those in a TruLens evaluation.

Josephrp commented 11 months ago

consider using this to generate "system prompts" for gemini

Josephrp commented 11 months ago

added an incomplete example notebook : https://github.com/Tonic-AI/DataTonic/blob/main/evaluation/results/modelcomparision.ipynb

Josephrp commented 11 months ago

Big thank you to 🏆😎 @MN-Noor for producing the first TruLens evaluation with Gemini on RAG using OpenAI!

Open tasks :

Make a notebook to test Gemini MultiModal (image inputs)
Make a notebook to test more models against Gemini
Make a notebook to test the "new features of Gemini" like the censorship level.

We'll all work on this together. If everyone completes one notebook, or at least contributes to a good one, we will have secured this task.
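
For the "test more models against Gemini" task, a rough sketch of the comparison loop such a notebook might use: run the same prompts through each model and average a score per model. Everything here is an assumption for illustration. `compare_models`, the two fake model functions, and the `mentions_prompt` scorer are hypothetical stand-ins so the example runs offline; a real notebook would replace them with actual model calls and TruLens feedback functions.

```python
from statistics import mean


def compare_models(models, prompts, score):
    """Run every prompt through every model and average the scores.

    models: dict mapping a label to a callable (prompt -> response text).
    score:  callable (prompt, response) -> float between 0 and 1.
    """
    results = {}
    for name, generate in models.items():
        scores = [score(p, generate(p)) for p in prompts]
        results[name] = mean(scores)
    return results


# Hypothetical stand-ins so the sketch runs without any API keys.
def fake_gemini(prompt):
    return f"Answer about {prompt}"


def fake_other_model(prompt):
    return "I do not know."


def mentions_prompt(prompt, response):
    """Toy score: 1.0 if the response repeats the prompt text, else 0.0."""
    return 1.0 if prompt.lower() in response.lower() else 0.0


prompts = ["revenue", "churn"]
results = compare_models(
    {"gemini": fake_gemini, "other": fake_other_model},
    prompts,
    mentions_prompt,
)
print(results)
```

Passing the models as plain callables keeps the loop independent of any one provider's client library.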

Zochory commented 11 months ago

Shouldn't we add other multimodal LLMs to the evals, like this one? https://huggingface.co/sshh12/Mistral-7B-LoRA-ImageBind-LLAVA

Josephrp / DataTonic

Priority Task : start using trulens to evaluate Gemini #1
