-
Hi Shilin,
This is great work, and thank you for releasing it to the public.
I was confused about the TPR@0.1%FPR metric: from your code, it seems only bit accuracy is reported. Could you please indicat…
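In case it helps the discussion, this is the computation I had in mind: a minimal sketch assuming per-sample detection scores for watermarked and clean inputs (the score arrays and the higher-score-means-watermarked convention are my assumptions, not your code):

```python
# Hypothetical sketch: TPR at a fixed 0.1% FPR from per-sample detection scores,
# assuming a higher score means "watermark detected". Not taken from the repo.
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(scores_watermarked, scores_clean, target_fpr=1e-3):
    """Return the TPR at the largest FPR that does not exceed target_fpr."""
    y_true = np.concatenate([np.ones(len(scores_watermarked)), np.zeros(len(scores_clean))])
    y_score = np.concatenate([scores_watermarked, scores_clean])
    fpr, tpr, _ = roc_curve(y_true, y_score)
    mask = fpr <= target_fpr
    return tpr[mask].max() if mask.any() else 0.0
```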
-
Thank you for the great library!
I noticed that the "evaluation metric" section of the README is expected to arrive soon.
1. Do you have a timeline for when this will be added?
2. Could you share what evaluation…
-
What should we use for HVAC evaluations? @ozanbarism
For example, in `scripted_compare_models.py`, what should we be asking the LLM, and how should we rank the results?
https://github.com/bbartling/HvacGPT/blob/d…
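To make the question more concrete, this is the kind of harness I'm picturing, purely as a sketch: the question set, reference answers, and token-overlap scoring are my own assumptions, not anything from `scripted_compare_models.py`:

```python
# Hypothetical sketch: rank candidate models on an HVAC Q&A set by token-overlap F1
# against reference answers. Nothing here comes from scripted_compare_models.py.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def rank_models(answers_by_model: dict, references: list) -> list:
    """answers_by_model: model name -> answers to the same ordered question set."""
    scores = {
        model: sum(token_f1(a, r) for a, r in zip(answers, references)) / len(references)
        for model, answers in answers_by_model.items()
    }
    # Highest average score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```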
-
Hi - The current code does not seem to cover the proposed evaluation portion. Would the authors consider sharing their evaluation pipeline? More specifically, the implementations behind AI…
-
`def beir_evaluation():
    actual_contexts_dict = {
        '0': {'0': 1, '1': 1, '2': 1, '3': 1, '4': 1, '5': 1, '6': 1, '7': 1, '8': 1, '9': 1},
…
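For reference, the direction I was going with this: pass a qrels-style dict like the one above, together with the retriever's scored results, to pytrec_eval. This is just a sketch; the results dict, the cutoff, and the metric choices are my own assumptions, not the repo's evaluation code:

```python
# Sketch only: score a qrels-style dict (query_id -> {doc_id: relevance}) against
# retrieval results (query_id -> {doc_id: score}) with pytrec_eval.
import pytrec_eval

def beir_evaluation(actual_contexts_dict, retrieved_scores_dict, k=10):
    evaluator = pytrec_eval.RelevanceEvaluator(
        actual_contexts_dict, {f'ndcg_cut.{k}', f'recall.{k}', f'P.{k}'}
    )
    per_query = evaluator.evaluate(retrieved_scores_dict)
    # Average each metric over all queries.
    metrics = {m: 0.0 for m in next(iter(per_query.values()))}
    for scores in per_query.values():
        for m, v in scores.items():
            metrics[m] += v / len(per_query)
    return metrics
```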
-
Hi, I am working on reproducing the quantitative results (Table 1) from the paper and have a couple of questions:
1. On the validity of the final output
Following the current code flow, it seems tha…
-
I'm replicating your interesting work as part of my data science thesis, using a different prompting method and larger LLMs (GPT-4o, GPT-4o Mini). I'd love to get some feedback regarding a potential is…
-
https://github.com/huggingface/notebooks/blob/06d842cc40071ef40e0e3acb6e088d59b66c8833/examples/summarization.ipynb#L216
Use `rouge = evaluate.load('rouge')` instead.
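A minimal end-to-end version of that suggestion, with placeholder predictions and references (just a sketch, not the notebook's code):

```python
# Minimal example of the suggested replacement: ROUGE via the `evaluate` library
# rather than datasets.load_metric (now deprecated). Inputs are placeholders.
import evaluate

rouge = evaluate.load('rouge')
predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]
results = rouge.compute(predictions=predictions, references=references)
print(results)  # e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```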
-
Hello,
It would be amazing to see some accuracy metrics for this solution as compared to specific OCR tools.
Have you thought about normalizing and diffing the raw output between this tool and a de…
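Concretely, I was picturing something along these lines as a rough sketch; the normalization rules and the error-rate proxy are assumptions on my part rather than anything this project already does:

```python
# Rough sketch: normalize two raw OCR outputs and report a similarity ratio
# plus an approximate character error rate. Normalization choices are illustrative.
import difflib
import re

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r'\s+', ' ', text)      # collapse whitespace
    text = re.sub(r'[^\w\s]', '', text)   # drop punctuation
    return text.strip()

def compare_ocr(this_tool_output: str, reference_tool_output: str) -> dict:
    a, b = normalize(this_tool_output), normalize(reference_tool_output)
    matcher = difflib.SequenceMatcher(None, a, b)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return {
        "similarity_ratio": matcher.ratio(),
        # Rough CER proxy: unmatched characters relative to the reference length.
        "approx_cer": (max(len(a), len(b)) - matched) / max(len(b), 1),
    }
```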
-
Currently, the evaluate_model function focuses primarily on accuracy and F1-score for classification models, and MSE and R² for regression models. We could enhance this by including additional evaluat…
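As a rough sketch of what that could look like (the signature, the `task` flag, and the optional `y_score` argument are assumptions here, not the current `evaluate_model`):

```python
# Sketch of additional metrics for evaluate_model; the signature and the
# `task` flag are assumptions, not the existing implementation.
import numpy as np
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
    mean_squared_error, mean_absolute_error, r2_score,
)

def evaluate_model(y_true, y_pred, task="classification", y_score=None):
    if task == "classification":
        metrics = {
            "accuracy": accuracy_score(y_true, y_pred),
            "f1": f1_score(y_true, y_pred, average="weighted"),
            "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
            "recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        }
        if y_score is not None:  # probability scores, binary case
            metrics["roc_auc"] = roc_auc_score(y_true, y_score)
        return metrics
    return {
        "mse": mean_squared_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
    }
```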