Open · semio opened this issue 1 year ago
Here are some files and a script you can test:

```python
import glob

import pandas as pd

from lib.pilot.helpers import read_ai_eval_spreadsheet
# NOTE: the original snippet also uses SessionResultsDf (the pandera
# validation schema) without showing its import.

sheet = read_ai_eval_spreadsheet()

# Download the above csv files into a folder, then combine them:
session_df = pd.concat([pd.read_csv(x) for x in glob.glob('./*csv')])
session_df = SessionResultsDf.validate(session_df)
sheet.session_results.replace_data(session_df)
```
Hi @motin, I created a CLI tool that runs the evaluation, caches results locally, and uploads them to the Google spreadsheet. It's mostly the same process as in the `run_evaluation.py` notebook. But somehow I couldn't upload the session results to the Google spreadsheet: https://github.com/Gapminder/gapminder-ai/blob/47851d75b34a6ed2b5dc6fdba3344c926eca382c/automation-api/lib/pilot/cli.py#L157
When I enable this line (or use `append_data(session_df)`), it results in an error. But I checked `session_df`: it has passed `SessionResultsDf` validation and the dtypes look correct. Then I found that the problem is possibly caused by the size of `session_df`, because if I only upload a few rows from it, it works without issue: 1990 rows worked, but 3980 rows did not. I suggest that we check whether the DataFrame to be uploaded is too large and raise a more meaningful error message instead of `KeyError`.
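As a sketch of that suggestion (the `MAX_UPLOAD_ROWS` threshold and the `check_upload_size` helper are hypothetical, and the real limit would need to be measured), the check could look like:

```python
import pandas as pd

# Hypothetical threshold; the real limit would need to be determined
# empirically (1990 rows worked here, 3980 did not).
MAX_UPLOAD_ROWS = 2000


def check_upload_size(df: pd.DataFrame, max_rows: int = MAX_UPLOAD_ROWS) -> None:
    """Raise a descriptive error instead of letting the upload fail with KeyError."""
    if len(df) > max_rows:
        raise ValueError(
            f"DataFrame has {len(df)} rows, which exceeds the upload limit "
            f"of {max_rows} rows; please upload in smaller batches."
        )
```

The upload functions could call this before touching the Sheets API, so the user sees the row count and the limit instead of an internal traceback.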
Besides, 3980 rows is not a very big number... Is it possible to make it work with larger DataFrames? For our usage, I think we are looking at 200 questions × 3 models × 10 rounds × 10 prompts = 60,000 rows.
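Until the root cause is found, a possible workaround is to split the DataFrame into batches below the failing size and append them one at a time. This is only a sketch: `iter_batches` is a hypothetical helper, and it assumes `append_data` accepts partial DataFrames and that a batch size of 1000 stays under the limit:

```python
import pandas as pd


def iter_batches(df: pd.DataFrame, batch_size: int = 1000):
    """Yield consecutive row slices of at most batch_size rows each."""
    for start in range(0, len(df), batch_size):
        yield df.iloc[start:start + batch_size]


# Hypothetical usage against the spreadsheet client:
# for batch in iter_batches(session_df, batch_size=1000):
#     sheet.session_results.append_data(batch)
```

Uploading 60,000 rows would then be 60 smaller requests instead of one large one, at the cost of more round trips.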