deadmau5p / kaggle_detect_ai_text

LLM - Detect AI Generated Text. Kaggle competition.
Apache License 2.0

Add additional data to train RoBERTa #3

Open deadmau5p opened 7 months ago

deadmau5p commented 7 months ago

We should add additional data to train the RoBERTa model. This dataset looks good: https://www.kaggle.com/datasets/carlmcbrideellis/llm-mistral-7b-instruct-texts. It contains only AI-generated essays, which is useful because they are underrepresented in the original competition dataset.
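A minimal sketch of how the extra essays could be merged in and the class balance checked, using only the standard library. The `text` and `generated` field names, the helper names `merge_essays` and `label_counts`, and the toy rows are assumptions for illustration; the real column names of both datasets should be checked before use.

```python
from collections import Counter


def merge_essays(original_rows, extra_ai_texts):
    """Append AI-generated texts (label 1) to the original rows, skipping exact duplicates."""
    seen = {row["text"] for row in original_rows}
    merged = list(original_rows)
    for text in extra_ai_texts:
        if text not in seen:
            merged.append({"text": text, "generated": 1})
            seen.add(text)
    return merged


def label_counts(rows):
    """Count rows per label (0 = human-written, 1 = AI-generated)."""
    return Counter(row["generated"] for row in rows)


# Toy stand-ins for the competition data and the additional AI-only dataset.
original = [
    {"text": "human essay A", "generated": 0},
    {"text": "human essay B", "generated": 0},
    {"text": "ai essay C", "generated": 1},
]
extra = ["ai essay D", "ai essay C", "ai essay E"]  # "ai essay C" is a duplicate

merged = merge_essays(original, extra)
print("before:", dict(label_counts(original)))
print("after: ", dict(label_counts(merged)))
```

Comparing the label counts before and after the merge shows how far the extra data shifts the balance toward the AI-generated class.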

deadmau5p commented 7 months ago

Adding more data does not improve accuracy much, which is a bit surprising.

deadmau5p commented 7 months ago

I am running hyperparameter tuning on the RoBERTa model here: https://wandb.ai/aljazpotocnik/kaggle-ai-detection/sweeps/k2uv4pxq?workspace=user-aljazpotocnik
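For reference, a Weights & Biases sweep of this kind is typically driven by a configuration dictionary like the sketch below. The search method, the metric name `val_accuracy`, and all parameter ranges are hypothetical and are not taken from the linked sweep; only the project name comes from the URL above. `launch_sweep` is an illustrative helper that expects a user-supplied training function which calls `wandb.init()` and logs the metric.

```python
# Hypothetical sweep definition; the actual sweep settings are not shown in this thread.
sweep_config = {
    "method": "bayes",  # assumed search strategy
    "metric": {"name": "val_accuracy", "goal": "maximize"},  # assumed metric name
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-6,
            "max": 1e-4,
        },
        "batch_size": {"values": [8, 16, 32]},
        "num_epochs": {"values": [1, 2, 3]},
        "weight_decay": {"values": [0.0, 0.01, 0.1]},
    },
}


def launch_sweep(train_fn, count=10):
    """Register the sweep and run `count` trials of `train_fn` (requires wandb and a login)."""
    import wandb  # imported lazily so the config can be inspected without wandb installed

    sweep_id = wandb.sweep(sweep_config, project="kaggle-ai-detection")
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Sampling the learning rate on a log scale is a common choice when fine-tuning transformer models, since useful values often span more than one order of magnitude.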