
Investigate repetition issues with code generation in EVMind models #2915

Open brunneis opened 2 months ago

brunneis commented 2 months ago

Fine-tuned models fall into a generative loop too often. Investigate and, if possible, fix the cause: overfitting, misconfiguration of the end token, etc.

Some context: https://www.reddit.com/r/LocalLLaMA/comments/1ap8mxh/what_causes_llms_to_fall_into_repetitions_while/
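The failure mode itself is easy to reproduce in miniature: under purely greedy decoding, any deterministic next-token rule that revisits a token cycles forever. A toy sketch (a fixed transition table standing in for argmax over model logits, not an EVMind model):

```python
# Toy illustration of a generative loop: greedy decoding over a fixed
# next-token table. Once any token repeats, the continuation cycles
# forever, because a deterministic rule has no notion of "already said".
NEXT_TOKEN = {"function": "transfer", "transfer": "(", "(": ")", ")": "function"}

def greedy_generate(start, steps):
    tokens = [start]
    for _ in range(steps):
        tokens.append(NEXT_TOKEN[tokens[-1]])
    return tokens

out = greedy_generate("function", 8)
print(out)  # the 4-token pattern repeats indefinitely
```

Sampling (temperature > 0) or penalizing repeated tokens breaks this determinism, which is why the settings discussed below matter.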

danielbrdz commented 2 months ago

Overfitting Definition: The model may have memorized specific patterns from the training set and be repeating them instead of generating novel content.

Answer: This has no relation to the dataset used; it is diverse enough not to suffer from this.

end_token Definition: The model may not be correctly recognizing when text generation should stop.

Answer: That problem has more to do with the execution environment than with the model itself.
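The execution-environment angle is easy to see in a sketch: if the runtime watches for a stop id that differs from the one the model actually emits, generation runs to the token cap no matter what the model does. A minimal illustration (the `sample_next` callable is a stand-in for the model, and the ids are made up):

```python
# Why a misconfigured end token causes runaway generation: the model
# emits EOS_ID, but the decoding loop compares against a different id.
EOS_ID = 2

def generate(sample_next, stop_id, max_new_tokens=50):
    tokens = []
    for _ in range(max_new_tokens):
        tok = sample_next(len(tokens))
        if tok == stop_id:          # runtime-side stopping criterion
            break
        tokens.append(tok)
    return tokens

# Toy "model" that emits EOS_ID (2) after five content tokens.
model = lambda i: EOS_ID if i >= 5 else 7

n_ok = len(generate(model, stop_id=EOS_ID))  # stops after 5 tokens
n_bad = len(generate(model, stop_id=0))      # wrong id: runs to the cap
print(n_ok, n_bad)
```

In a real stack the equivalent check is that the runtime's stop criterion matches the tokenizer's actual end-of-sequence token.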

Token Frequency Definition: Tokens with high frequency can cause repetitions due to their higher probability of being selected.

Answer: The dataset contained a diverse enough range of code to avoid this problem, and since the model is built to generate varied contracts of all types, it is unlikely to suffer from it.
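For reference, the standard mitigation for high-frequency tokens is a repetition penalty applied to the logits before sampling (the CTRL-style rule: divide positive logits of already-seen tokens by a penalty > 1, multiply negative ones). A self-contained sketch:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """CTRL-style penalty: make already-generated tokens less likely."""
    out = list(logits)
    for tid in set(generated_ids):
        if out[tid] > 0:
            out[tid] /= penalty   # shrink positive logits
        else:
            out[tid] *= penalty   # push negative logits further down
    return out

logits = [2.0, 1.0, -0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2])
print(penalized)  # tokens 0 and 2 are now less probable than before
```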

Temperature and Top-k Sampling Definition: Low temperature settings or too small a value of k in the top-k sampling can result in repetitive generation.

Answer: The tests were run at various temperatures and always gave similar answers (with some differences, of course); an intermediate temperature was settled on.
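For completeness, the mechanics being ruled out here can be sketched in a few lines: top-k sampling keeps only the k highest logits, temperature rescales them, and a token is drawn from the resulting softmax. With very low temperature or k = 1 this collapses back to greedy decoding, which is the repetition-prone case. A pure-Python sketch:

```python
import math
import random

def top_k_temperature_sample(logits, k=2, temperature=0.8, rng=random):
    """Keep the k highest logits, rescale by temperature, softmax, sample."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return rng.choices(top, weights=[e / total for e in exps], k=1)[0]

rng = random.Random(0)
samples = [top_k_temperature_sample([3.0, 1.0, 0.2, -1.0], k=2, rng=rng)
           for _ in range(100)]
print(sorted(set(samples)))  # only ids from the top-2 can ever be drawn
```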

Maximum Sequence Length Definition: A misconfigured maximum sequence length can cause the model to repeat sequences to reach the desired length.

Answer: A length of 1024 tokens, appropriate for the size of the LLM (8B), is always chosen.
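The misconfiguration described above amounts to treating the maximum length as a target rather than a cap. A toy sketch of the difference (the "model" is a made-up callable that finishes after four tokens):

```python
# Cap vs. target: with a cap, decoding may end early at EOS; if the loop
# is misconfigured to always fill the full length, the tail degenerates
# into repeats of whatever the model emits after it is "done".
EOS = -1

def decode(next_token, length, stop_at_eos):
    out = []
    while len(out) < length:
        tok = next_token(len(out))
        if stop_at_eos and tok == EOS:
            break
        out.append(tok)
    return out

# Toy model: real content for 4 steps, then EOS, then a repeated filler.
toy = lambda i: [10, 11, 12, 13][i] if i < 4 else (EOS if i == 4 else 99)

capped = decode(toy, 12, stop_at_eos=True)    # ends after the content
padded = decode(toy, 12, stop_at_eos=False)   # tail filled with repeats
print(capped, padded)
```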

REAL solution: The Colab notebook was replaced with one that behaves more like a chat interface; with that change, all the repetition problems were solved and normal code could be generated.

None of the previous points were related to the Solidity LLM repetition problem; what had to be changed was the execution environment.
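This is consistent with a common pitfall: instruction-tuned models often loop when prompted as raw completion instead of through their chat template. A hypothetical sketch of the difference (the Llama-2-style control tokens below are an assumption for illustration; in practice the tokenizer's own chat template should be used, not hand-built strings):

```python
# Hypothetical prompt formats. Only the shape matters: a bare completion
# prompt vs. a chat-templated one with explicit turn delimiters, which is
# roughly what switching to a chat-style notebook changes.
def raw_prompt(instruction):
    return instruction  # plain completion: the model may ramble or loop

def chat_prompt(instruction, system="You are a Solidity assistant."):
    # Assumed Llama-2-style template -- NOT guaranteed to match EVMind.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST]"

p = chat_prompt("Write an ERC-20 token contract.")
print(p.startswith("<s>[INST]"))
```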

Script: https://github.com/EveripediaNetwork/iq-code-evmind/tree/master/Test%20v3.1