Closed mst272 closed 1 month ago
The base model scores very low when tested on the CrossCodeEval dataset.