For example, see this entry from the Python test split in python_test0.jsonl:
{"id":6584,"buggy_submission_id":3590,"fixed_submission_id":3591,"problem_id":"p00000","user_id":"u009980501","buggy_code":"for i in range(1, 10):\n for j in range(1, 10):\n print(str(i) +\"\" +str(j) +\"=\" + str(ij))","fixed_code":"for i in range(1, 10):\n for j in range(1, 10):\n print(str(i) +\"x\" +str(j) +\"=\" + str(i*j))","labels":["literal.string.change","call.arguments.change","expression.operation.binary.change","io.output.change"],"change_count":1,"line_hunks":1}
Hi, to match bugs with test cases, use the problem_id field, not the id field (as you can see, the ids don't match).
Please note that I'm currently revising the whole dataset, so the splits are likely to change in the future.
The unit test in the tests_all.jsonl file is:
But the unit test obviously does not fit the Python code example. Am I misunderstanding something about the dataset?