KLUE-benchmark / KLUE

📖 Korean NLU Benchmark
https://klue-benchmark.com
Creative Commons Attribution Share Alike 4.0 International
565 stars, 57 forks

klue-mrc-v1 answer_start errors #13

Closed scy6500 closed 3 years ago

scy6500 commented 3 years ago

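Hello! Thank you for releasing such a great dataset.

While using klue-mrc-v1, I found examples whose answer_start does not match the context, so I am opening this issue. The IDs below are the trailing digits of each guid, and the number after each list is the count of affected examples.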

train ['03417', '06461', '03791', '02785', '13386', '02939', '08762', '01155', '04281', '05937', '02360', '01320', '05847', '01612', '03329', '03730', '04382', '13354', '03679', '05233', '01142', '03384', '04241', '04836', '01338', '03751', '02511', '04162', '00577', '04115', '01680', '04201', '05189', '05661', '01080', '03761', '01935', '01589', '00105', '00738', '02783', '02464', '01126', '04980', '00054', '04988', '03994', '02184', '03102', '04946', '01193', '03765', '01560', '05294', '02922', '02579', '04121', '04056', '03798', '00020', '03432', '03712', '05504', '01833', '03875', '01063', '02228', '05086', '00398', '00132', '03800', '03396', '01829', '00409', '08497', '04597', '01065', '02117', '03732', '02190', '04767', '00267', '01912', '03627', '05194', '03101', '04700', '03790', '04211', '11689', '02549', '03771', '02277', '03229', '03452', '02626', '00259', '13519', '00634', '03546', '02645', '04519', '02892', '02275', '04446', '01226', '01054', '04882', '03366', '04055', '04562', '02441', '03606', '02694', '00419', '01632', '00998', '08216', '04364', '05007', '01461', '00728', '03160', '01597', '02010', '12215', '02078', '01585', '02394', '04887', '05174', '02029', '00290', '02362', '02545', '00307', '16285', '00071', '01177', '03963', '04253', '01210', '00404', '02996', '04895', '01645', '05026', '05268', '00994', '15640', '04286', '03655', '02499', '05133', '05113', '02408', '01584', '05201', '05395', '00719', '00871', '13062', '04620', '06503', '03192', '11297', '03723', '04012', '01033', '02116', '00103', '04259', '03936', '01615', '00128', '01500', '02232', '00777', '00285', '00918', '01217', '05425', '13443', '04902', '00055', '02824', '04250', '05480', '00511', '02961', '00924', '04358', '05095', '07080', '04528', '00219', '00941', '00550', '10501', '15065', '00976', '05538', '04711', '04365', '05372', '00508', '00389', '01310', '03877', '05053', '03424', '02556', '04141', '04691', '04340', '16187', '01395', '04198', '02843', '02132', '03078', 
'02928', '02891', '09510', '01205', '00338', '02304', '00698', '01269', '14296', '04367', '00415', '05240', '01616', '02474', '01043', '05214', '05460', '02940', '05471', '03019', '02133', '05207', '05853', '00091', '04974', '00702', '03326', '16038', '05508', '04857', '03240', '05305', '02756', '03440', '04805', '02162', '03857', '00733', '05100', '03678', '03897', '03253', '02266', '01392', '00092', '01720', '00798', '16342', '04944', '15074', '04680', '10894', '00747', '03868', '01376', '04985', '04406', '00906', '03336', '01263', '03972', '03047', '05888', '03012', '02442', '04706', '01175', '02218', '04525', '00821', '00925', '04077', '01777', '04247', '02014', '04101', '02368', '04786', '00883', '02661', '01117', '01365', '16693', '03339', '04052', '04596', '03419', '05254', '03982', '00731', '03060', '00647', '04449', '01364', '03310', '00376', '05191', '02502', '03696', '01143', '01271', '04100', '03967', '01464', '03256', '04491', '07206', '00957', '04376', '07412', '03825', '04242', '02409', '02397', '03116', '03658', '15662', '01603', '00900', '05384', '08537', '04421', '07200', '04047', '05271', '02066', '04076', '01708', '03921', '04166', '02244', '03086', '01690', '00985', '02510', '04366', '03265', '05011', '03938', '01523', '02030', '00354', '01941', '03111', '07407', '02207', '02547', '02951', '00041', '04801', '03748', '11858', '17077', '05245', '04859', '04130', '02744', '03358', '02351', '04802', '01060', '03044', '07328', '05216', '04163', '01918', '01195', '01913', '00908', '02427', '04993', '05303', '04337', '00662', '00628', '04890', '02258', '05511', '04390', '02899', '03489', '03213', '04682', '02255', '05172', '04605', '00429', '01551', '05595', '02950', '00595', '01671', '01268', '00767', '01505', '00940', '03045', '05016', '01953', '03279', '00889', '00836', '05370', '03543', '07889', '02994', '01307', '01525', '01406', '00321', '04112', '00725', '14188', '02110', '07117', '04792', '04703', '03189', '02429', '04822', '02660', '06731', 
'07636', '02845', '10129', '02805', '04716', '04403', '05227', '02820', '03346', '02417', '03361', '04428', '11034', '04507', '05432', '05466', '11469', '02209', '00221', '04304', '15688', '03272', '04737', '04843', '04409', '04138', '04370', '00829', '04684', '02776', '00007', '00982', '05136', '03234', '14263', '01325', '03789', '05124', '01405', '00397', '16485', '16705', '00581', '04104', '01992', '03184', '01982', '01357', '00935', '04557', '02810', '04481', '00466', '03652', '01812', '13935', '03916', '00256', '01145', '03813', '04267', '04458', '00311', '05339', '05554', '03954', '04455', '01257', '06992', '04288', '00995', '08421', '01467', '05474', '00872', '02690', '05131', '05434', '01570', '00633', '02736', '05399', '04606', '04235', '02698', '05120', '04173', '00167', '03842', '02344', '00474', '01923', '03952', '04441', '03681', '03976', '03804', '02300', '01827', '01132', '05027', '02346', '00284', '04423', '00017', '01097', '16060', '00190', '01737', '02550', '02874', '01137', '00407', '00803', '04593', '04978', '04393', '02715', '00095', '02355', '11173', '04926', '02425', '03892', '04846', '05015', '00123', '04418', '04891', '05291', '04602', '05235', '05197', '03919', '05044', '03847', '07518', '05058', '01246', '04505', '02015', '11246', '05111', '03220', '01396', '04821', '04206', '02878', '04598', '02329', '04320', '15169', '03525', '00482', '02710', '00341', '00930', '00727', '03476', '01517', '03369', '01621', '03138', '03377', '01653', '03191', '03433', '01583', '01693', '02112', '09865', '02367', '03196', '01481', '04758', '00309', '03263', '03536', '01417', '05396', '02506', '15829', '01638', '00265', '01331', '01572', '01991', '04780', '02091', '03478', '08068', '03164', '03909', '01633', '02385', '04321', '02230', '03113', '05448', '02366', '02750', '05421', '00286', '01356', '02884', '02890', '00420', '01020', '04777', '03014', '04381', '01424', '04134', '02944', '06621', '03742', '00969', '08061', '04698', '01729', '03302', '00575', 
'02687', '02356', '00952', '02051', '01422', '01562', '00044', '00584', '00967', '14787', '03686', '00025', '03002', '00685', '03080', '00778', '00287', '02701', '09756', '02306', '01637', '03139', '04963', '00371', '04011', '01581', '04645', '04701', '04361', '05238', '01919', '04663', '04360', '15820', '03722', '00400', '02145', '02121', '03724', '02420', '02214', '12287', '10282', '16894', '02390', '05528', '05069', '02077', '12507', '03930', '01449', '03199', '04532', '03537', '03907', '03261', '01906', '00757', '04922', '04332', '00999', '00297', '00933', '00226', '00553', '03797', '01648', '02069', '00825', '01916', '00604', '04252', '05487', '00560', '02114', '02678', '02323', '01014', '17053', '03112', '00649', '02449', '01666', '01950', '00015', '04378', '02452', '01101', '00893', '04883', '01000', '05036', '00739', '00173', '00408', '03563', '00753', '06189', '04522', '00018', '04045', '03778', '04639', '03435', '02702', '01739', '16110', '05893', '03739', '04384', '01576', '02733', '01359', '01786', '00869', '01106', '03531', '04727', '02725', '04622', '03292', '02673', '04145', '17291', '00811', '00632', '00827', '03737', '03701', '05078', '15800', '05034', '02973', '01323', '01478', '01416', '05057', '00742', '05795', '00214', '09374', '00932', '05104', '02327', '00098', '08354', '03193', '02866', '02473', '00050', '01775', '03141', '04905', '04629', '08218', '14668', '02508', '02093', '00824', '03910', '04586', '02193', '03446', '05666', '03181', '01865', '02748', '03480', '00488', '05033', '03153', '03486', '04038', '00437', '03830', '03604', '04866', '00997', '04995', '03167', '01178', '12390', '06732', '02823', '01851', '05000', '00741', '02271', '04434', '02001', '05087', '04466', '04921', '00104', '03352', '02119', '04091', '01233', '16717', '03977', '03776', '00229', '01533', '02113', '05362', '03042', '05383', '00759', '03402', '04265', '03885', '03831', '03142', '04655', '03368', '03631', '17636', '01924', '00249', '00204', '02907', '05299', 
'01866', '03051', '00382', '00899', '03609', '01134', '04707', '03147', '01334', '01809', '02229', '02483', '01475', '00375', '03453', '00596', '04073', '05433', '03785', '16579', '12169', '04419', '02036', '04415', '00142', '01721', '04547', '03585', '02604', '02657', '00512', '00413', '05501', '04635', '03795', '03097', '00233', '09558', '01353', '03096', '03250', '04049', '01375', '01664', '17574', '04097', '03671', '03401', '12088', '04983', '05431', '04433', '02791', '01129', '01601', '00958', '05184', '04813', '01051', '00490', '01412', '04572', '03783', '03640', '04587', '01701', '04831', '06688', '03492', '02741', '15644', '11260', '04375', '05443', '02217', '02012', '01767', '02283', '02765', '02509', '02037', '04616', '00774', '04282', '03633', '02268', '01788', '03557', '05043', '02363', '05158', '00117', '02945', '00630', '02869', '15546', '00714', '04685', '03605', '01062', '03075', '02729', '02603', '00903', '01895', '05096', '03702', '00864', '03775', '00688', '00361', '01194', '03974', '00152', '01499', '02180', '04193', '01553', '01440', '12643', '04053', '04977', '04789', '03891', '04742', '03651', '03013', '01317', '01213', '00966', '03379', '03550', '05164', '02101', '00066', '15769', '02269', '01399', '00491', '03290', '03154', '17549', '02128', '02629', '03006', '01024', '03521', '00588', '04871', '02812', '07020', '00732', '00258', '04165', '02515', '03374', '01908', '00127', '04730', '03757', '12111', '02165', '01712', '00616', '04420', '02205', '03603', '04296', '02524', '00917', '03371', '04599', '02548', '17220', '00320', '04464', '01710', '05450', '03426', '01154', '00100', '01569', '03130', '03980', '05239', '00075', '00528', '04788', '01849', '05171', '04969', '02588', '13830', '02487', '01854', '00615', '04022', '03715', '16883', '04024', '04158', '01524', '04006', '01713', '01756', '02108', '02707', '01762', '10586', '01273', '05484', '10760', '00481', '02134', '03072', '01759', '05251', '03461', '03905', '00548', '04868', '01188', 
'02432', '00242', '05495', '00593', '04079', '02273', '00751', '01567', '04741', '04412', '02743', '16358', '01626', '03041', '02060', '05117', '01087', '00802', '01797', '02053', '05067', '01085', '02637', '05485', '03029', '03003', '08723', '01207', '03378', '00340', '02086', '04098', '00155', '02970', '02318', '01613', '00795', '04644', '03251', '01111', '04357', '05157', '00531', '05451', '01367', '07283', '04731', '05374', '01002', '05175', '05397', '01170', '02410', '01459', '04839', '03423', '04276', '00426', '04400', '00158', '03330', '00176', '10352', '01350', '01027', '05368', '03713', '05488', '02371', '05153', '01899', '03987', '04249', '04761', '02296', '02147', '01832', '04453', '03157', '04826', '07929', '02563', '00818', '03661', '02146', '02480', '02315', '00081', '01480', '00678', '03548', '02985', '04571', '00792', '01385', '01692', '02711', '04450', '00618', '04563', '04947', '02211', '01745', '03534', '05341', '15787', '12020', '00855', '04833', '05277', '02620', '02675', '14838', '00502', '02727', '04109', '05344', '03532', '04105', '04693', '02375', '04225', '01102', '01147', '01691', '04272', '02507', '04395', '02557', '16185', '03135', '01842', '01732', '03195', '03328', '02179', '01571', '02055', '01114', '01945', '02889', '05116', '00909', '03656', '01816', '02555', '05071', '01112', '04884', '00681', '04851', '03607', '01513', '03248', '01498', '04082', '01594', '02946', '02772', '02202', '04842', '14407', '02405', '00245', '04032', '03624', '03099', '02650', '03986', '04178', '08582', '01830', '00663', '04014', '03232', '03353', '00136', '01229', '13014', '04554', '16186', '00582', '01472', '00589', '14933', '02500', '00555', '00915', '04940', '00805', '01543', '02796', '04529', '02689', '00711', '04875', '03122', '05163', '15626', '00609', '01165', '05337', '00551', '04634', '01082', '04463', '02340', '02263', '01686', '01544', '02838', '04114', '03458', '00786', '03781', '02057', '00034'] 1326개

val ['00130', '00781', '03791', '04665', '01390', '01734', '00611', '00682', '04075', '00706', '01637', '01770', '01566', '01174', '01007', '01142', '01340', '01317', '00377', '01231', '00851', '00776', '00159', '00348', '01444', '00323', '00315', '03852', '01511', '01399', '01460', '00004', '01630', '01203', '00534', '00354', '00740', '04912', '00738', '01757', '01746', '02749', '01126', '01251', '00989', '03338', '01998', '00610', '00842', '00624', '01654', '00122', '00089', '01501', '00594', '01193', '00616', '00933', '00212', '01241', '05521', '01699', '01149', '00772', '00257', '00604', '00109', '01169', '01354', '00058', '00339', '00347', '01742', '00199', '00208', '01222', '01282', '00409', '00921', '00669', '01150', '00564', '01253', '01065', '00844', '01397', '01491', '02403', '01504', '00613', '01387', '01092', '01289', '00281', '00528', '00180', '00452', '00723', '00595', '00845', '01079', '01755', '00956', '01000', '01458', '05647', '00943', '00408', '00882', '01580', '00615', '01003', '01215', '00484', '00836', '05370', '01739', '01525', '01790', '01482', '01756', '01507', '01557', '01629', '01260', '00896', '01636', '00814', '00869', '01106', '00485', '00342', '00632', '01159', '00763', '05179', '00716', '01220', '01780', '00548', '03577', '00860', '01188', '00638', '00794', '00635', '01564', '00579', '01323', '00728', '01706', '01409', '02906', '00238', '03311', '02078', '01782', '00386', '00216', '00026', '05578', '01319', '00721', '00203', '00684', '00619', '00098', '00404', '01796', '01645', '01318', '01377', '00937', '00154', '00664', '00854', '01774', '00439', '01487', '03270', '00050', '01824', '00802', '01010', '04215', '00442', '00748', '00114', '01207', '01267', '00874', '00895', '00591', '00007', '01490', '03566', '00103', '01181', '00059', '01070', '00617', '01452', '00073', '00890', '00131', '00581', '00149', '00904', '01649', '01335', '02778', '00052', '00148', '00231', '01536', '00631', '01084', '01667', '00511', '02823', '01462', 
'01723', '00466', '00443', '01201', '00426', '03443', '01817', '05087', '01350', '04681', '00425', '00424', '02584', '00311', '00592', '00984', '00473', '00508', '01233', '00515', '01211', '00986', '01533', '01386', '00965', '01467', '01110', '00352', '00897', '02401', '00266', '00211', '00363', '00606', '01371', '00447', '00710', '00336', '00138', '01832', '00758', '00474', '00705', '01168', '00973', '01133', '03668', '00188', '01184', '01078', '00545', '01385', '01575', '00891', '04019', '01679', '00220', '01120', '00196', '01813', '01737', '04083', '00807', '01411', '00586', '00768', '01598', '00095', '04567', '01694', '00019', '01255', '00483', '00091', '00840', '05015', '01453', '00702', '03827', '00206', '01538', '00464', '01147', '00552', '01474', '01732', '00156', '00326', '00499', '01396', '01571', '01454', '00333', '00541', '03638', '00153', '04598', '00909', '00569', '00396', '01664', '00712', '00504', '02726', '01278', '00417', '03401', '01112', '00681', '01577', '00092', '01186', '00047', '01081', '01783', '01515', '01964', '00370', '01281', '00875', '00930', '01728', '00038', '01401', '00461', '01252', '01496', '00680', '00509', '00476', '00958', '01651', '01099', '01741', '00202', '01693', '00309', '01835', '01368', '00271', '00572', '00350', '00514', '00053', '01698', '04164', '01705', '01374', '01771', '00990', '01050', '00268', '00422', '01331', '01572', '00423', '01175', '01659', '00247', '00279', '00821', '03478', '00925', '01302', '00302', '00239', '01160', '00849', '01738', '00692', '00679', '00607', '01940', '01702', '02639', '01365', '00286', '00510', '00797', '03762', '01261', '00106', '00262', '00969', '01383', '01072', '00324', '04374', '00600', '03323', '01369', '01709', '00201', '00088', '00556'] 416개

Additionally, in train 10747 the text differs from what appears in the context.
context = 인수합병(M&A) 관련 법률 자문
text = 인수합병(M"&amp";A) 관련 법률 자문
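The kind of check that produces lists like the ones above can be sketched as follows. This is only a minimal sketch, assuming SQuAD-style records in which each span is a dict with `text` and `answer_start`; the helper name `find_misaligned` and the sample data are hypothetical and not taken from the dataset.

```python
def find_misaligned(context, spans):
    """Return the spans whose text is not found at answer_start in context."""
    bad = []
    for span in spans:
        start = span["answer_start"]
        text = span["text"]
        # A span is aligned only if slicing the context at answer_start
        # reproduces the labeled text exactly.
        if context[start:start + len(text)] != text:
            bad.append(span)
    return bad


# Hypothetical example data (not from klue-mrc-v1).
context = "KLUE is a Korean NLU benchmark."
spans = [
    {"text": "Korean", "answer_start": 10},     # aligned
    {"text": "benchmark", "answer_start": 20},  # off by one character
]
misaligned = find_misaligned(context, spans)
```

An exact string comparison like this flags both offset errors and normalization differences such as the one in train 10747, so the two cases would need to be told apart by a separate inspection.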

codertimo commented 3 years ago

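@scy6500 Hello! Thank you for your interest in the KLUE-MRC dataset and for using it. We checked, and every question for the guids you listed is of the "unanswerable question (type 3)" type. Each of these guids therefore has an empty list for answer, and is labeled with "plausible_answers", which are "fake answers".

For answer, which does affect training, we re-confirmed that every answer_start is correct. However, this issue let us confirm that there are some errors in the answer_start and text normalization of plausible_answers for the IDs you listed. This needs to be updated. Thank you for pointing it out. We will handle it in #14.

That said, plausible_answers are not real answers, so typical answer prediction models do not use them for training. May I ask whether you used the fake answers in plausible_answers intentionally? Thank you 👍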
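The distinction between real answers and plausible_answers can be illustrated with a short sketch. It assumes each question record is a dict whose real answers live under a key named `answers` (the comment above writes it as "answer", so the key is a parameter here) and whose fake answers live under `plausible_answers`; the function names and sample records are hypothetical.

```python
def is_unanswerable(qa, answer_key="answers"):
    """A question counts as unanswerable when its real-answer list is empty."""
    return len(qa.get(answer_key, [])) == 0


def training_spans(qa, answer_key="answers"):
    """Return only the real answers; plausible_answers are never included."""
    return list(qa.get(answer_key, []))


# Hypothetical records (not from klue-mrc-v1).
answerable = {
    "answers": [{"text": "Korean", "answer_start": 10}],
}
unanswerable = {
    "answers": [],
    "plausible_answers": [{"text": "benchmark", "answer_start": 21}],
}

flags = [is_unanswerable(answerable), is_unanswerable(unanswerable)]
spans = training_spans(unanswerable)
```

Under these assumptions, an answer_start check that only looks at the real answers would skip unanswerable questions entirely, which matches the explanation that the reported mismatches come from plausible_answers.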

scy6500 commented 3 years ago

Ah, I had misunderstood something. Thank you!

Since you said that #13 will be closed together with #14 once #14 is closed, I will reopen this issue.