Hello, I noticed that there are some duplicated entries (86, to be precise) in the rareact.csv file. Shouldn't we discard them? If so, some of the dataset statistics and results would change as well. I would appreciate your thoughts on this.

Here are the duplicated rows:
[255, 257, 258, 445, 451, 454, 465, 481, 483, 485, 495, 509, 510, 511, 539, 545, 569, 574, 610, 619, 621, 625, 636, 679, 716, 733, 741, 806, 843, 878, 879, 884, 891, 904, 910, 936, 939, 950, 970, 997, 1004, 1010, 1015, 1016, 1028, 1039, 1049, 1075, 1083, 1086, 1094, 1095, 1104, 1137, 1164, 1167, 1182, 1210, 1212, 1228, 1291, 1342, 1346, 1382, 1398, 1399, 1400, 1404, 1430, 1434, 1436, 1451, 1457, 1459, 1460, 1495, 1496, 1498, 1503, 1504, 1510, 1564, 5001, 5066, 5892, 5893]
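
For reference, here is a minimal sketch of how duplicates like these could be detected. It is not necessarily how the list above was produced. The column names in the sample (`video_id`, `start`, `end`, `label`) and the `key_columns` parameter are hypothetical and only illustrate the idea. By default, a row counts as a duplicate only if every column matches an earlier row.

```python
import csv
import io


def find_duplicate_rows(csv_file, key_columns=None):
    """Return 0-based indices of data rows that repeat an earlier row.

    key_columns is a hypothetical parameter naming the columns that
    define a duplicate. None means every column must match.
    """
    reader = csv.DictReader(csv_file)
    seen = set()
    duplicates = []
    for index, row in enumerate(reader):
        columns = key_columns if key_columns is not None else reader.fieldnames
        key = tuple(row[name] for name in columns)
        if key in seen:
            # Same key as an earlier row: record this later occurrence.
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


# Tiny made-up sample, NOT taken from rareact.csv.
sample = io.StringIO(
    "video_id,start,end,label\n"
    "a,0,5,1\n"
    "b,3,9,0\n"
    "a,0,5,1\n"
    "b,3,9,1\n"
)
print(find_duplicate_rows(sample))  # [2]
```

To run this on the real file, pass an open file handle instead of the sample, for example `open("rareact.csv", newline="")`. The function reports the later occurrences and keeps the first one, and its indices are 0-based over data rows (header excluded), which may or may not match the indexing used in the list above.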