Benchmark dataset filtering rules

bayer-science-for-a-better-life / Img2Mol

Apache License 2.0


Benchmark dataset filtering rules #5

Closed by OBrink 2 years ago

OBrink commented 2 years ago

Thank you for uploading the exact benchmark datasets that you used to validate Img2Mol! I have a question about the sizes of the individual sets. It appears that some images were removed from the original benchmark sets. For example, the JPO dataset normally consists of 450 images, but the version here contains 365. Could you share the criteria you used to remove images from the sets? Thanks in advance! :)

djork commented 2 years ago

We actually only removed images that contained either Markush structures or rare elements such as Si or Sn.
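
For anyone who wants to reproduce a comparable subset, here is a minimal sketch of such a filter. It is not the script used for Img2Mol, and it rests on several assumptions: the ground-truth labels are available as SMILES strings, Markush placeholders show up as `*` wildcards or as non-element bracket labels such as `[R1]`, and the "common" elements are the ones listed in `COMMON_ELEMENTS`. The thread only names Si and Sn as examples, so that set and the helper names `bracket_symbols` and `keep_structure` are hypothetical.

```python
import re

# ASSUMPTION: elements treated as "common". The thread only names Si and Sn
# as examples of rare elements, so this whitelist is a guess, not the
# authors' actual list.
COMMON_ELEMENTS = {"H", "B", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}

# In SMILES, atoms outside the organic subset (e.g. Si, Sn) must be written
# in square brackets, so only bracket atoms need to be inspected.
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")
# Optional isotope digits, then an element symbol, an aromatic symbol, or "*".
BRACKET_SYMBOL = re.compile(r"^\d*([A-Z][a-z]?|[a-z]{1,2}|\*)")


def bracket_symbols(smiles):
    """Return the leading symbol of every bracket atom, e.g. 'Si' for '[SiH3]'."""
    symbols = []
    for content in BRACKET_ATOM.findall(smiles):
        match = BRACKET_SYMBOL.match(content)
        # "?" marks bracket content that does not start with a symbol.
        symbols.append(match.group(1) if match else "?")
    return symbols


def keep_structure(smiles):
    """Return True if the SMILES has no wildcard and only common elements."""
    # A "*" anywhere is treated as a Markush-style placeholder (assumption).
    if "*" in smiles:
        return False
    # capitalize() maps aromatic symbols such as 'n' or 'se' to 'N' or 'Se'.
    # Labels like 'R' or 'X' are not in the whitelist and are rejected too.
    return all(sym.capitalize() in COMMON_ELEMENTS for sym in bracket_symbols(smiles))


if __name__ == "__main__":
    examples = ["CCO", "C[Si](C)(C)C", "CC[Sn](CC)(CC)CC", "*c1ccccc1", "[R1]c1ccccc1"]
    for smi in examples:
        print(smi, "keep" if keep_structure(smi) else "remove")
```

Note that a whitelist this strict also drops salts such as `[Na+].[Cl-]`, so the element set would need adjusting to match whatever was actually done for the uploaded sets.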