ITI / searcch

SEARCCH Hub Frontend
https://searcch.cyberexperimentation.org/
BSD 3-Clause "New" or "Revised" License

Importer creates duplicate keywords for DOIs #96

Closed: lauratinnel closed this issue 3 years ago

lauratinnel commented 3 years ago

Describe the bug
When importing from a DOI, the importer creates (and suggests) duplicate keywords that differ only in case (e.g., "Test" and "test").

To Reproduce
Steps to reproduce the behavior:

  1. From the SEARCCH importer, import "https://dl.acm.org/doi/pdf/10.1145/3474718.3474726"
  2. Click on Edit
  3. Look at the keywords extracted

Expected behavior
The importer should filter keywords case-insensitively, so that variants differing only in case are reported once.
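A case-insensitive filter like the one requested could look roughly like the sketch below. It assumes the extracted keywords arrive as a plain list of strings; the function name `dedupe_keywords` is hypothetical and is not taken from the SEARCCH importer. Keywords are compared by their `str.casefold()` form, and the first spelling seen is the one kept.

```python
def dedupe_keywords(keywords):
    """Drop case-insensitive duplicates, keeping the first spelling seen.

    Hypothetical helper for illustration; not the importer's actual code.
    """
    seen = set()
    result = []
    for kw in keywords:
        cleaned = kw.strip()          # ignore stray surrounding whitespace
        key = cleaned.casefold()      # case-insensitive comparison key
        if key and key not in seen:   # skip empty strings and repeats
            seen.add(key)
            result.append(cleaned)
    return result


print(dedupe_keywords(["Test", "test", "network", "Test", "Network"]))
# ['Test', 'network']
```

`casefold()` is used instead of `lower()` because it also folds characters such as the German "ß", which makes it the safer choice for case-insensitive matching.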

Screenshots

[Screenshot: Screen Shot 2021-10-13 at 7 21 42 AM]

lauratinnel commented 3 years ago

Actually, it's creating duplicates even with the same case.

carboxylman commented 3 years ago

Yes, believe it or not, that was intentional, but having to remove all the duplicates by hand is far too overwhelming. We already store the exact keyword/rank information as metadata, so there is no need to promote duplicates. Fixed in https://gitlab.flux.utah.edu/searcch/importer/-/commit/72d442b643a1c6cb7dd449fdb5a92f69aec0247a ; no duplicates are allowed, and when keywords differ only in case, the first occurrence's casing wins.
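The approach described in the fix (promote only unique keywords, first case wins, while the exact keyword/rank list stays available as metadata) can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked commit: it assumes the extractor yields `(keyword, rank)` pairs in order, and the names `promote_keywords`, `keywords`, `metadata` and `keyword_ranks` are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def promote_keywords(extracted: List[Tuple[str, float]]) -> Dict[str, Any]:
    """Split extractor output into promoted keywords and raw metadata.

    Hypothetical sketch: `extracted` is assumed to be (keyword, rank)
    pairs in the order the extractor produced them.
    """
    first_seen: Dict[str, str] = {}
    for kw, _rank in extracted:
        # setdefault keeps the first spelling for each case-folded key,
        # so later variants ("test", "TEST") never overwrite it.
        first_seen.setdefault(kw.casefold(), kw)

    return {
        # dicts preserve insertion order, so keywords stay in extractor order
        "keywords": list(first_seen.values()),
        # the exact keyword/rank pairs are kept verbatim as metadata
        "metadata": {"keyword_ranks": list(extracted)},
    }


result = promote_keywords([("Test", 0.9), ("test", 0.7), ("fuzzing", 0.5), ("Test", 0.4)])
print(result["keywords"])                        # ['Test', 'fuzzing']
print(len(result["metadata"]["keyword_ranks"]))  # 4
```

Keeping the raw pairs separate means ranking information for every variant is not lost, even though only one spelling per keyword is shown to the user.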