collectiveaccess / providence

Cataloguing and data/media management application
GNU General Public License v3.0

Leading zeros in object identifiers #1314

Open Kaiami opened 2 years ago

Kaiami commented 2 years ago

We'd like CollectiveAccess's search to play nicely with object id numbers where folks have been inconsistent with preceding zeros. I'd like to be able to search 2005.1.1 and have it find 2005.00001.0001 and 2005.01.00001.

In a quest to make this happen, we checked, and INDEX_AS_IDNO is (and was) on. The bit about tokenizing, and how punctuation (i.e. the periods dividing numbers up in standard tripartite number systems) is handled, is a bit confusing to me. Before tweaking, the setting was idno = { STORE, DONT_TOKENIZE, INDEX_AS_IDNO, BOOST = 100 }, and search did not find numbers unless we put in the exact quantity of preceding zeros. We've now changed it to idno = { STORE, TOKENIZE, INDEX_AS_IDNO, BOOST = 100 }, and in some cases that had the desired effect: I can search 1989.124.1153 and find 1989.0124.1153. But searching D2005.1.1 does not get me D2005.0001.0001, while searching D2005.1 does.

We need additional flexibility around preceding zeros in object identifiers. Specifically, we need numbers with them to be treated the same as numbers without them for search, and also for sort. (In my ideal world, for sort purposes, 2002.0001 would appear above 2002.01, and that above 2002.1.)
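To pin down the behavior being asked for, here is a minimal Python sketch. It is not CollectiveAccess code and says nothing about how the indexer works internally; the function names `normalize_idno` and `sort_key` are hypothetical and exist only to illustrate the request. It assumes that every run of ASCII digits in an identifier should be compared by numeric value, and that ties should be broken by putting the more heavily zero-padded form first.

```python
import re


def normalize_idno(idno: str) -> str:
    """Strip leading zeros from every run of digits (hypothetical helper)."""
    return re.sub(r"[0-9]+", lambda m: str(int(m.group())), idno)


def sort_key(idno: str):
    """Sort by numeric value first, then by amount of zero padding."""
    tokens = re.findall(r"[0-9]+|[^0-9]+", idno)
    # Digit runs compare as integers; everything else compares as text.
    primary = tuple(
        (0, int(t), "") if "0" <= t[0] <= "9" else (1, 0, t) for t in tokens
    )
    # Longer (more padded) digit runs sort earlier among otherwise equal ids.
    padding = tuple(-len(t) for t in tokens if "0" <= t[0] <= "9")
    return (primary, padding)


if __name__ == "__main__":
    print(normalize_idno("D2005.0001.0001"))
    print(sorted(["2002.1", "2002.01", "2002.0001"], key=sort_key))
```

Under these assumptions, a search for 2005.1.1 would match any stored identifier whose normalized form is also 2005.1.1, and the sort key gives the ordering described above.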

collectiveaccess commented 2 years ago

We will add an option to deal with this. Stay tuned.