BitFunnel / Workbench

Java- and Lucene-based tools for BitFunnel corpus preparation
http://bitfunnel.org
MIT License

i, ii, iii, etc. #14

Closed: danluu closed this issue 7 years ago

danluu commented 7 years ago

It looks like we're indexing the numbering of list items (i, ii, iii, iv) as terms, or something like that?

a8120ede65e124a2,1,1,0.068493,i
4148ef8fddabef20,1,1,0.0374597,ii
971fe168e0442370,1,1,0.0113684,iii
99c51617da0db9ac,1,1,0.00588676,iv
MikeHopcroft commented 7 years ago

These are often parts of proper names, such as "Jefferson B. Sessions III" in https://en.wikipedia.org/?curid=303.

"IV" comes from a reference, IV.27.1, in https://en.wikipedia.org/?curid=305.
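A minimal sketch of how these terms can end up in the index, assuming a simple tokenizer that lowercases text and splits on any run of non-alphanumeric characters. The class name `RomanNumeralTerms` and the method `tokenize` are hypothetical, and this is not the Workbench's actual Lucene analysis chain; it only illustrates why "Sessions III" yields the term `iii` and "IV.27.1" yields `iv`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RomanNumeralTerms {
    // Hypothetical, simplified tokenizer (NOT the Workbench/Lucene analyzer):
    // lowercase the text, then split on runs of characters that are not
    // ASCII letters or digits, dropping any empty pieces.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // A regnal suffix in a proper name becomes a standalone term.
        System.out.println(tokenize("Jefferson B. Sessions III")); // [jefferson, b, sessions, iii]
        // A section reference splits into a numeral term plus numbers.
        System.out.println(tokenize("IV.27.1")); // [iv, 27, 1]
    }
}
```

A real analyzer may split differently around digits and periods, but any tokenizer that lowercases and breaks on punctuation or whitespace will emit short roman numerals like these as ordinary terms.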