OpenSextant / SolrTextTagger

A text tagger based on Lucene / Solr, using FST technology
Apache License 2.0

just starting out #27

Open ghost opened 10 years ago

ghost commented 10 years ago

Hello,

I have downloaded the SolrTextTagger, and built my jar. I also have a current solr instance with the settings you have suggested.

I wanted to try out a sample dictionary (gazetteer), but I don't see one. Is the format "foo","bar"?

This is amazingly cool code, I hope to get something running soon.

Thanks, Evan

ghost commented 10 years ago

Hello,

I'm reading DevNotes.txt and looking to run ./updateSolr.sh.

Where is that script?

Evan

ghost commented 10 years ago

Also, what is the "Merged.txt" file, as in: .. build=true' < /Volumes/Speedy/Merged.txt

dsmiley commented 10 years ago

So sorry I didn't respond sooner. My GitHub notification settings needed to be updated.

DevNotes.txt is perhaps something I shouldn't have committed; ignore it. Though it does have a curl entry for doing the tagger request, and that's useful.
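The curl entry itself isn't reproduced in this thread. As a rough idea of what a tagger request looks like, here is a minimal Python sketch that posts plain text to a tagger request handler. The Solr URL, the core name `gazetteer`, and the `/tag` handler path are all assumptions that depend on your solrconfig.xml; the `overlaps`, `tagsLimit`, and `fl` parameters are the ones I believe the 2.x tagger accepts, and 1.x may well be configured differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_tag_url(solr_base, core, handler="/tag"):
    """Build the tagger request URL (handler path and core are assumptions)."""
    params = {
        "overlaps": "NO_SUB",  # drop tags fully contained in a longer tag
        "tagsLimit": "5000",   # cap on the number of tags returned
        "fl": "*",             # return all stored fields of matched docs
        "wt": "json",
    }
    return f"{solr_base}/{core}{handler}?{urlencode(params)}"


def tag_text(url, text):
    """POST the raw text to the tagger and return the parsed JSON response."""
    req = Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Building the URL needs no running Solr; tag_text() does.
    print(build_tag_url("http://localhost:8983/solr", "gazetteer"))
```

Calling `tag_text(url, "I flew from Boston to Paris")` would only work against a Solr instance that has the tagger handler registered and an indexed gazetteer behind it.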

I've forgotten if OpenSextant has a sample gazetteer, but it's not in this sub-project. Consider downloading a geonames CSV file and then uploading/indexing it to Solr using Solr's CSV method. The DevNotes.txt is sort of doing that but with a CSV from another source (not geonames). You'll need to develop a schema for it.
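For anyone following along, the CSV upload step might look roughly like the sketch below. It assumes a local Solr at `http://localhost:8983/solr`, a hypothetical core named `gazetteer`, and made-up field names that would have to match whatever schema you develop. GeoNames dumps are tab-separated with no header row, so the column names are passed explicitly. `separator`, `header`, `fieldnames`, and `commit` are parameters of Solr's CSV update handler.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical schema field names, one per column of a GeoNames dump
# (19 tab-separated columns). Rename these to match your own schema.
GEONAMES_FIELDS = [
    "id", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def build_csv_update_url(solr_base, core, fieldnames):
    """Build the URL for Solr's CSV update handler."""
    params = {
        "commit": "true",
        "separator": "\t",   # GeoNames files are tab-delimited
        "header": "false",   # there is no header row in the dump
        "fieldnames": ",".join(fieldnames),
    }
    return f"{solr_base}/{core}/update?{urlencode(params)}"


def upload_csv(url, path):
    """POST the file to Solr. Reads the whole file into memory, so try a
    single-country file before attempting the full dump."""
    with open(path, "rb") as f:
        body = f.read()
    req = Request(
        url,
        data=body,
        headers={"Content-Type": "application/csv; charset=utf-8"},
        method="POST",
    )
    with urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    print(build_csv_update_url("http://localhost:8983/solr", "gazetteer",
                               GEONAMES_FIELDS))
```

Names containing double quotes may trip Solr's default CSV quoting, so you may need to adjust the handler's `encapsulator` setting for real data.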

Clearly there needs to be better "on-boarding" documentation here. What has hampered things is that the master branch is on v1.x and v2.x (MemPF branch) is farther along but is held up until a key feature from 1.x gets ported. And v1.x and 2.x are configured differently.

dsmiley commented 10 years ago

BTW I've noticed your comment http://sujitpal.blogspot.com/2014/02/fuzzy-string-matching-with.html -- I thought your name was familiar.

mubaldino commented 10 years ago

FYI -- Evan, David,

greetings. http://opensextant.org/downloads.html -- The "Merged.txt" file likely has a new name, but it is there under "OpenSextant Gazetteer data", the latestGazetteer.zip file.

jprochaz commented 10 years ago

Hi Dave, I saw this thread and thought I would jump in with my related question. Any plans on moving the MemPF branch to master and moving the v1.x master to a branch? I recently upgraded my version of OpenSextant to use Solr 4.7.2, and it took me a while to figure out that I needed to use the MemPF branch to build the Solr 4.7 compliant tagger. Pushing out the latest MemPF snapshot to the maven repo would have worked as well. Thanks!

dsmiley commented 10 years ago

@jprochaz See #23 (no change). I forgot whether or not the 1.x branch supports Solr 4.7, but your comment says it doesn't. It's probably something really minor like default enablePositionInc or something like that.

ghost commented 10 years ago

Hello,

Thank you for following up on all of this.

Would you be interested in coming by our offices for a talk on your work? I think folks would be interested in hearing more about OpenSextant and in particular the SolrTextTagger.

I am Chief Architect at Decision Resources Group in Burlington MA

Evan

jprochaz commented 10 years ago

@dsmiley Thanks. I believe it was docSet.getBits()...

mubaldino commented 10 years ago

Hi Evan Smith,

We can talk at my other email address, ubaldino@mitre.org. Please include David; I don't have his contact info.

Hm... we're in Bedford, next door. I would like to meet.

cheers, Marc