emorynlp / nlp4j

NLP framework for JVM languages.
http://emorynlp.github.io/nlp4j/
Other
149 stars 33 forks source link

Allow more flexibility in model locations and locations #4

Closed benson-basis closed 8 years ago

benson-basis commented 8 years ago

It would helpful to be able to have models anywhere reachable by Path. That includes, for example, S3 and HDFS. It would be helpful to be able to trade space for time by allow for things that are not necessarily xz-compressed.

jdchoi77 commented 8 years ago

Two things I noticed:

Thanks!

benson-basis commented 8 years ago

OK, I see. I'll fix it up.

On Tue, Jul 26, 2016 at 12:06 PM, Jinho D. Choi notifications@github.com wrote:

Two things I noticed:

  • Several methods such as createXZBufferedInputStream are removed from this change, which is not good since those functions are used everywhere including our internal projects depending on NLP4J APIs. I would suggest to create the other methods without removing them.
  • Taking InputStream as a parameter instead of actual String path had been requested by several users who wanted to load models in different formats. How about overriding these methods so they can take either String path or InputStream?

Thanks!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/4#issuecomment-235317607, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z2Hup8qql5CyPWETATcPQYC-E6ujks5qZjCAgaJpZM4JUbj1 .

benson-basis commented 8 years ago

I thought about this some more. I'm going to replace this PR with one that makes many fewer changes and preserves compatibility. Making the changes I made and also being compatible is looking really messy to me.

On Tue, Jul 26, 2016 at 12:17 PM, Benson Margulies benson@basistech.com wrote:

OK, I see. I'll fix it up.

On Tue, Jul 26, 2016 at 12:06 PM, Jinho D. Choi notifications@github.com wrote:

Two things I noticed:

  • Several methods such as createXZBufferedInputStream are removed from this change, which is not good since those functions are used everywhere including our internal projects depending on NLP4J APIs. I would suggest to create the other methods without removing them.
  • Taking InputStream as a parameter instead of actual String path had been requested by several users who wanted to load models in different formats. How about overriding these methods so they can take either String path or InputStream?

Thanks!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/4#issuecomment-235317607, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z2Hup8qql5CyPWETATcPQYC-E6ujks5qZjCAgaJpZM4JUbj1 .

benson-basis commented 8 years ago

@jdchoi77 With the current API, it practical to set up an decoder with no XML file at all? If so, I'm not sure that any of this work is particularly needed. I convinced myself that the XML file was unavoidable, but now I'm not so sure after your remarks. Or, to put it another way, I bet I could come up with a far smaller set of changes to facilitate that approach than the set in the PR.