emorynlp / nlp4j

NLP framework for JVM languages.
http://emorynlp.github.io/nlp4j/
Other
149 stars 33 forks source link

Deprecate disunified repos #1

Open benson-basis opened 8 years ago

benson-basis commented 8 years ago

@jdchoi77 Here is the unified repo. I will proceed from here in PR's for other work. The readme might want some more work to explain the structure, let me know if you want me to do it.

jdchoi77 commented 8 years ago

Thanks for the work; I'm trying to think if it is worth at all to have these projects separated at this point. They share the same package naming so it's fairly easy to merge all of them into a super project. What do you think?

benson-basis commented 8 years ago

Just to be sure we're on the same page: you're suggesting taking an additional step of combining core, common, morphology, tokenization, and 'nlp4j' into a single Maven project: one src/main, etc?

If it were up to me, I wouldn't go that far. I'd reduce it to two modules: one with all the code as viewed as an API, and another with the command line interfaces and wrappers. It's mostly a question of taste to separate commands from API; nothing horrible happens if you lump them together. This much split means that we can use less complex Maven tricks to have the right stuff in the classpath of the commands that we don't want to force into the classpath of applications.

However, didn't you tell me before that you originally split them up into multiple jars because someone complained? Is that complaint still a concern?

On Tue, Jul 26, 2016 at 11:54 AM, Jinho D. Choi notifications@github.com wrote:

Thanks for the work; I'm trying to think if it is worth at all to have these projects separated at this point. They share the same package naming so it's fairly easy to merge all of them into a super project. What do you think?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235313720, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z8Z1HLx2zn1W7wdzJEsPlEWvV6aGks5qZi27gaJpZM4JSL7S .

jdchoi77 commented 8 years ago

Yes, that was the concern, but now we have modulated the code much better, I think it should be ok (NLP4J, previously called ClearNLP, even before called ClearParser, has evolved a lot since 2011 :) I'd take the suggestion of separating CLI to the pure API. If you'd like to do this merge, I can spend some time later today to review the code and finalize the unification. Thanks!

benson-basis commented 8 years ago

I'll put the stream business aside for now and do the unification today.

On Tue, Jul 26, 2016 at 12:16 PM, Jinho D. Choi notifications@github.com wrote:

Yes, that was the concern, but now we have modulated the code much better, I think it should be ok (NLP4J, previously called ClearNLP, even before called ClearParser, has evolved a lot since 2011 :) I'd take the suggestion of separating CLI to the pure API. If you'd like to do this merge, I can spend some time later today to review the code and finalize the unification. Thanks!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235320806, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z9OQOH6g2TyMVXCtrXLKwReaSYPFks5qZjLigaJpZM4JSL7S .

benson-basis commented 8 years ago

I've put up a PR for this. I think I've accounted for everything. It's a bit disorienting.

benson-basis commented 8 years ago

I hope you won't be too annoyed that I added some very small and local changes that rescue the useful part of the big file/stream PR. It's completely compatible. If you don't like it, let me know, and I can move those commits to another PR. I was sitting on the wrong branch, and I need to go do an errand with my wife, so I decided to just push this up to you and see if it was OK.

jdchoi77 commented 8 years ago

I just merged to PR. I'm going to take a look and see if there are changes needed now. Will get back to you soon.

jdchoi77 commented 8 years ago

I updated the pom.xml of submodules, removing the versions for the submodules. If I remember right, this would adapt the parent's version, but please let me know if this should be back. I also made a small change due to the upgrade of fastutils. Thanks.

benson-basis commented 8 years ago

I'll have a look.

On Wed, Jul 27, 2016 at 10:21 AM, Jinho D. Choi notifications@github.com wrote:

I updated the pom.xml of submodules, removing the versions for the submodules. If I remember right, this would adapt the parent's version, but please let me know if this should be back. I also made a small change due to the upgrade of fastutils. Thanks.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235600043, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z23v68SxTOnBU0qYKPoDOcjAgoSlks5qZ2lVgaJpZM4JSL7S .

jdchoi77 commented 8 years ago

I just moved the tokenization package to edu/emory/mathcs/nlp/component/tokenizer/ . I'll be testing each module this afternoon.

benson-basis commented 8 years ago

Your POM change was correct. The only version needed in a submodule is the one in the parent, I missed cleaning that up.

On Wed, Jul 27, 2016 at 10:28 AM, Benson Margulies benson@basistech.com wrote:

I'll have a look.

On Wed, Jul 27, 2016 at 10:21 AM, Jinho D. Choi notifications@github.com wrote:

I updated the pom.xml of submodules, removing the versions for the submodules. If I remember right, this would adapt the parent's version, but please let me know if this should be back. I also made a small change due to the upgrade of fastutils. Thanks.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235600043, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z23v68SxTOnBU0qYKPoDOcjAgoSlks5qZ2lVgaJpZM4JSL7S .

benson-basis commented 8 years ago

There will be a small celebration up here once we have a release from this version.

On Wed, Jul 27, 2016 at 10:28 AM, Jinho D. Choi notifications@github.com wrote:

I just moved the tokenization package to edu/emory/mathcs/nlp/component/tokenizer/ . I'll be testing each module this afternoon.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235602558, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9zzAbVL2YitLyTXTCD9JQajviVC5iks5qZ2scgaJpZM4JSL7S .

jdchoi77 commented 8 years ago

Haha yes, also, how about calling the "commands" project as "cmd" or "cli"? I should check what people usually call this kind of project.

benson-basis commented 8 years ago

I don't have a gut sense of the relative popularity of these options. I do think that 'cli' is more descriptive, so if you're inclined to make a change, I'd vote for 'cli' over 'cmd'.

On Wed, Jul 27, 2016 at 10:59 AM, Jinho D. Choi notifications@github.com wrote:

Haha yes, also, how about calling the "commands" project as "cmd" or "cli"? I should check what people usually call this kind of project.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235612935, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z-k2e05VFiDvMgRriYGt0qwQ5DyZks5qZ3JpgaJpZM4JSL7S .

jdchoi77 commented 8 years ago

I like 'cli' as well, let me see if i can make this change.

benson-basis commented 8 years ago

Another thing that we could do here at some point is to add the maven-assembly-plugin to make a binary tarball with a bin and a lib and such, so that someone can just download that and start running commands.

On Wed, Jul 27, 2016 at 11:06 AM, Benson Margulies benson@basistech.com wrote:

I don't have a gut sense of the relative popularity of these options. I do think that 'cli' is more descriptive, so if you're inclined to make a change, I'd vote for 'cli' over 'cmd'.

On Wed, Jul 27, 2016 at 10:59 AM, Jinho D. Choi notifications@github.com wrote:

Haha yes, also, how about calling the "commands" project as "cmd" or "cli"? I should check what people usually call this kind of project.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emorynlp/nlp4j-unified/issues/1#issuecomment-235612935, or mute the thread https://github.com/notifications/unsubscribe-auth/ADM9z-k2e05VFiDvMgRriYGt0qwQ5DyZks5qZ3JpgaJpZM4JSL7S .

jdchoi77 commented 8 years ago

That's a great idea; i've been doing that manually but maven plugin should help.