Closed rosner closed 9 years ago
Does "mvn package" work for you? Does it succeed in building the final jar file?
If you want to use/try/evaluate the system, what's stopping you? These file/directory structuring things would be nice to have, but are they actually stopping you from getting work done?
We don't know much about the right way to structure java projects, so help will be appreciated.
Why are you providing the jargs jar? Did you change something in it so you cannot use the standard version that is accessible through maven?
That was before my time; I don't know.
The same goes for the gnu trove jar that you provide. Any changes made to the library?
I doubt it
Why are you separating the actual src files into the separate src folder in the root of the project while maintaining the resources in the ark-tweet-nlp folder?
It seemed easier. I hate the way maven nests the src folder really deep by default, but figured that for resources we might as well use maven's default.
Are metaphone-map2.txt and ptb_ordered_metaphone.txt that are contained in the lib directory external resources or are they created by you? If so, why are they in the lib directory?
Created by us. They should be in resources/, that would be better.
Where is the posBerkeley.jar from? Is it available to the public (e.g. from here)?
It was sent to us via email, i believe, but that was before my time. (See the licensing file.) I don't like using it for this reason, because it's not directly available online to the public anywhere I know of -- though many parts of it are included in various Berkeley NLP software on that page.
You're right: the jar is building with mvn package. I can use it now since I trained the model, so everything is fine. The reason I started digging around in the project itself was that the help of the tagger says that it uses an internal model. As I read it, that could be either a file or a resource that comes within the jar. But the default model that is hardcoded in the RunTagger class is not included in the jar.
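To illustrate what I mean by "either a file or a resource that comes within the jar", here is a minimal, hypothetical sketch (not the actual RunTagger code; the class and method names are mine) of the two lookup paths a tagger could try:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical sketch, NOT the actual ark-tweet-nlp code: resolve a model
// name first as a classpath resource (i.e. bundled inside the jar), then
// as a plain file on disk.
public class ModelLoader {
    public static InputStream open(String name) {
        // 1) Try the classpath: succeeds when the model is packed into the jar.
        InputStream in = ModelLoader.class.getResourceAsStream("/" + name);
        if (in != null) return in;
        // 2) Fall back to the filesystem: succeeds for a downloaded model file.
        File f = new File(name);
        if (f.exists()) {
            try {
                return new FileInputStream(f);
            } catch (FileNotFoundException e) {
                // file vanished between exists() and open; treat as not found
            }
        }
        return null; // caller decides how to report a missing model
    }
}
```

Since the default model isn't in the jar, step 1 fails, and unless the user has downloaded the model to the hardcoded path, step 2 fails too -- which is exactly the confusion I ran into.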
Thanks!
yeah, the model can be downloaded from the website. it's the only resource that's not checked-in.
Do you have any suggestions for how to make the process less painful? I added a note about the model in particular to docs/hacking.txt.
First, I recommend using the standard maven project structure, even though you don't like the deep nesting. Everyone who uses maven is used to that specific structure, and it should also simplify the pom.xml.
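For reference, the standard layout maven expects would look roughly like this (a sketch, using the existing module name for illustration):

```
ark-tweet-nlp/
  pom.xml
  src/
    main/
      java/        <- tagger/tokenizer sources
      resources/   <- metaphone maps and other bundled data
    test/
      java/        <- unit tests
```

With that layout, maven finds sources and resources by convention, so the pom needs no `sourceDirectory` or `resources` overrides.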
Second, I believe the jargs dependency is not used at all in the project, so it could be removed. The RunTagger and Train classes parse the args manually, so this dependency is not needed. Also, the gnu trove dependency could be fetched from a repository; as it turns out, there's only one class (OWLQN) that uses gnu trove's THashSet.
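For example, a pom.xml dependency along these lines should pull trove from Maven Central instead of the checked-in jar (coordinates and version are my guess from what's published on Central; double-check which trove version the code was actually built against):

```xml
<dependency>
  <groupId>net.sf.trove4j</groupId>
  <artifactId>trove4j</artifactId>
  <version>3.0.3</version>
</dependency>
```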
Third, I would ignore the build artifacts in the repo itself. Instead, I would use maven to upload the arktweetnlp artifact to the repo's Downloads section. That way it stays out of the repo but is still accessible to people who have problems building it.
Fourth, the shell scripts to run the tagger or the tokenizer could be removed, or edited so they aren't confusing when they can't be run successfully. Also, I don't understand the java.sh in the scripts directory. I believe you guys use it for setting up your dev environment, like an IDE and such?
What do you think? I could work on a PR if you need help. Let me know.
Thanks for looking into this.
FYI, java.sh is as I described in hacking.txt -- it just makes it easy to run the tagger on the commandline when developing in an IDE, by using the version of the .class files that (e.g.) Eclipse is auto-compiling. This is very helpful for quick development -- this is how we can do things like fix #14 so fast :)
on trove and owlqn -- so it's only a training-time dependency.
I don't understand the proposal to edit the runTagger and twokenize scripts -- are we talking about comments in them, or something?
Hey folks,
I just wanted to try out your tagger, but I can't get it to run. First off, I tried following your hacking.txt, but with no success.
Also, the project structure is weird for a java project. So I have some questions about this project:

- Why are you providing the jargs jar? Did you change something in it so you cannot use the standard version that is accessible through maven?
- The same goes for the gnu trove jar that you provide. Any changes made to the library?
- Why are you separating the actual src files into a separate src folder in the root of the project while maintaining the resources in the ark-tweet-nlp folder?
- Are metaphone-map2.txt and ptb_ordered_metaphone.txt that are contained in the lib directory external resources, or are they created by you? If so, why are they in the lib directory?
- Where is the posBerkeley.jar from? Is it available to the public (e.g. from here)?

Since I want to use/try/evaluate it, I'm very interested in your project. I'm also experienced with maven, java, and eclipse, so I could help you with restructuring this stuff.