BLLIP / bllip-parser

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser) See http://pypi.python.org/pypi/bllipparser/ for Python module.
http://bllip.cs.brown.edu/

Add and package higher-level Java interface #46

Open dmcc opened 8 years ago

dmcc commented 8 years ago

Python has a higher-level interface on top of the SWIG bindings, but the repository doesn't currently include one for Java. I've written one, but it isn't well packaged, so it hasn't been checked in yet.

From what I can tell, the best way to distribute Java native code is as a .nar file (http://maven-nar.github.io/), so I hope to move towards that type of system. If you're a Maven/Java packaging expert, please feel free to get involved!

danyaljj commented 6 years ago

Hey @dmcc, I have an extensive Maven/Java background and am interested in helping with this. I looked at the Java sample code you wrote for the first stage and the second stage. If I manage to compile and run the code, I will happily do the Java/Maven packaging.

So I cloned the repo and compiled it (make, with some hoops to jump through; I had to update to gcc-4.9). One thing I'm not sure about is how to generate the SWIG binary files. Specifically, in the Java sample files, you have: https://github.com/BLLIP/bllip-parser/blob/ca98ab0b513b1d9f5330d0051e9083a600676ae0/second-stage/programs/features/swig/java/test/test.java#L20 and https://github.com/BLLIP/bllip-parser/blob/ca98ab0b513b1d9f5330d0051e9083a600676ae0/first-stage/PARSE/swig/java/test/test.java#L20

where you load binary files generated by SWIG; right? How do I get those?

danyaljj commented 6 years ago

I looked into the Makefiles and I see that there are arguments for building with SWIG. Will have to install SWIG to generate the binary files. As a side note, I wonder if the binary file would generalize across operating systems, or I'll have to create different binary files, for each OS.

Update: more hoops to jump over. Downloaded SWIG. I had to install this extra library regex via homebrew, because mac is missing it.

Went to the first step, make; it didn't find jni.h. I had to add its include path to CFLAG. Then it complained about this.
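For reference, the jni.h problem usually comes down to two include directories under the JDK: include/ and a platform-specific subdirectory (linux, darwin, or win32). Below is a minimal sketch that derives the flags from a JDK home path. The class and method names are hypothetical and not part of this repository, and the JAVA_HOME lookup is an assumption about the local setup (on JDK 8, the java.home property points at the jre subdirectory, so JAVA_HOME is preferred).

```java
import java.util.Locale;

// Hypothetical helper, not part of bllip-parser: builds the -I flags
// a C/C++ compiler needs to find jni.h and jni_md.h.
final class JniIncludeFlags {
    // Name of the platform subdirectory inside <jdk>/include.
    static String platformDir(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        // Check "darwin" before "win", since "darwin" contains "win".
        if (os.contains("mac") || os.contains("darwin")) return "darwin";
        if (os.contains("win")) return "win32";
        return "linux";
    }

    // Returns e.g. "-I<jdk>/include -I<jdk>/include/linux".
    static String flags(String jdkHome, String osName) {
        String include = jdkHome + "/include";
        return "-I" + include + " -I" + include + "/" + platformDir(osName);
    }

    public static void main(String[] args) {
        // Assumption: JAVA_HOME points at a full JDK, not a JRE.
        String jdkHome = System.getenv("JAVA_HOME");
        if (jdkHome == null) {
            jdkHome = System.getProperty("java.home");
        }
        System.out.println(flags(jdkHome, System.getProperty("os.name")));
    }
}
```

The printed string is the kind of value that would go into the include-path variable in the Makefile; whether the paths actually exist still needs to be checked on each machine.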

dmcc commented 6 years ago

Hey @danyaljj, thanks for looking into this! It's been a while since this was fresh in my mind, but please keep the thread posted.

> load binary files generated by SWIG; right? How do I get those?

Right, these are generated by SWIG (or at least their sources are).
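To make the loading step concrete: assuming the test files load the library with System.loadLibrary (not verified against test.java here), the JVM maps the short name to a platform-specific file name and searches each directory on java.library.path. The sketch below lists the files it would look for. The class name is hypothetical, and the library name SWIGParser is inferred from libSWIGParser.so in the build output.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of bllip-parser: shows where the JVM
// would look for a native library loaded via System.loadLibrary(name).
final class NativeLibResolver {
    static List<File> candidates(String libName, String libraryPath) {
        // e.g. "SWIGParser" -> "libSWIGParser.so" on Linux
        String fileName = System.mapLibraryName(libName);
        List<File> result = new ArrayList<>();
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (!dir.isEmpty()) {
                result.add(new File(dir, fileName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        for (File f : candidates("SWIGParser", path)) {
            System.out.println((f.isFile() ? "found   " : "missing ") + f);
        }
    }
}
```

Running it with -Djava.library.path pointing at swig/java/lib should show whether the generated library is where the JVM expects it. Note that on macOS System.mapLibraryName returns a .dylib name, so a file built as .so may not be picked up there.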

> As a side note, I wonder if the binary file would generalize across operating systems, or I'll have to create different binary files, for each OS.

I don't think they will necessarily work across OSes, unfortunately. I've seen cases where Java native extensions include some architecture information in their filenames to keep the binaries straight.
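As an illustration of that naming idea, here is a minimal sketch that turns the os.name and os.arch system properties into an os-arch classifier that could be appended to a library name. The class name, the normalization rules, and the resulting labels (osx, linux, windows, x86_64, aarch64) are assumptions for illustration, not an existing convention in this repository.

```java
import java.util.Locale;

// Hypothetical helper, not part of bllip-parser: derives a platform
// classifier such as "linux-x86_64" for naming native binaries.
final class NativeClassifier {
    static String os(String osName) {
        String s = osName.toLowerCase(Locale.ROOT);
        // Check "darwin" before "win", since "darwin" contains "win".
        if (s.contains("mac") || s.contains("darwin")) return "osx";
        if (s.contains("win")) return "windows";
        if (s.contains("linux")) return "linux";
        return s.replaceAll("[^a-z0-9]+", "");
    }

    static String arch(String osArch) {
        String s = osArch.toLowerCase(Locale.ROOT);
        // JVMs report the same 64-bit architectures under different names.
        if (s.equals("amd64") || s.equals("x86_64")) return "x86_64";
        if (s.equals("aarch64") || s.equals("arm64")) return "aarch64";
        return s.replaceAll("[^a-z0-9_]+", "");
    }

    static String classifier(String osName, String osArch) {
        return os(osName) + "-" + arch(osArch);
    }

    // e.g. "SWIGParser" -> "SWIGParser-linux-x86_64"
    static String qualifiedName(String libName, String osName, String osArch) {
        return libName + "-" + classifier(osName, osArch);
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName("SWIGParser",
                System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

A loader could use the qualified name to pick the matching binary out of a jar that bundles one build per platform.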

> Update: more hoops to jump over. Downloaded SWIG. I had to install this extra library regex via homebrew, because mac is missing it.

If possible, please keep track of these steps since we'll probably want to include them in java/README.rst.

Regarding SWIG_LINKER_FLAGS in the Makefile: Yeah, that's the type of setting that I would hope we could get maven to set for us. The value for that is hopelessly platform dependent. Were you able to find a value that works?

Let me know if you have other questions.

danyaljj commented 6 years ago

> If possible, please keep track of these steps since we'll probably want to include them in java/README.rst.

Yup. I'm noting it every time something unexpected happens.

Some missing libraries:

When I run make -C first-stage/PARSE swig-java, here is what I get:

art.o swig/build/java_wrapper.o
Undefined symbols for architecture x86_64:
  "SimpleChart::addChildTrees(Node&, std::list<InputTree*, std::allocator<InputTree*> >*, InputTree*)", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1addChildTrees in java_wrapper.o
  "SimpleChart::prunePreterms(int, int)", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1prunePreterms in java_wrapper.o
  "SimpleChart::pruneConstituents(int, int, float)", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1pruneConstituents in java_wrapper.o
  "SimpleChart::parse()", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1parse in java_wrapper.o
  "SimpleChart::prune(float)", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1prune in java_wrapper.o
  "SimpleChart::populate(InputTree*, float)", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1populate in java_wrapper.o
  "SimpleChart::fillChart()", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1fillChart in java_wrapper.o
  "SimpleChart::initChart()", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1initChart in java_wrapper.o
  "SimpleChart::makeTrees(Node&, InputTree*)", referenced from:
      _Java_SWIGParserJNI_SimpleChart_1makeTrees in java_wrapper.o
  "SimpleChart::SimpleChart(int)", referenced from:
      _Java_SWIGParserJNI_new_1SimpleChart in java_wrapper.o
  "SimpleChart::~SimpleChart()", referenced from:
      _Java_SWIGParserJNI_delete_1SimpleChart in java_wrapper.o
  "Node::Node(int, int, std::list<int, std::allocator<int> >, float, Node*, Node*)", referenced from:
      _Java_SWIGParserJNI_new_1Node_1_1SWIG_11 in java_wrapper.o
  "Node::Node(int, int, int, float)", referenced from:
      _Java_SWIGParserJNI_new_1Node_1_1SWIG_10 in java_wrapper.o
  "Node::termNames() const", referenced from:
      _Java_SWIGParserJNI_Node_1termNames in java_wrapper.o
  "operator<<(std::basic_ostream<char, std::char_traits<char> >&, ScoredSpan const&)", referenced from:
      _Java_SWIGParserJNI_stream_1extraction_1_1SWIG_11 in java_wrapper.o
  "operator<<(std::basic_ostream<char, std::char_traits<char> >&, SimpleChart const&)", referenced from:
      _Java_SWIGParserJNI_stream_1extraction_1_1SWIG_13 in java_wrapper.o
  "operator<<(std::basic_ostream<char, std::char_traits<char> >&, Node const&)", referenced from:
      _Java_SWIGParserJNI_stream_1extraction_1_1SWIG_12 in java_wrapper.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
make: *** [swig/java/lib/libSWIGParser.so] Error 1

I think the linker is complaining about not finding these function definitions, while I can see that they exist right there in PARSE (Fusion.o):

Daniels-MacBook-Pro-2:bllip-parser daniel$ ls first-stage/PARSE/
AnsHeap.C   ChartBase.C ECArgs.h    FBinaryArray.C  Field.C     InputTree.C MeChart.h   SentRep.C   UnitRules.C evalTree.C  headFinder.C    parseIt.o
AnsHeap.h   ChartBase.d ECArgs.o    FBinaryArray.d  Field.d     InputTree.d MeChart.o   SentRep.d   UnitRules.d evalTree.d  headFinder.d    swig
AnsStrCounts.C  ChartBase.h ECString.h  FBinaryArray.h  Field.h     InputTree.h Params.C    SentRep.h   UnitRules.h evalTree.o  headFinder.h    utils.C
AnswerTree.C    ChartBase.o Edge.C      FBinaryArray.o  Field.o     InputTree.o Params.d    SentRep.o   UnitRules.o ewDciTokStrm.C  headFinder.o    utils.d
AnswerTree.h    ClassRule.C Edge.d      Feat.C      FullHist.C  Item.C      Params.h    SimpleAPI.C ValHeap.C   ewDciTokStrm.d  headFinderCh.C  utils.h
Bchart.C    ClassRule.d Edge.h      Feat.d      FullHist.d  Item.d      Params.o    SimpleAPI.d ValHeap.d   ewDciTokStrm.h  headFinderCh.d  utils.o
Bchart.d    ClassRule.h Edge.o      Feat.h      FullHist.h  Item.h      ParseStats.C    SimpleAPI.h ValHeap.h   ewDciTokStrm.o  headFinderCh.h  weakdecls.h
Bchart.h    ClassRule.o EdgeHeap.C  Feat.o      FullHist.o  Item.o      ParseStats.d    SimpleAPI.o ValHeap.o   extraMain.C headFinderCh.o
Bchart.o    CntxArray.C EdgeHeap.d  Feature.C   Fusion.C    Link.C      ParseStats.h    Term.C      Wrd.h       extraMain.d oparseIt.C
BchartSm.C  CntxArray.d EdgeHeap.h  Feature.d   Fusion.d    Link.d      ParseStats.o    Term.d      auxify.C    extraMain.h parseAndEval
BchartSm.d  CntxArray.h EdgeHeap.o  Feature.h   Fusion.h    Link.h      ReadTree.C  Term.h      auxify.h    extraMain.o parseAndEval.C
BchartSm.o  CntxArray.o EgsFromTree.C   Feature.o   Fusion.o    Link.o      ReadTree.h  Term.o      edgeSubFns.C    fhSubFns.C  parseAndEval.d
Bst.C       CombineBests.C  ExtPos.C    FeatureTree.C   GotIter.C   Makefile    ScoreTree.C TimeIt.C    edgeSubFns.d    fhSubFns.d  parseAndEval.o
Bst.d       CombineBests.h  ExtPos.d    FeatureTree.d   GotIter.d   Makefile.dep    ScoreTree.d TimeIt.d    edgeSubFns.h    fhSubFns.h  parseIt
Bst.h       ECArgs.C    ExtPos.h    FeatureTree.h   GotIter.h   MeChart.C   ScoreTree.h TimeIt.h    edgeSubFns.o    fhSubFns.o  parseIt.C
Bst.o       ECArgs.d    ExtPos.o    FeatureTree.o   GotIter.o   MeChart.d   ScoreTree.o TimeIt.o    evalTree    fusion      parseIt.d

Not sure what to do with this. Ideas?
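One way to narrow this down is to group the undefined symbols by the class they belong to and then check whether an object file for each class is actually on the link line. The sketch below does only the grouping step, parsing linker output in the format shown above. The class and method names are hypothetical, and the parsing is an assumption based on this one error log, not a general ld output parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of bllip-parser: counts undefined
// symbols per owning class in macOS-style "Undefined symbols" output.
final class UndefinedSymbols {
    // Matches lines such as:   "SimpleChart::parse()", referenced from:
    private static final Pattern SYMBOL =
            Pattern.compile("^\\s*\"(.+)\", referenced from:\\s*$");

    // Returns the scope before the first "::" when it precedes the
    // argument list; otherwise the symbol is treated as a free function.
    static String owner(String symbol) {
        int paren = symbol.indexOf('(');
        int scope = symbol.indexOf("::");
        if (scope >= 0 && (paren < 0 || scope < paren)) {
            return symbol.substring(0, scope);
        }
        return "(free function)";
    }

    static Map<String, Integer> countByOwner(String linkerOutput) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : linkerOutput.split("\\R")) {
            Matcher m = SYMBOL.matcher(line);
            if (m.matches()) {
                counts.merge(owner(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "Undefined symbols for architecture x86_64:",
                "  \"SimpleChart::parse()\", referenced from:",
                "      _Java_SWIGParserJNI_SimpleChart_1parse in java_wrapper.o",
                "  \"Node::termNames() const\", referenced from:",
                "      _Java_SWIGParserJNI_Node_1termNames in java_wrapper.o");
        System.out.println(countByOwner(sample));
    }
}
```

Applied to the full log, every member function in it belongs to SimpleChart or Node, which suggests looking for where those two classes are compiled and whether that object file is passed to the linker.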

danyaljj commented 6 years ago

Another try, this time on a fresh Ubuntu machine (everything so far was on Mac).

Here are the things that came up:

Here is how I had to change it, but it might be different from one distribution to another:

  # this should be the path to jni.h
  SWIG_JAVA_GCCFLAGS ?=  -I/usr/lib/jvm/java-8-openjdk-amd64/include/  -I/usr/lib/jvm/java-8-openjdk-amd64/include/linux/
  #-I/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/include/  -I/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/include/linux/
make swig-java 
khashab2@jackie:~/ideaProjects/bllip-parser/second-stage/programs/features$ tree swig/java/
swig/java/
├── lib
│   ├── libSWIGReranker.so
│   ├── NBestList.java
│   ├── RerankerError.java
│   ├── RerankerModel.java
│   ├── SWIGReranker.java
│   ├── SWIGRerankerJNI.java
│   └── Weights.java
└── test
    └── test.java

dmcc commented 6 years ago

Thanks for the updates, sounds like progress!

For the OS X install, I'm afraid I'm not sure why it's not finding the symbols. Maybe it's missing Fusion.o from the list of COMMON_OBJS? On the other hand, I'm not sure why it would need those symbols since I don't remember fusion being wrapped.

I just tried a fresh install on Ubuntu (Zesty) -- needed to install openjdk-8-jdk and maven (which, er, might tell you how often I'm developing in Java these days...) but after that, was able to do make swig-java from the root.

It's possible that swig/java/test/test.java might only work with the non-Maven install (it's an older version of a test file). Have you tried running BllipParserTest against the Maven-built jar?