Cartus / AMR-Parser

Better Transition-Based AMR Parsing with a Refined Search Space (authors' DyNet implementation for the EMNLP18 paper)
https://aclweb.org/anthology/papers/D/D18/D18-1198/
MIT License

align.sh breaks #3

Open ButteredGroove opened 5 years ago

ButteredGroove commented 5 years ago

Hi, and thank you for the update! I've been trying to finish the alignment steps on LDC2017T10, but have run into a bug:

(ve) ~/guo_lu/AMR-Parser-master/data$ ./align.sh
<LOTS OF OUTPUT>
3237
3238
3239
Traceback (most recent call last):
  File "preprocess/merge_file.py", line 80, in <module>
    node_tuple = node_list[index][counter]
IndexError: list index out of range
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 5, in <module>
    reload(sys)
NameError: name 'reload' is not defined
Exception in thread "main" java.io.FileNotFoundException: train.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)
rm: cannot remove 'train.txt': No such file or directory
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 5, in <module>
    reload(sys)
NameError: name 'reload' is not defined
Exception in thread "main" java.io.FileNotFoundException: dev.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)
rm: cannot remove 'dev.txt': No such file or directory
rm: cannot remove 'dev.txt.pb.lemmas': No such file or directory
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 5, in <module>
    reload(sys)
NameError: name 'reload' is not defined
Exception in thread "main" java.io.FileNotFoundException: test.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)
rm: cannot remove 'test.txt': No such file or directory
rm: cannot remove 'test.txt.pb.lemmas': No such file or directory

It appears that the instigating issue is the IndexError (list index out of range) raised at line 80 of preprocess/merge_file.py.
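
To localize an IndexError like this one, a guarded lookup can report which level of the nested index ran out of range. This is only a sketch: it assumes node_list is a list of per-sentence lists, which the traceback suggests but does not confirm, and lookup_node is a hypothetical helper, not a function from merge_file.py.

```python
def lookup_node(node_list, index, counter):
    """Return node_list[index][counter], or raise an IndexError naming the failing level.

    Assumption: node_list is a list of per-sentence lists (not confirmed by the source).
    Negative indices are rejected here as well, unlike plain list indexing.
    """
    if not 0 <= index < len(node_list):
        raise IndexError(
            "sentence index %d out of range (only %d sentences)"
            % (index, len(node_list))
        )
    nodes = node_list[index]
    if not 0 <= counter < len(nodes):
        raise IndexError(
            "node position %d out of range for sentence %d (only %d nodes)"
            % (counter, index, len(nodes))
        )
    return nodes[counter]
```

Swapping a helper like this in for the failing expression would show whether the sentence index or the node counter is the one that overruns.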

In case it helps, my setup diary follows. You'll notice several places where I had to work around bugs:

Diary: Set up and run Guo and Lu AMR parser

1. Set up base environment:
  * Ubuntu 16.04
  * Python 3.5.2
  * CUDA 8.0
  * cuDNN 6.0

2. Set locale:
export LANG=C.UTF-8
export LANGUAGE=C.UTF-8
export LC_ALL=C.UTF-8

3. Create directory to do work:
mkdir guo_lu
cd guo_lu

4. Create python virtual environment and activate it:
sudo apt-get install python3-venv
python3 -m venv ve
source ve/bin/activate
pip install --upgrade pip
pip install wheel

5. Build and install DyNet 2.0
# Based on manual install instructions from:
# https://dynet.readthedocs.io/en/latest/python.html#manual-installation
sudo apt-get install -y build-essential cmake mercurial git unzip
pip install Cython ordered-set numpy nltk
wget https://github.com/clab/dynet/archive/v2.0.zip
unzip v2.0.zip
cd dynet-2.0
hg clone https://bitbucket.org/eigen/eigen -r b2e267d
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=../eigen -DPYTHON=`which python` -DBACKEND=cuda
make
cd python
python setup.py install

6. Install and set up JAMR
cd ~/guo_lu
sudo apt-get install -y openjdk-8-jre
wget https://github.com/jflanigan/jamr/archive/Semeval-2016.zip
unzip Semeval-2016.zip
cd jamr-Semeval-2016
./setup
. scripts/config.sh
./compile

7. Install AMR Parser
wget -nv https://github.com/Cartus/AMR-Parser/archive/master.zip
unzip master.zip

8. Install MGIZA++
sudo apt-get install libboost-all-dev
cd AMR-Parser-master/data
git clone https://github.com/moses-smt/mgiza.git
cd mgiza/mgizapp
cmake .
make
make install

9. NLTK tagger installation
python
import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
quit()

10. Prepare LDC2017T10
# Create links to LDC2017T10 in AMR-Parser-master/data subdirectories
cd ~/guo_lu/AMR-Parser-master/data/amr/data/amrs/split/train/
rm amr-release-1.0-training-bolt.txt
cp -as ~/abstract_meaning_representation_amr_2.0/data/amrs/split/training/* .
cd ../test
rm amr-release-1.0-test-bolt.txt
cp -as ~/abstract_meaning_representation_amr_2.0/data/amrs/split/test/* .
cd ../dev
rm amr-release-1.0-dev-bolt.txt
cp -as ~/abstract_meaning_representation_amr_2.0/data/amrs/split/dev/* .
cd ~/guo_lu/AMR-Parser-master/data

11. Run preprocessing script
./preprocess_17.sh

12. Run JAMR aligner
cd ~/guo_lu/jamr-Semeval-2016
. scripts/config.sh
scripts/ALIGN.sh < ~/guo_lu/AMR-Parser-master/data/amr/tmp_amr/train/amr.txt > ~/guo_lu/AMR-Parser-master/data/jamr_output/train.txt
# BUG !!!!!!!!
>> ### Tokenizing ###
>> panic: swash_fetch got swatch of unexpected bit width, slen=1024, needents=64 at /gpfs-volume/guo_lu/jamr-Semeval-2016/tools/cdec/corpus/support/quote-norm.pl line 149, <STDIN> line 1.
>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
>>        at edu.cmu.lti.nlp.amr.CorpusTool$$anonfun$main$1.apply(CorpusTool.scala:48)
>>        at edu.cmu.lti.nlp.amr.CorpusTool$$anonfun$main$1.apply(CorpusTool.scala:43)
>>        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>        at edu.cmu.lti.nlp.amr.CorpusTool$.main(CorpusTool.scala:43)
>>        at edu.cmu.lti.nlp.amr.CorpusTool.main(CorpusTool.scala)

# Fix this bug in JAMR as per https://github.com/jflanigan/jamr/issues/17
sed -i_BAK '149 s/^/#/' tools/cdec/corpus/support/quote-norm.pl

# Now run aligner:
scripts/ALIGN.sh < ~/guo_lu/AMR-Parser-master/data/amr/tmp_amr/train/amr.txt > ~/guo_lu/AMR-Parser-master/data/jamr_output/train.txt
scripts/ALIGN.sh < ~/guo_lu/AMR-Parser-master/data/amr/tmp_amr/test/amr.txt > ~/guo_lu/AMR-Parser-master/data/jamr_output/test.txt
scripts/ALIGN.sh < ~/guo_lu/AMR-Parser-master/data/amr/tmp_amr/dev/amr.txt > ~/guo_lu/AMR-Parser-master/data/jamr_output/dev.txt

13. Run Hybrid Aligner
cd ~/guo_lu/AMR-Parser-master/data
./align.sh
# BUG !!!!!!!!
>> <snip>
>> cat: write error: Broken pipe
>>  File "./scripts//stem-4-letters.py", line 10
>>    print ' '.join(w if w.startswith(':') or (w.startswith('++') and w.endswith('++')) else w[:3] for w in line.strip().split())

# stem-4-letters.py uses the Python 2 print statement.  I'm not sure how many
# other scripts do the same.  It appears that the AMR parser code assumes
# that python3 = Python 3.5.2 and python = Python 2.?
# This also means that the AMR parser has an undocumented requirement of
# Python 2.
# I decided to replicate the assumption in my environment by linking
# python to python2:
cd ~/guo_lu/ve/bin
rm pip python
ln -s /usr/bin/python2 python
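
One way to find out how many other scripts still use the Python 2 print statement is to scan the tree for it. The following is a rough sketch run under Python 3: find_py2_prints and the regular expression are illustrative, not part of the AMR parser, and the pattern is a heuristic that can miss or over-report unusual cases.

```python
import re
from pathlib import Path

# Heuristic: "print" followed by whitespace and then something other than
# "(" or "=", e.g.  print 'x'  or  print >>sys.stderr, 'x'.
PY2_PRINT = re.compile(r"^\s*print\s+[^(=\s]")


def find_py2_prints(root):
    """Return (path, line number, line) for each likely Python 2 print statement."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if PY2_PRINT.match(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_py2_prints(".") from the data directory would list the scripts that need a Python 2 interpreter, which may be less invasive than relinking python inside the virtual environment.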

# Try again:
cd -
./align.sh
# BUG !!!!!!!!
./scripts//run_aligner.sh: 7: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//plain2snt: not found
./scripts//run_aligner.sh: 8: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//mkcls: not found
./scripts//run_aligner.sh: 9: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//mkcls: not found
./scripts//run_aligner.sh: 10: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//snt2cooc: not found
./scripts//run_aligner.sh: 14: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//plain2snt: not found
./scripts//run_aligner.sh: 15: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//mkcls: not found
./scripts//run_aligner.sh: 16: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//mkcls: not found
./scripts//run_aligner.sh: 17: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//snt2cooc: not found

# Looks like a bad path.
# Fixed it by editing last two lines of:
# ~/guo_lu/AMR-Parser-master/data/align/addresses.keep
MGIZA_SCRIPT=~/guo_lu/AMR-Parser-master/mgiza/mgizapp/scripts
MGIZA_BIN=~/guo_lu/AMR-Parser-master/mgiza/mgizapp/bin
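
Before rerunning align.sh, it can help to confirm that the directory given as MGIZA_BIN really contains the tools that run_aligner.sh reported as missing. A minimal sketch follows; missing_tools is a hypothetical helper, and the tool names are taken only from the error messages above, so the real script may need others.

```python
import os


def missing_tools(bin_dir, tools=("plain2snt", "mkcls", "snt2cooc")):
    """Return the tool names that are not executable files under bin_dir."""
    bin_dir = os.path.expanduser(bin_dir)  # allow paths starting with "~"
    missing = []
    for tool in tools:
        path = os.path.join(bin_dir, tool)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(tool)
    return missing
```

An empty list from missing_tools, called with the MGIZA_BIN value, means all three binaries were found at that location.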

# Try #3:
./align.sh
# BUG !!!!!!!!
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 1, in <module>
    import nltk
ImportError: No module named nltk
Exception in thread "main" java.io.FileNotFoundException: train.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)

# align.sh calls align2conll.py using python, not python3.  The README
# specifically states to install nltk under Python3.  Maybe the python=python2
# assumption isn't a good one?  The print statements of align2conll.py are
# Python 3 style.  So we have a case where python should be python3.
#
# I decided to edit align.sh to maintain the python = Python 2,
# python3 = Python 3 paradigm:
Line 32: python3 preprocess/align2conll.py hybrid_pr.txt train.txt
Line 40:     python3 preprocess/align2conll.py ${JAMR_DIR}/${SPLIT}.txt ${SPLIT}.txt
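
Note that reload is a builtin only in Python 2, so a script that calls reload(sys) at module level cannot run unchanged under python3. In Python 2 scripts, reload(sys) is commonly followed by sys.setdefaultencoding; assuming that is what align2conll.py does (the rest of the file is not shown here), a version guard along these lines would let the same script run under both interpreters. The function name ensure_utf8_default is hypothetical.

```python
import sys


def ensure_utf8_default():
    """Apply the Python 2 default-encoding workaround only where it exists.

    Assumption: the original reload(sys) is there to enable
    sys.setdefaultencoding, which Python 3 neither has nor needs.
    """
    if sys.version_info[0] < 3:
        reload(sys)  # noqa: F821 (builtin in Python 2 only)
        sys.setdefaultencoding("utf-8")
        return "py2"
    return "py3"
```

Under Python 3 the guarded branch is never executed, so the missing reload builtin is never looked up.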

# Try #4:
./align.sh
# BUG !!!!!!!!
<snip>
3238
3239
Traceback (most recent call last):
  File "preprocess/merge_file.py", line 80, in <module>
    node_tuple = node_list[index][counter]
IndexError: list index out of range
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 5, in <module>
    reload(sys)
NameError: name 'reload' is not defined
Exception in thread "main" java.io.FileNotFoundException: train.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)
rm: cannot remove 'train.txt': No such file or directory
<snip>

Cartus commented 5 years ago

I am trying to solve this issue, but it will take some time. I will update you once I fix the bugs.