delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License

parsing ACE output #197

Closed: arademaker closed this issue 5 years ago

arademaker commented 5 years ago

Trying to parse a long sentence:

It is concluded that locally available pumice material could be used to replace sand in engineering bentonite seals, despite some differences in their geotechnical properties.

With ACE on the command line, I was able to produce an output file. How can I parse it with PyDelphin to convert it to other formats? I also tried calling ACE from the Python console, but I got a timeout:

>>> response = ace.parse('erg.dat', 'It is concluded that locally available pumice material could be used to replace sand in engineering bentonite seals, despite some differences in their geotechnical properties.')
NOTE: hit RAM limit while unpacking
NOTE: parsed 1 / 1 sentences, avg 1536041k, time 59.70439s

arademaker commented 5 years ago

Sorry, in this particular case it did parse, and the output ended up in the response variable. But the question remains: How to parse a file produced by a batch processing of sentences with ACE?

Moreover, I didn't find how to pass additional parameters to ACE in the ace.parse function/method. Did I miss something in the documentation?

goodmami commented 5 years ago

Regarding the first problem, I think you'll get the results that ACE was able to work through before hitting a timeout or RAM limit, but note that these won't necessarily be the best results (the highest-ranked result might have been added to the chart later). Using PyDelphin instead of ACE at the command line is not likely to add much RAM overhead, so I doubt you'll see a noticeable difference in the number of timeouts or memouts. You can help avoid hitting these limits by limiting the number of results (see below).
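When batch-processing at the command line, it can help to check ACE's stderr for the NOTE lines seen in the transcript above, so you know whether a run hit a limit. Here is a minimal stdlib-only sketch. It assumes the messages keep the exact wording of that one sample ("NOTE: hit RAM limit ..." and "NOTE: parsed N / M sentences, avg ...k, time ...s"); that wording is not a documented format, and the names summarize_ace_stderr and AceRunSummary are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class AceRunSummary(NamedTuple):
    parsed: int          # the N in "parsed N / M sentences"
    total: int           # the M in "parsed N / M sentences"
    avg_ram_kb: int      # the "avg ...k" figure
    seconds: float       # the "time ...s" figure
    hit_ram_limit: bool  # True if any "hit RAM limit" note appeared


# Assumption: written against the single sample NOTE line in this thread.
_SUMMARY_RE = re.compile(
    r"NOTE: parsed (\d+) / (\d+) sentences, avg (\d+)k, time ([0-9.]+)s"
)


def summarize_ace_stderr(stderr_text: str) -> Optional[AceRunSummary]:
    """Extract the final summary NOTE from ACE's stderr, or None if absent."""
    match = _SUMMARY_RE.search(stderr_text)
    if match is None:
        return None
    parsed, total, avg_kb, secs = match.groups()
    return AceRunSummary(
        parsed=int(parsed),
        total=int(total),
        avg_ram_kb=int(avg_kb),
        seconds=float(secs),
        hit_ram_limit="hit RAM limit" in stderr_text,
    )
```

Note that hit_ram_limit only flags the run as a whole; in a multi-sentence batch it does not tell you which item hit the limit.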

How to parse a file produced by a batch processing of sentences with ACE?

If you produced a file using ACE at the command line, then the easiest way to convert it to other formats is the convert command's --from ace option, which filters out ACE's non-MRS output:

$ ace -g ~/grammars/erg-trunk/erg.dat <<< "Abrams barked." > x.mrs
$ delphin convert --from ace --to simpledmrs < x.mrs 
dmrs { [top=10002 index=10002 surface="Abrams barked."] 10000 [proper_q<0:6> x PERS=3 NUM=sg IND=+]; 10001 [named<0:6>("Abrams") x PERS=3 NUM=sg IND=+]; 10002 [_bark_v_1<7:14> e SF=prop TENSE=past MOOD=indicative PROG=- PERF=-]; 10000:RSTR/H -> 10001; 10002:ARG1/NEQ -> 10001;}

This is also available via the delphin.commands.convert() function, if you want to script it in Python.
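To make "filters out the non-MRS output" concrete, here is a rough stdlib-only sketch of that filtering step. It is not PyDelphin's ACE codec (which also parses each MRS into an object); the function name mrs_lines is hypothetical, and it assumes that every result is printed on its own line starting with "[" and that any derivation tree follows a " ; " separator on the same line. All other lines are dropped.

```python
from typing import List


def mrs_lines(ace_stdout: str) -> List[str]:
    """Return the MRS strings from ACE's stdout, dropping everything else."""
    results = []
    for line in ace_stdout.splitlines():
        line = line.strip()
        # Assumption: result lines begin with '['; anything else is not an MRS.
        if not line.startswith("["):
            continue
        # Assumption: ' ; ' separates the MRS from a trailing derivation tree.
        # A CARG value containing ' ; ' would confuse this naive split.
        mrs, _sep, _derivation = line.partition(" ; ")
        results.append(mrs.strip())
    return results
```

In practice delphin convert --from ace (or delphin.commands.convert()) is the better choice, since it gives you real MRS objects and the conversions for free.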

Alternatively, you can batch-process from within PyDelphin. If you have (or can make) a test suite, then TestSuite.process() works well. Otherwise there are several ways of using ACE on lists of items (the overview shows most of them).
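If you would rather stay at the process level instead of using PyDelphin's interfaces, a plain subprocess call can feed a list of sentences to a single ACE process. This is a minimal stdlib-only sketch, not one of the PyDelphin methods mentioned above; the function name run_ace_batch is hypothetical, and it assumes ACE reads one item per line from stdin.

```python
import subprocess
from typing import Sequence, Tuple


def run_ace_batch(sentences: Sequence[str], argv: Sequence[str]) -> Tuple[str, str]:
    """Feed sentences to one ACE process, one per line; return (stdout, stderr)."""
    # Assumption: each input line is one item, so embedded newlines are
    # flattened to spaces to avoid splitting a sentence into two items.
    text = "".join(s.replace("\n", " ") + "\n" for s in sentences)
    proc = subprocess.run(
        list(argv),
        input=text,
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return proc.stdout, proc.stderr
```

For example, run_ace_batch(sentences, ["ace", "-g", "erg.dat", "-n", "5"]) (grammar path assumed) returns ACE's stdout, which you could save to a file and pass through delphin convert --from ace as shown earlier.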

Moreover, I didn't find how to pass additional parameters to ACE in the ace.parse function/method. Did I miss something in the documentation?

The documentation could be clearer (suggestions welcome). The module functions ace.parse() and ace.parse_from_iterable() instantiate an AceParser object, and any keyword arguments (the **kwargs in the documentation) given to the module functions are passed on to the class's __init__() method. See the AceProcess class for those keyword arguments. The one you want is cmdargs:

>>> response = ace.parse('erg.dat', sentence, cmdargs=['-n', '5'])

That tells ACE to return at most 5 results.
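The forwarding pattern described above (a module-level function passing its **kwargs straight to a class constructor) can be illustrated with a toy version. This is a hypothetical sketch, not PyDelphin's code: ToyParser and toy_parse are made-up names, and the argv it builds is simplified (a real wrapper adds its own flags).

```python
from typing import Any, Dict, List, Optional


class ToyParser:
    """Hypothetical stand-in for a wrapper class like AceParser (not PyDelphin code)."""

    def __init__(self, grm: str, cmdargs: Optional[List[str]] = None) -> None:
        self.grm = grm
        # Extra flags the caller wants passed through to the binary unchanged
        self.cmdargs = list(cmdargs) if cmdargs else []

    def argv(self) -> List[str]:
        # Simplified command line: binary, grammar, then the caller's flags
        return ["ace", "-g", self.grm, *self.cmdargs]


def toy_parse(grm: str, datum: str, **kwargs: Any) -> Dict[str, Any]:
    """Convenience function: every keyword argument goes to ToyParser.__init__."""
    parser = ToyParser(grm, **kwargs)
    return {"input": datum, "argv": parser.argv()}
```

Calling toy_parse("erg.dat", "Abrams barked.", cmdargs=["-n", "5"]) yields an argv of ["ace", "-g", "erg.dat", "-n", "5"]. Because the toy __init__ accepts no extra keywords, a misspelling such as cmdarg= raises a TypeError instead of being silently ignored.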

Does this help?

goodmami commented 5 years ago

@arademaker Did you have any remaining questions? Otherwise I'll close this issue soon.

arademaker commented 5 years ago

Sorry for the late reply. Yes, your instructions helped me a lot.