chrisjbryant / errant

ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.
MIT License
436 stars 107 forks source link

Open/close out_m2 correctly in parallel_to_m2.py #20

Closed Giovanni-Alzetta closed 4 years ago

Giovanni-Alzetta commented 4 years ago

Currently out_m2 file is opened and not closed, which might lead to weird bugs. Opening with with should handle everything.

A side note: I would like to wrap the main functions from parallel_to_m2.py and compare_m2.py in separate functions which are then called from the respective main.

In this way they can be run both as a script (as currently suggested in the readme), or loaded as a function from an external python script. Would you be okay with such a pull request?

chrisjbryant commented 4 years ago

Heya, Sorry for the delay - I'll add the with statement to my list of minor upgrades to errant for the next version. Thanks for catching it!

I'm not sure I follow your other request. What kind of functionality are you looking for that ERRANT doesn't already do?

Giovanni-Alzetta commented 4 years ago

No problem for the delay.

I'd like to add more flexibility: currently, in order to use errant, one has to call the functions from bash.

Currently in parallel_to_m2.py you're doing something like

def main():
  # Do stuff to transform parallel text into m2

I'd like to split it like this:

def transform_parallel_to_m2(....):
  # Do stuff to transform parallel text into m2

def main():
  transform_parallel_to_m2(...)

In this way the scripts will work as it does now.

Moreover, one could use something like from errant import transform_parallel_to_m2 in his/her code, and call the evaluation function, without passing through bash.

(In order to do this you must use the with correctly, or files won't be closed correctly...also, I have the code ready, if you tell me where to make the PR I'll gladly make the contribution)

chrisjbryant commented 4 years ago

Aha I think I see. I think this sounds more like a particular use-case you have in mind though rather than something I'd want to push to the main branch.

  1. The bash commands are there just to make it easy to run the "standard" expected setup.
  2. If you want more control, you can always use the API modelled on parallel_to_m2 (or the Quickstart in the readme) to create a simple wrapper function.

Consequently, I think the API already does what you suggest and is more flexible; e.g. it doesn't expect a particular input or output file format.