delph-in / pydmrs

A library for manipulating DMRS structures
MIT License
14 stars 6 forks source link

Files #1

Closed guyemerson closed 8 years ago

guyemerson commented 8 years ago

Split code into different files (and add __init__.py)

guyemerson commented 8 years ago

I considered having a separate file for (de)serialization, but I realised this would create a cyclic dependency between the files...

goodmami commented 8 years ago

If you think this project will remain fairly small and focused, a single dmrs.py module might be easier for users to embed in their own applications (they can just copy the file around). If you expect the project to grow and you want to keep the codebase maintainable or understandable for users who only care about certain parts of the project, it's not a bad idea to split the code into separate modules. In this case, it's a good idea to try and maintain a consistent API, or at least note breaking changes in a CHANGELOG. For instance, if you currently support:

from dmrs import Node

Then if you move the Node class into a dmrs/components.py module, the above import would no longer work (it'd have to be from dmrs.components import Node). You can keep the original import valid by putting this import statement in dmrs/__init__.py, so the Node class is available from the dmrs package.

Regarding the circular imports, is this because you want to have a loads_xml() method on the Dmrs class, but the actual XML loading (done in a separate module) would need the Dmrs class for instantiation? There are some solutions to this:

  1. Import your serializer inside the loads_xml() (and related) methods. This way it happens after the main module is already loaded, and doesn't result in a circular import. This has a small performance penalty each time you call the method, though.
  2. Don't bind your serialization methods to the Dmrs class. This is the more functional, rather than object-oriented, approach. In this case, the main module doesn't import the serializer at all. Users would do something like d = serializer.loads_xml(...) instead of d = Dmrs.loads_xml(...).

Hope that helps

guyemerson commented 8 years ago

We were thinking that code for DMRS simplification might be worth separating, since this is not part of the core functionality. (This code hasn't been added yet.)

I've written the loads_xml function as a class method so that it's inherited by child classes - at the moment, we're considering different data structures. So for the second solution you mentioned, rather than Dmrs.loads_xml(f), it would be loads_xml(f, Dmrs). I wasn't sure if this was the right way to go, so I left it for the time being.

goodmami commented 8 years ago

If you want a single loader to be able to instantiate different kinds of DMRS data structures, then going with your solution of loads_xml(f, Dmrs) is a good idea, as long as all the different implementations of DMRS have the same methods for adding nodes, links, etc. You might make the second argument optional and default to some canonical representation, unless you want to force the user to make that decision.