delph-in / pydmrs

A library for manipulating DMRS structures
MIT License
14 stars 6 forks source link

Change sort function for SortDictDmrs. #18

Closed emm68 closed 8 years ago

emm68 commented 8 years ago

We discussed this during a meeting but I'm not sure how to do it neatly with a more complicated sorting function. The sorting I wanted to use and which I think is pretty general is sorted by: 1) cfrom, 2) cto (decreasing), 3) pred str(). Do we perhaps want to make the sorting function a user specified parameter? If so, do you know any simple way of how to account for more complicated cases like this, where bisect doesn't quite cut it?

guyemerson commented 8 years ago

I think bisect still cuts it. At the moment, we use the list _nodeids to keep track of the order. Instead, we can have a list of anything else to use for comparison.

So we can allow the user to specify two functions, one mapping from nodes to keys, and the other from links to keys. The current implementation of SortDictDmrs would just have attrgetter('nodeid') for nodes, and a simple identity function for links.

We can explicitly provide sorting based on nodeids and based on your suggestion, but doing it like this would allow any kind of sorting in general.

I'm happy to implement this if my sketch isn't clear.

guyemerson commented 8 years ago

Okay, I've done this. If you initialise a SortDictDmrs with node_key=span_pred_key, this should work for you. I'll test it properly tomorrow, but I just thought I'd push this now.

emm68 commented 8 years ago

Wow, that was super fast, I didn't even get a second look at it after you commented. Thanks!

guyemerson commented 8 years ago

I think this all works. Let me know if you have any issues.

emm68 commented 8 years ago

There is a problem with SortDictDmrs in general. EQ links have no rargname and None is unorderable. Would it be a reasonable fix to change their rargname to empty string instead of None?

File "/pydmrs/pydmrs/core.py", line 900, in add_link i = bisect.bisect_right(self._link_keys, key) TypeError: unorderable types: NoneType() < str()

guyemerson commented 8 years ago

Ah, I hadn't tested that... but I think that's the right fix. So we just have to tweak the two lambda expressions.

(In the longer run, we need to deal with EQ links better, but this should work for now.)

emm68 commented 8 years ago

What I suggested was to change the code for loading dmrs from xml (lines 64, 65 in serial.py). Did you mean somehow tweaking the lambda functions used as link_key in SortDictDmrs?

guyemerson commented 8 years ago

I'm hesitant to change how we deal with EQ links just for sorting them.

I've changed the lambda functions in SortDictDmrs.

guyemerson commented 8 years ago

Following our discussion earlier, I've added a function to construct something you can pass to loads_xml. You can use it like this:

from pydmrs.core import abstractSortDictDmrs, span_pred_key
from pydrms.serial import loads_xml
dmrsFactory = abstractSortDictDmrs(span_pred_key)
dmrs1 = dmrsFactory()
dmrs2 = loads_xml(b'<dmrs/>', cls=dmrsFactory)
guyemerson commented 8 years ago

I've now also fixed it so that a SortDictDmrs instance with custom key functions can call loads_xml and return a new SortDictDmrs instance with the same key functions. The code is a bit hacky, since it defines a loads_xml attribute rather than a method... but I think this use case is probably rare enough that we don't need to do something fancier (like this).