edsu / pymarc

process MARC records from Python
http://python.org/pypi/pymarc
Other
251 stars 99 forks source link

Make leader mutable? #108

Closed Wooble closed 4 years ago

Wooble commented 6 years ago

It would be really convenient to be able to mutate the leader somehow.

I find myself writing record.leader = record.leader[0:9] + 'a' + record.leader[10:] when converting MARC-8 to UTF-8 to work around the leader being an immutable string, which is pretty ugly.

(On the other hand, I'm not all that sure this could be done in a way that's backwards-compatible with things that expect record.leader to be a normal string. record.leader[9] = 'a' might be unrealistic to support, but perhaps we could add a method that flips a specific part of the leader?)

souzaluuk commented 5 years ago

Maybe go to the list type and use the python getter / setter methods to make it easier. But I do not know about the compatibility of the idea with the other versions of python.

edsu commented 5 years ago

A list would seem to make more sense. We could consider a new major version of pymarc in order to break backwards compatibility.

souzaluuk commented 5 years ago

I can try, so we can see the best way. What do you think?

Wooble commented 5 years ago

I think a list makes it harder to get at the pieces of the leader that use more than 1 character of the string.

I suspect a method would work better, although I don't have a good suggestion for a name or even the API.

MARC::Record solves my particular use case by having $record->encoding('UTF-8') set leader[9] to "a" and $record->encoding('MARC-8') set it to " ", although it's got nothing for changing other things in the leader besides the length/base address.

herrboyer commented 4 years ago

Hi, we're facing the same kind of issue right now, not blocking at all, but I was wondering about creating a maybe backward compatible Leader object like this one :

class Leader:
    """Mutable leader.

    Usage:
    leader = Leader("00475cas a2200169 i 4500")
    leader[0:4]  # 00475
    leader.status  # "c"
    leader.status = "a"
    leader[6] = "b"
    leader.record_type  # "b"
    str(leader)  # "00475abs a2200169 i 4500"
    """

    def __init__(self, value: str):
        self.value = value

    def __getitem__(self, item) -> str:
        """leader[:4] == leader.length."""
        if isinstance(item, slice) or isinstance(item, int):
            return self.value[item]
        return getattr(self, item)

    def __setitem__(self, item, val):
        """leader[5] works, leader[0:4] too, as well as leader.status = "a"."""
        if isinstance(item, slice):
            self._update_value(start=item.start - 1, stop=item.stop + 1, val=val)
        elif isinstance(item, int):
            self._update_value(start=item - 1, stop=item + 1, val=val)
        else:
            setattr(self, item, val)

    def __str__(self) -> str:
        return self.value

    def _update_value(self, start: int, stop: int, val: str):
        """Replace the chars between `start` & `stop` to `val` in `self.value`."""
        start = max(0, start)
        stop = min(LEADER_LEN, stop)
        if len(val) != stop - start:
            raise Exception("val length should match the replaced chars")
        self.value = self.value[:start] + val + self.value[stop:]

    @property
    def length(self) -> str:
        return self.value[:4]

    @property
    def status(self) -> str:
        return self.value[5]

    @status.setter
    def status(self, status: str):
        self._update_value(start=4, stop=5, val=status)

    @property
    def record_type(self) -> str:
        return self.value[6]

    @record_type.setter
    def record_type(self, record_type: str):
        self._update_value(start=6, stop=7, val=record_type)

The idea would be to use it in Record self.leader = Leader(leader[0:10] + '22' + leader[12:20] + '4500'). It should be almost backward compatible with the leader as a string and provide helpful getters & setters to manipulate it.

It's a bit too much on the magic side of python with its heavy use of dunders though...

I could keep poking around if you think it could be a good solution to this issue.

edsu commented 4 years ago

I really like this proposal @herrboyer. I'm curious to know what others think. Do you want to prepare a pull request?

Wooble commented 4 years ago

I sort of feel like just replacing the leader with a bytearray would be cleaner, although either way things are going to break :(

herrboyer commented 4 years ago

Thanks for your feedback.

We'll quickly see how things will go as soon as I find some time to make a PR.

edsu commented 4 years ago

@Wooble I actually don't see how much will break with @herrboyer's proposal. Does something pop out at you?

Wooble commented 4 years ago

My example workaround code in the original message in the issue, for one thing, although that one could be fixed by overriding __add__ and __radd__ on the Leader object. (Also it would be silly to be doing that anymore once the leader is mutable, but legacy code breaking is still annoying even if it's breaking because it's an ugly hack that is no longer necessary.)

Anything that just does a naive record.leader = 'some other leader' will give you a str instead of a Leader, which presumably will need to be handled on serialization or which will also fail. Also anything making the assumption that record.leader is a str object so it doesn't bother calling str() on it won't use __str__, and might break depending on what whoever's making that assumption does with it.

I'd be interested to see what breaks in the existing test suite just by simply dropping in a Leader and not changing anything else that uses record.leader. Probably no one should be doing most of the sorts of things with the leader that pymarc has to do internally, but it might point to any other gotchas.

edsu commented 4 years ago

Good points, thanks @Wooble. So whatever the change it will probably be a major version change if we can't guarantee backwards compatibility.

herrboyer commented 4 years ago

As you guessed @Wooble serialization was broken but easily fixed, let me now what you think of the PR.

herrboyer commented 4 years ago

record.leader = record.leader[0:9] + 'a' + record.leader[10:] will work as expected, except record.leader would not be a Leader object anymore, but a simple string, which probably won't matter much on legacy code, don't you think @Wooble ?

I tried to do something about it but overriding add/radd won't solve it, as a slice returns a string not the Leader itself.

Wooble commented 4 years ago

Oh, right, you'd have to make the slicing return a weird partial Leader object for that to work, and that would definitely be going too far...