docs: show how to iterate over intervals in order

sietse commented 2 years ago

Move information from issue #44 into README

sietse commented 2 years ago

Hullo Alexandre, Thank you for Portion, it is very useful, especially its IntervalDict.

How would you feel about explaining 'iterate over atomic intervals in order' in the main README, rather than linking to a issue? If you like that idea, this pull request implements that. Do feel free to adjust it to your liking.

Motivation: so far, every time I used an IntervalDict it represented values over time; and I found myself needing to iterate over the intervals in temporal order. One time it was to run a simulation; another time it was to feed an external program that expected an ordered list of (start, stop, value); a third time I needed to generate the values for a plot. 3 different use cases seems common and varied enough that I propose moving this tip into the main README.

Open questions:

Should I put this advice in the .items() docstring, too? At the REPL, the docstrings are more accessible than the README.
Perhaps it's even worth creating a dedicated method for this -- my_interval_dict.atomic_items()? I'd be willing to work on that, too. But I figured I'd start small, and see how you feel about this docs change.

Kind regards, Sietse

AlexandreDecan commented 2 years ago

Hello,

First of all, thank you for your interest in portion. I'm always happy to see this library being used, and potentially useful :-)

Concerning the PR, I am rather divided. I understand the interest in mentioning the possibility to get atomic intervals from an IntervalDict but I think that the "documentation" (read: README, in the absence of a real documentation) should focus on the features that are directly offered by portion and not stray too much on the "features" that follow from it.

In fact, this PR reminds me that there is still work to be done with portion, both in the code and, more importantly, in its documentation. I think the library would benefit from having a "real documentation" (and not just a README) in which these "advanced uses" could be concretely proposed and described. An alternative would be to offer these various "features" in the form of a contrib module with some of the usual routines (not only the example discussed in this PR, but also functions such as .extend or conversion to discrete intervals, etc. as mentioned in several issues so far). Another alternative would be to add a "Common recipes" section in the README to list these various examples.

Going back to the topic of this PR, as it stands, I am rather in favour of not changing the README to include this example (in order to keep the README "simple") and to keep the pointer to the comment explaining how to achieve this result. This seems to me to balance the best of both worlds: the README doesn't get "bloated" while still providing the information that it is possible to get the (sorted list of) atomic intervals from an IntervalDict.

Of the open questions you mention, the second point could possibly be implemented directly within the .as_dict function, e.g. in the form of an optional atomic parameter (False by default) which, if set to True, converts the dictionary keys into atomic ranges. If we go that direction, then obviously we will have to provide an example and to explain it in the README. What do you think about this?

sietse commented 2 years ago

Hello,

Just a brief reply for now, I'll try to write a longer reply sometime this week. Just wanted you to know that I've read your message and I haven't disappeared yet.

re: IntervalDict.as_dict(atomic=True) — the idea of allowing the user to choose 'atomic or not' when converting seems user-friendly, I think this is a good direction. (Default False for backwards-compatibility, as you say.)

I propose adding the same option to methods supporting iteration: .items(atomic=True) and .keys(atomic=True). Their docstrings already promise "Return a view object on the contained items sorted by key" (emphasis mine), so by adding the 'atomic' option we immediately cover the atomic+sorted usecase. The resulting code is nice:

timeline = P.IntervalDict()

... fill it with data ....

for interval, value in timeline.items(atomic=True): ...

.as_dict(atomic=True) is useful for users who want a collection. But because Python dicts are FIFO-ordered but not sorted, sorted iteration on dicts is noisier than with IntervalDict above:

timeline = P.IntervalDict()

... fill it with data ....

for interval, value in sorted(timeline.as_dict(atomic=True).items()): ...

So giving IntervalDict .as_dict(atomic=True), .items(atomic=True), and .values(atomic=True) looks like it's small in scope and covers a lot of usecases, what do you think?
Regarding docs: don't sell yourself short! The docs may be all in the README, but that's just a place -- contentwise, the documentation is excellent, very complete. I'd say it is very 'real' indeed :D

Cheers, Sietse

AlexandreDecan commented 2 years ago

Hello,

Thanks for your answer! Having an optional atomic parameter for .items and .keys is not going to be easy, since these methods are expected to return views (i.e., they should automatically update the output according to the current content of an IntervalDict). I don't see how we can return atomic intervals while keeping this contract.

For .as_dict, I believe we can sort the items directly or we can provide an additional parameter sort=False (I'm not really in favour of these options, though). I think the most "difficult thing" for someone using an IntervalDict is to get a list of atomic intervals, not really to sort them (as you mentioned, it's fairly easy to pass the output of .as_dict to sorted and since it's unlikely we can sort items faster in the .as_dict method, I don't think we should sort them in that method ;-))

AlexandreDecan commented 2 years ago

I'm closing this PR since we're moving towards a change in the code (and in the documentation) rather than a "simple" adaptation of the documentation. I propose to create a new issue so we can discuss the best way to address the underlying problem :-)

Thank you!

sietse commented 2 years ago

Salut Alexandre, veuillez excuser ma réponse en retard.

I've read the dictview docs, and studied the standard library's OrderedDict's .keys() implementation [1, 2], and it seems like a dictview does very little indeed. It supports __contains__ and __len__, which are 'live views'; and it supports __iter__, but the resulting iterator is no longer a live view.

This is what a dictview does NOT have to implement (get myview from the big code block below):

You can't index into a dictview: myview[1] raises TypeError: 'myview' object is not subscriptable
It is not itself an iterator: next(myview) raises TypeError: 'odict_values' object is not an iterator

An iterator over a dictview does not have to live-update. It may simply raise a RuntimeError if the underlying dict is concurrently mutated:

# Get an OrderedDict and iterate over its items view
>>> from collections import OrderedDict
>>> odict = OrderedDict(a=1, b=2, c=3)
>>> myview = odict.keys()
>>> myiter = iter(myview)

>>> next(myiter)            # Iteration works fine if you leave the dict alone ...
'a'
>>> odict.move_to_end('b')  # ... but mutating the dict, even ahead of the iterator ...
>>> next(myiter)            # ... invalidates the iterator. So the iterator is not a live view.
RuntimeError: OrderedDict mutated during iteration

sietse commented 2 years ago

Ha, crossed messages! You're right, this is more of a code discussion now.

I'll wait for you to open the issue, and if my previous message is still relevant I'll copy the relevant bits over. No need to open an issue now-now, by the way, just whenever your schedule suits you.

AlexandreDecan commented 2 years ago

:+1:

AlexandreDecan commented 2 years ago

I'm sorry for not having published any update about this PR. I'm running a bit low on time currently ;-) I've added the atomic=False parameter to the as_dict method (as well as some very naive support for structural pattern matching). The code is not yet released (it's available from the master branch, but not (yet) turned into a release on PyPI). I'll try to find some time to see how we can adapt the other methods of an IntervalDict (and see how we can efficiently pre-sort, if possible, the atomic intervals returned by the as_dict method).

AlexandreDecan / portion

docs: show how to iterate over intervals in order #71

... fill it with data ....

... fill it with data ....