Open fingolfin opened 9 years ago
Could you by any chance get our notes up somehow? I'd like to push this package and get a usable version by the end of the next meeting.
Sure, and sorry for the delay :-(. Anyway, I wanted to clean this up, but it is not clear when I can dig myself out of teaching and other obligations enough. I thought I would find time for it today, but it again did not work out. Ah well. :-(.
So I'll just put my raw, unedited, horrible notes here. This also contains some other TODO notes, I'll just dump it all here.
* add benchmarks for various implementations of e.g. prio queues
* use autodoc, at least for the doc skeletons
* add an ordered dictionary, these are very useful for e.g. orbit computations.
see also
http://morepypy.blogspot.de/2015/01/faster-more-memory-efficient-and-more.html
(in fact, I have begun work on implementing that)
Deque
pop_{back,front}
push_{back,front}
{back,front}
Heap
but what about "IncrementPriority" -> not for PQ, only Heap
but how to implement in reality, so that it is both usable and efficient?
what are use cases?
-> perhaps Dijkstra's shortest path algo
for this purpose, have a "MutableHeap", with ideas based on
http://theboostcpplibraries.com/boost.heap:
for these, Add returns a "handle" which can be used to later
on refer to that particular object, and to modify its key
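To make the handle idea concrete, here is a minimal sketch in Python (not GAP, purely for illustration). The names `MutableHeap`, `Handle`, `add`, `update`, `peek` and `pop` are hypothetical and not an agreed interface; it assumes a binary min-heap whose keys are compared with `<` only.

```python
class Handle:
    """Reference to one heap entry (hypothetical name, not a GAP type)."""
    __slots__ = ("key", "value", "index")

    def __init__(self, key, value, index):
        self.key = key
        self.value = value
        self.index = index  # current position in the heap array, -1 once removed


class MutableHeap:
    """Binary min-heap whose entries can be re-keyed through handles."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def add(self, key, value):
        """Insert an entry and return a handle that stays valid until it is popped."""
        handle = Handle(key, value, len(self._items))
        self._items.append(handle)
        self._sift_up(handle.index)
        return handle

    def peek(self):
        """Return the handle with the smallest key without removing it."""
        return self._items[0]

    def pop(self):
        """Remove and return the handle with the smallest key."""
        items = self._items
        top = items[0]
        last = items.pop()
        if items:
            items[0] = last
            last.index = 0
            self._sift_down(0)
        top.index = -1
        return top

    def update(self, handle, new_key):
        """Change the key of an entry and restore the heap order."""
        if handle.index < 0:
            raise KeyError("handle is no longer in the heap")
        old_key = handle.key
        handle.key = new_key
        if new_key < old_key:
            self._sift_up(handle.index)
        else:
            self._sift_down(handle.index)

    def _swap(self, i, j):
        items = self._items
        items[i], items[j] = items[j], items[i]
        items[i].index = i
        items[j].index = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if not self._items[i].key < self._items[parent].key:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        items = self._items
        n = len(items)
        while True:
            left, right, smallest = 2 * i + 1, 2 * i + 2, i
            if left < n and items[left].key < items[smallest].key:
                smallest = left
            if right < n and items[right].key < items[smallest].key:
                smallest = right
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest
```

In a Dijkstra-style loop one would keep the handle returned by `add` for each vertex and call `update(handle, shorter_distance)` whenever a shorter path is found. Each handle tracks its own array index, so `update` costs O(log n) and needs no search.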
HashTable
Add(ht, key, value), \[\]\:\=
put value in under this key, regardless of whether it is here
Update(ht, key, value)
-> update value, but throw error if key is not actually present
AddUnlessAlreadyThere;
this would be the third, and throw an error if the key is already present.
we can't think of a good name, though. but perhaps nobody needs it,
so let's leave it out for now
LookupWithDefault( ht, key, defaultValue )
Lookup(ht, key), \[\]
a missing key should be an error, for consistency with arrays:
arr[5] gives an error if arr[5] is not bound
thus
hash["key"] should return an error if "key" is not in the hash.
if you really want it to return fail, you can do this via
LookupWithDefault( dict, key, fail )
count / find???
maybe
hasKey / contains -> \in
but
FUTURE: Syntax extension for GAP to create HASH tables easily, e.g.
{ key => value, ... }
or
{ key : value, ... }
or whatever...
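The intended semantics of the four HashTable operations above can be summarised in a small Python sketch (Python rather than GAP, for illustration only). The class and the method names `add`, `update`, `lookup` and `lookup_with_default` are hypothetical stand-ins for `Add`, `Update`, `Lookup` and `LookupWithDefault`; the built-in `dict` does the actual hashing.

```python
class HashTable:
    """Illustrates the proposed semantics only; storage is a plain dict."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        """Store value under key, regardless of whether key is already present."""
        self._data[key] = value

    def update(self, key, value):
        """Replace the value of an existing key; error if key is absent."""
        if key not in self._data:
            raise KeyError(key)
        self._data[key] = value

    def lookup(self, key):
        """Return the value for key; error if key is absent (like an unbound arr[5])."""
        return self._data[key]

    def lookup_with_default(self, key, default):
        """Return the value for key, or default if key is absent."""
        return self._data.get(key, default)

    def __contains__(self, key):
        """Membership test, the analogue of \\in in the notes."""
        return key in self._data
```

With this split, the strict `lookup` catches accidental misses early, and callers who want the old "return fail" behaviour ask for it explicitly through `lookup_with_default`.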
Stack
push / pop vs. Add / Remove
IsEmpty
top / peek
Size
Queue
push / pop vs. Add / Remove
IsEmpty
front / peek vs. back / top
Size
AddFirst / RemoveFirst / First -> turn First into an operation
AddLast / RemoveLast / Last are aliases for:
Add / Remove / ????
add Last / LastOp to GAP !!
LinkedList ????
maybe a deque implementation could be based on that...
or on something more advanced even...
OrderedSet
UnorderedSet
TODO
Q: Do we really need that, in addition to the Maps? They could in theory
be slightly more efficient
OrderedMap / Dict / Dictionary (TAKEN) / AssociativeArray / Map (ambiguous?)
UnorderedMap / Dict
Q: do we really need both ordered and unordered? In PyPy, they recently
decided to always use an OrderedMap, because it ended up being always
faster for them anyway.
http://morepypy.blogspot.de/2015/01/faster-more-memory-efficient-and-more.html
x := UnorderedMap();
x := UnorderedMap(IsString); # hint about key
x := UnorderedMap("key"); # hint about key
might return an AVLTree, or a HashTable, or...
PriorityQueue
might return a binary heap, or something else
use \< to compare elements, or a custom isLess function
alternatively, use a SortBy-like approach, with a function
that takes an entry, and maps it to another, and then compares THESE
with <
Add
Remove -> get element with highest priority acc
Peek / PeekMax / or just use "Maximum" (then Maximum needs to become an operation)
but what about "IncrementPriority" -> not for PQ, only Heap
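A rough Python sketch of the SortBy-like variant (Python instead of GAP, for illustration; `PriorityQueue`, `add`, `remove` and `peek` are hypothetical names). It assumes the key function maps each entry to something comparable with `<`, that the entry with the largest key is removed first, and that ties are broken by insertion order. Python's `heapq` is a min-heap, so the comparison is inverted in a small wrapper.

```python
import heapq
import itertools


class _Entry:
    """Heap entry ordered so that the largest key comes out of heapq first."""
    __slots__ = ("key", "seq", "item")

    def __init__(self, key, seq, item):
        self.key = key
        self.seq = seq
        self.item = item

    def __lt__(self, other):
        # Only `<` on keys is used, mirroring the "use \< to compare" note.
        if other.key < self.key:
            return True
        if self.key < other.key:
            return False
        return self.seq < other.seq  # equal keys: first added wins


class PriorityQueue:
    def __init__(self, key=lambda item: item):
        self._key = key
        self._heap = []
        self._counter = itertools.count()

    def __len__(self):
        return len(self._heap)

    def add(self, item):
        """Insert item; its priority is key(item), computed once."""
        heapq.heappush(self._heap, _Entry(self._key(item), next(self._counter), item))

    def peek(self):
        """Return the item with the highest priority without removing it."""
        return self._heap[0].item

    def remove(self):
        """Remove and return the item with the highest priority."""
        return heapq.heappop(self._heap).item


# Usage: prioritise strings by length.
pq = PriorityQueue(key=len)
for word in ["aa", "a", "aaaa", "bb"]:
    pq.add(word)
longest = pq.remove()
```

Computing the key once per entry is the main attraction over a custom isLess function, which would re-evaluate whatever it compares on every comparison.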
----
iteration interfaces for all of these...
AsList
AsSet
------
Methods:
- Add / Remove for everything?
- "Peek" for everything????
- IsEmpty for everything?
- Size?
- Capacity?
- Clear / Reset to clear any of our containers
- Resize / Reserve -- to reserve storage
-----
obsolete data structure implementations in the GAP kernel we should get rid of:
- lierep.gi / .gd: IsSearchTable
- NewDictionary
=======
provide a
WeakKeyDict
look at
http://docs.julialang.org/en/release-0.4/stdlib/collections/
One last remark: I wanted to turn First into an operation so we could use it, but that PR is stalled and will not go in for now. So we may need a new plan. Or perhaps just get in a smaller change, which modifies the First global function to dispatch to an operation in the single arg case?
@fingolfin mentioned yesterday the idea of a hashmap which also preserves the order in which elements were added. As I understood it, this is essentially a PLIST to which new elements are added at the end, combined with a hashtable storing indices into the PLIST.
A few thoughts about this:
The way to present this at GAP level is probably as a PLIST with Add but no assignment and a super-fast Position method (and consequently also a \in method). That very neatly meets the needs of orbit algorithms.
If you don't plan to delete much then this approach may always be correct. In the hash table you store the index of the key-value pair (if you have values) and some bits of the hash value of the key. Depending on the size of the hash table, you could get all of that into 32 or 64 bits per entry for all but the most enormous tables. With linear probing and Robin Hood hashing, for instance, you will basically need just one cache line from the hash table plus one from the PLIST plus whatever you have to do to compare entries for almost all lookups.
Making one of these would replace the common idiom of sorting a PLIST when you have finished making it and before you start doing a lot of Position or \in tests on it. Provided you can find a hash function, this is strictly better.
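A minimal Python sketch of that structure, for illustration only (not GAP, and not the package's implementation). The name `IndexedList` and its methods are hypothetical. It uses plain linear probing rather than Robin Hood hashing, keeps the table size a power of two, stores only indices in the table (no cached hash bits), and treats adding an element that is already present as a no-op that returns the existing position; all of these are simplifying assumptions.

```python
class IndexedList:
    """Append-only list plus a hash table of indices for fast position lookup."""

    def __init__(self):
        self._entries = []        # elements in insertion order (the "PLIST")
        self._slots = [-1] * 8    # open-addressing table of indices, -1 = empty

    def __len__(self):
        return len(self._entries)

    def __getitem__(self, index):
        return self._entries[index]

    def _probe(self, item):
        """Return the slot holding item's index, or the empty slot where it belongs."""
        mask = len(self._slots) - 1
        i = hash(item) & mask
        while True:
            idx = self._slots[i]
            if idx == -1 or self._entries[idx] == item:
                return i
            i = (i + 1) & mask  # linear probing: try the next slot

    def _grow(self):
        """Double the table and re-insert the indices of all entries."""
        self._slots = [-1] * (2 * len(self._slots))
        for idx, item in enumerate(self._entries):
            self._slots[self._probe(item)] = idx

    def add(self, item):
        """Append item if it is new; return its 0-based position either way."""
        slot = self._probe(item)
        if self._slots[slot] != -1:
            return self._slots[slot]
        # Keep the load factor at or below 2/3 so probing always finds an empty slot.
        if 3 * (len(self._entries) + 1) > 2 * len(self._slots):
            self._grow()
            slot = self._probe(item)
        self._slots[slot] = len(self._entries)
        self._entries.append(item)
        return len(self._entries) - 1

    def position(self, item):
        """Return the 0-based position of item, or None if it is absent."""
        idx = self._slots[self._probe(item)]
        return idx if idx != -1 else None

    def __contains__(self, item):
        return self.position(item) is not None
```

An orbit enumeration would then call `add` for every image point and use the returned position directly, with no sorting pass before the membership tests.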
@stevelinton The idea I described comes from here https://morepypy.blogspot.co.uk/2015/01/faster-more-memory-efficient-and-more.html
I also created a separate issue to keep track of ideas and plans for the hash map/set/table stuff, you may want to move your comments there, so we can find them again more easily later on: https://github.com/gap-packages/datastructures/issues/30
We discussed various plans for this package in Trondheim. I have notes on them, and still should clean them up and put them somewhere (probably in the Wiki of this GitHub project).
I am filing this issue, to make sure I don't forget.