eftsung / pygr

Automatically exported from code.google.com/p/pygr

incomplete dict method support for various pygr dict-like classes #6

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Pygr emphasizes using dict-like interfaces to database tables.  These
interfaces often subclass the built-in dict class, and "cache" objects
retrieved from the database using the built-in dict methods.  However, this
means that many of the standard dict methods actually only reflect data
that is in the cache, rather than the complete set of data in the database.
 This greatly diminishes the value of the dict-like interface!

What steps will reproduce the problem?
For example, using seqdb.BlastDB as retrieved from pygr.Data:
>>> import pygr.Data
>>> hg17 = pygr.Data.Bio.Seq.Genome.HUMAN.hg17()
>>> hg17.keys()
[]
>>> 'chr1' in hg17
False
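
The behaviour above can be reproduced without pygr. The following is a minimal sketch, not pygr's actual code: CachingTable and its backend argument are hypothetical names, and a plain dict stands in for the database. Only __getitem__() consults the backend, so the inherited dict methods see nothing but the cache.

```python
class CachingTable(dict):
    """Hypothetical dict subclass that caches rows fetched from a backing store."""

    def __init__(self, backend):
        super().__init__()
        self._backend = backend  # stand-in for the real database

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)  # cache hit
        except KeyError:
            value = self._backend[key]  # "query" the database
            super().__setitem__(key, value)  # remember the row in the cache
            return value


backend = {"chr1": "ACGT", "chr2": "GGCC"}
table = CachingTable(backend)

# keys(), `in` and len() are inherited from dict, so they only see the cache.
before = (list(table.keys()), "chr1" in table, len(table))

table["chr1"]  # fetching a row populates the cache

after = (list(table.keys()), "chr1" in table, len(table))
```

Here before is ([], False, 0) even though the backend holds two rows, and after only reports the single row that happened to be fetched.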

This issue was raised in
http://groups.google.com/group/pygr-dev/t/80736ba41bc79739?hl=en

Different database classes in pygr have different levels of dict method
support.  All of them correctly support __getitem__() (and __setitem__() if
the interface is supposed to be writable), and __iter__().  Many of them
correctly support the additional iterator methods (i.e. keys(), items(),
iteritems(), values(), itervalues()).  __len__() should also be correctly
supported in most cases.  Less common operations like copy(), clear(),
update(), get(), setdefault(), pop() are not implemented.

We should do a survey of all dict-like classes in pygr and identify gaps in
the Mapping Protocol support.  Obvious points:

__contains__() and __len__() must reflect the database, not the cache.

__setitem__(), __delitem__(), update(), clear(), setdefault() and pop()
should raise exceptions if the database is not writable, rather than just
silently affecting the cache.

We should provide a standard method for clearing the cache, e.g.
clear_cache(), since clear() would no longer be available for that purpose.
Control over actual memory usage is very important for working with large
datasets.

Note that caching guarantees an important property for Object-Relational
Mapping, namely that different requests for the same key are guaranteed to
return the same object.
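
A small illustration of that identity guarantee, using hypothetical Row and CachedRows classes in which constructing a Row stands in for a database fetch:

```python
class Row:
    """Hypothetical row object; ideally one instance exists per key."""

    def __init__(self, key):
        self.key = key


class CachedRows:
    """Hypothetical container that builds each Row once and then reuses it."""

    def __init__(self):
        self._cache = {}

    def __getitem__(self, key):
        try:
            return self._cache[key]
        except KeyError:
            row = self._cache[key] = Row(key)  # stand-in for a database fetch
            return row

    def clear_cache(self):
        self._cache.clear()


rows = CachedRows()
first = rows["chr1"]
second = rows["chr1"]
same_object = first is second  # both requests were served from the cache

rows.clear_cache()
third = rows["chr1"]
same_after_clear = first is third  # a fresh Row was built after the clear
```

In this naive version same_object is True, but same_after_clear is False: once the cache is emptied, a caller still holding the old object no longer shares it with later requests. Whether that matters depends on how the objects are used.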

Note that some of the Mapping Protocol methods imply instantiating all
items in the database: e.g. items(), values(), copy().  Pygr tries to
follow this logic, to give users a reasonably intuitive level of control
over whether data will be retrieved from the database on a row-by-row basis
vs. loading all rows via a single query.  Specifically, methods like
__iter__() and keys() do not themselves force loading of all rows from the
database, whereas methods like items() do.  The logic here is that by
calling items(), the user is declaring an intent to examine every single
row in the database, so this should be done with a single query to maximize
performance.
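
The distinction between row-by-row access and a single bulk query might look like this. The sketch assumes a hypothetical LazyTable class over an in-memory sqlite3 table kv(k, v) and records each SQL statement in a list purely so the difference can be observed; none of these names come from pygr.

```python
import sqlite3


class LazyTable:
    """Hypothetical table wrapper: keys() is cheap, items() loads everything."""

    def __init__(self, conn):
        self._conn = conn
        self._cache = {}
        self.queries = []  # log of SQL statements, for illustration only

    def _execute(self, sql, params=()):
        self.queries.append(sql)
        return self._conn.execute(sql, params)

    def __iter__(self):
        # Fetch keys only; no row values are loaded or cached.
        for (key,) in self._execute("SELECT k FROM kv ORDER BY k"):
            yield key

    def keys(self):
        return list(self)

    def __getitem__(self, key):
        if key not in self._cache:
            row = self._execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
            if row is None:
                raise KeyError(key)
            self._cache[key] = row[0]
        return self._cache[key]

    def items(self):
        # The caller wants every row, so load them all with one query.
        rows = self._execute("SELECT k, v FROM kv ORDER BY k").fetchall()
        for key, value in rows:
            self._cache.setdefault(key, value)  # keep objects already handed out
        return [(key, self._cache[key]) for key, _ in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.executemany(
    "INSERT INTO kv VALUES (?, ?)",
    [("chr1", "ACGT"), ("chr2", "GGCC"), ("chrX", "TTAA")],
)

table = LazyTable(conn)
keys = table.keys()                      # one query, nothing cached
cache_after_keys = dict(table._cache)
values = [table[key] for key in keys]    # one extra query per row

bulk = LazyTable(conn)
bulk_items = bulk.items()                # a single query for all rows
```

With three rows, the row-by-row path issues four statements in total (one for the keys, three for the values), while items() issues one.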

Original issue reported on code.google.com by cjlee...@gmail.com on 26 May 2008 at 9:39

GoogleCodeExporter commented 8 years ago

Original comment by cjlee...@gmail.com on 10 Sep 2008 at 8:14

GoogleCodeExporter commented 8 years ago
Fixed for:
 * SequenceDB and subclasses
 * AnnotationDB and subclasses
 * SQLTable and subclasses
 * ForeignKeyGraph

Original comment by cjlee...@gmail.com on 11 Sep 2008 at 3:41

GoogleCodeExporter commented 8 years ago
Grr, now that we've completed re-implementing these classes using
UserDict.DictMixin, I now see this added to the Python 2.6 docs:

Starting with Python version 2.6, it is recommended to use
collections.MutableMapping instead of DictMixin.

Terrific.
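
For readers on current Python, a rough sketch of what the recommended base class asks for. In Python 3 the class lives at collections.abc.MutableMapping and UserDict.DictMixin no longer exists. DictMixin required __getitem__(), __setitem__(), __delitem__() and keys(); MutableMapping requires __getitem__(), __setitem__(), __delitem__(), __iter__() and __len__(). Store and Incomplete are hypothetical classes, not pygr code.

```python
from collections.abc import MutableMapping


class Store(MutableMapping):
    """Hypothetical mapping that defines only the five required methods."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class Incomplete(MutableMapping):
    """Omits __iter__() and __len__(), so it cannot be instantiated."""

    def __getitem__(self, key):
        raise KeyError(key)

    def __setitem__(self, key, value):
        pass

    def __delitem__(self, key):
        raise KeyError(key)


store = Store()
store.update(a=1, b=2)        # mixin method built on __setitem__()
store.setdefault("c", 3)      # mixin method
popped = store.pop("a")       # mixin method built on __delitem__()
snapshot = sorted(store.items())

try:
    Incomplete()
    incomplete_error = None
except TypeError as exc:      # abstract methods are checked at instantiation
    incomplete_error = exc
```

One practical difference from a mixin is that the abstract base class refuses to instantiate a subclass that is missing a required method, so gaps in the protocol surface early rather than at call time.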

Original comment by cjlee...@gmail.com on 6 Jan 2009 at 11:14

GoogleCodeExporter commented 8 years ago
Nothing to worry about until DictMixin is deprecated and/or we decide to stop
supporting Python versions up through 2.5!

Original comment by the.good...@gmail.com on 7 Jan 2009 at 12:13

GoogleCodeExporter commented 8 years ago

Original comment by mare...@gmail.com on 21 Feb 2009 at 1:28

GoogleCodeExporter commented 8 years ago
Hi Titus,
please verify the fix to this bug that you reported, and then change its
status to Closed.  We are now requiring that each fix be verified by someone
other than the developer who made the fix.

Thanks!

Chris

Original comment by cjlee...@gmail.com on 4 Mar 2009 at 8:51

GoogleCodeExporter commented 8 years ago

Original comment by mare...@gmail.com on 13 Mar 2009 at 12:52

GoogleCodeExporter commented 8 years ago
seqdb was already ok; looked over the others and they look good.

Original comment by the.good...@gmail.com on 22 Mar 2009 at 4:53