Open ivirshup opened 1 month ago
@ivirshup @pablo-gar @ebezzi - If you trace the code down, I believe you end up calling `obs.read()` somewhere, and that is the `read()` method on the `DataFrame` class in SOMA. In that class, the docs for `read()` say that _"Slices are doubly inclusive"_.
So, going by the specification, I don't think this is a bug, but I understand that it certainly seems like one from the user's perspective. We should investigate why slices are specified as doubly inclusive, which violates the Pythonic meaning of slices, where the right endpoint is not included.
I think there is probably a good reason for it, but the justification for the design choice is not written anywhere (at least I can't see it), so we will need to dig in and derive it.
@ivirshup, in your opinion, is the dissonance large enough to warrant a change in the parameter design?
Yes.
Is this the behaviour of tiledb or is it specific to tiledb-soma? (update: AFAICT this is only tiledb-soma)
I think it is a tiledb constraint.
Unlike NumPy array indexing, multi_index respects TileDB’s range semantics: slice ranges are inclusive of the start- and end-point, and negative ranges do not wrap around (because a TileDB dimensions may have a negative domain)
It is worth asking the tiledb team for a more detailed explanation of why they use these slice semantics (I think most languages use half-open semantics, where the right endpoint is excluded).
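To make the difference between the two conventions concrete, here is a minimal sketch in plain Python. `inclusive_select` is a hypothetical helper written only for this illustration; it is not part of tiledb or tiledb-soma, and it only mimics the "doubly inclusive" behaviour described above on an ordinary list.

```python
# Hypothetical helper for illustration only; NOT a tiledb or tiledb-soma API.
def inclusive_select(values, start, stop):
    """Return the items at positions start..stop with BOTH endpoints included."""
    return values[start : stop + 1]


values = list(range(10))

# Built-in Python slicing is half-open: the stop position is excluded.
half_open = values[slice(0, 3)]

# "Doubly inclusive" semantics: the stop position is included.
doubly_inclusive = inclusive_select(values, 0, 3)

print(half_open)         # [0, 1, 2]
print(doubly_inclusive)  # [0, 1, 2, 3]
```

The same `(0, 3)` bounds give one extra element under the inclusive convention, which matches the "extra point" described in the report.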
Describe the bug
There are a number of tests that look like:
However, this goes against the normal Python interpretation of slices. For instance:
`get_anndata`, and presumably other methods, seem to be giving an extra point back when using a `slice` for the `{obs,var}_coords` arguments.
Environment
Provide a description of your system and the software versions.
pip list
sessionInfo()