Closed hthomas-czi closed 4 months ago
Currently this argument is passed right through to soma.ExperimentAxisQuery.to_anndata
, which has the argument column_names
.
Should this change be made over there?
I think it would be odd if the argument column_names
was deprecated here, but still used in somacore
. Especially if the new form of the argument isn't made available in somacore
.
Emanuele suggested @aaronwolen may have thoughts on whether a change in somacore is on the table here.
Personally, I'm not sure this change is worth creating inconsistency between the APIs. But do like this change overall.
@ivirshup @ebezzi @pablo-gar - I think this is a tricky design issue and the following are my thoughts. I would like to hear your views on this:
It seems rather confusing to me that we would allow column_names
, obs_column_names
, and var_column_names
and then write some logic to ensure mutual exclusion. The interface has not gained any clarity with this approach.
I would suggest we keep the function signature of get_anndata()
as it is right now or replace column_names
with obs_column_names
and var_column_names
in census
. If soma
also wants to adopt obs_column_names
and var_column_names
because the maintainers think it is an improvement then that would be fine but not a requirement.
Here is why:
I think we should distinguish between the public interface of census
and soma
. They don't necessarily have to be consistent because one is an application (census
- with application specific nouns and verbs to describe its functionality) built on top of another lower level API (soma
- which is a general data model supporting out-of-core processing with a query API).
For example, a census API method could wrap several soma
API methods to offer some functionality and in doing so can reasonably have function and argument names that makes sense for the census
application and doesn't exactly match soma
function signatures.
You could imagine other applications built on top of soma
that might define function names and arguments that communicate the nouns and verbs in the application better than soma
could.
The fact that census
does intentionally expose soma
functions does make API design tricky but you could view it as census
allowing you to drop down to lower level to get more fine grained control. For example, census
exposes get_obs() -> pd.Dataframe
, and if the user needs finer control can use obs.read()
to get an iterator over arrow tables and can then materialize it in memory if they wish. Similarly, get_anndata()
is a user friendly function for census users whereas soma.ExperimentAxisQuery.to_anndata
is lower level and more general.
I think we should avoid doing things like the above where we support both soma
type arguments and census
type arguments that effectively have the same meaning and exist alongside each other in the same census API method.
In general, I think we should put more weight on on what makes usage of the census
API easier rather than consistency with form of the underlying soma
API.
It seems rather confusing to me that we would allow column_names, obs_column_names, and var_column_names and then write some logic to ensure mutual exclusion. The interface has not gained any clarity with this approach.
I have update that code slightly so that now it either errors or throws a warning if column_names
is ever passed (I already had the test for it, but not the actual code). I think this should be more clear.
I think that argument is fair, but I guess I assumed there was some desire for users to access tiledbsoma for more complex use cases. Then it would be nice for them if the APIs are the same. It could also ease maintenance if the APIs don't drift much and many arguments can be passed straight through.
Separate the
get_anndata()
methodcolumn_names
argument intoobs_column_names
andvar_column_names
which accept an array instead of an object.Current
Proposed
edited by Pablo