Scope of the Library

unidesigner commented 7 years ago

Hello! Great to see the catpy library online - I think it will become useful to a lot of people. How do you see the scope of the library? Would you want to have it purely as an access library to the CATMAID API or do you envision also more sophisticated contributions for analysis (single neuron, circuits) and visualization?

If the latter is the case, I would be happy to contribute some of the things I wrote during my PhD (partly in the CAT package and elsewhere) and I believe this would be the right place to do so. However, it will be necessary to have a good basis data structure to represent skeletons, connectors, circuits including attributes on the nodes and edges, metadata etc. to build on top.

For my purposes, I built this myself, mostly based on NetworkX graphs or Numpy matrices, but it would make much more sense to have this standardized and ready-to-use for people getting started with novel analysis, and also to reduce duplication of programming efforts. I know there were talks about this earlier and I know about the repo by @ceesem (https://github.com/ceesem/catpy) also doing that. I am happy to contribute my insights to such a discussion if this is of interest here.

clbarnes commented 7 years ago

We had a bit of a chat about this the other day and it partially comes down to maintenance: as our user base is fairly small but the API can be a bit of a moving target, we thought that fleshing out every endpoint may not be worth the time, so v1 is definitely going to focus on just accessing the data. There are still a couple of bits to do on that front: we're likely to add a CLI, @ceesem has been working on a JSON schema for exporting circuits, and I'd quite like to re-implement some of the export options which are currently only available through the frontend.

I may not be speaking for everyone here, but I think in the first pass we'll be prioritising what we need to make it a useful base tool for people to build their analysis pipelines on top of, rather than an analysis suite in its own right. I can definitely envision some sort of module or extension system, though, for people to contribute their own pipelines.

aschampion commented 7 years ago

To echo what Chris said, v1 is definitely just about providing an easy, correct way for people to do data access and export from their scripts (and when I get around to it, the command line). Connecting to CATMAID and calling an endpoint should be 2 lines of code that are just a pip install catpy away.

The next milestone after that is providing ergonomic access to common APIs like annotation queries and skeletons, but without custom structured data on the client. More just something to expose the API in a pythonic way so that you don't have to go digging through the endpoint code in the CATMAID backend to figure out how to name or format a parameter. Similarly, I'll also be adding ndarray-friendly volumetric stack tile access. At this point the value proposition of the library is that it makes it easier to write something like CAT, CATMAID-blender, or Casey's original catpy.

After that, yes, we would like the scope to include common analyses like those in CAT/catpy. This is why it's catpy -- we want to reproduce Casey's library in a planned, packaged way that we can include in CATMAID CI to have some assurance we don't break it. This will require some well-thought-through basis of data structures. However, the point is we don't want to push that design into the lower levels of the API client; it should always be something optional that you can choose to use if it fits your needs, but doesn't force you to convert data structures just to use the basic API wrappers. We also don't want to bloat dependencies or maintenance burden too much.

Should note that even though I'm talking about v1 and milestones here this isn't some long-term roadmap. This is all pretty trivial and will happen over the next few weeks.

Also this is just my take. The point of having all of us write the library together was to find common ground, since we've all written our own versions for our own use cases many times over. So we're glad to have your input, too.

unidesigner commented 7 years ago

@clbarnes @aschampion I see your points and they all make sense to me. For the milestone providing ergonomic access to the common API, it would be useful to have a few example scripts (in an examples folder) covering a few use cases to get started. E.g. given an annotation tag, retrieve all the neuron IDs annotated with this tag; for those IDs, retrieve all skeletons and associated metadata such as names, other annotation tags, skeleton node tags etc. (perhaps in separate queries), including all pre/post links to connectors. Then aggregate the connector IDs and pull all pre/post neuron/skeleton IDs of those connectors not yet in the original set (i.e. pull the first-order network, could also be a single query). I'm happy to test the exposed, pythonic API access with this and some other use cases once it is ready and provide feedback.
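To make the first-order step concrete, here is a rough sketch that works on data already pulled to the client. It does not call catpy or CATMAID at all: the `LINKS` table, its tuple layout, and the function name `first_order_partners` are made up for illustration and are not the format any endpoint returns.

```python
from collections import defaultdict

# Hypothetical, already-downloaded link table: (skeleton_id, connector_id, relation).
# The tuple layout is an assumption for illustration, not CATMAID's response format.
LINKS = [
    (1, 100, "pre"),
    (2, 100, "post"),
    (3, 100, "post"),
    (2, 101, "pre"),
    (4, 101, "post"),
    (5, 102, "pre"),
    (6, 102, "post"),
]


def first_order_partners(seed_skeletons, links):
    """Return skeleton IDs that share a connector with any seed, excluding the seeds."""
    seeds = set(seed_skeletons)

    # Group all skeletons by the connector they are linked to.
    skeletons_by_connector = defaultdict(set)
    for skeleton_id, connector_id, _relation in links:
        skeletons_by_connector[connector_id].add(skeleton_id)

    # Keep only connectors that touch a seed, and collect their non-seed members.
    partners = set()
    for members in skeletons_by_connector.values():
        if members & seeds:
            partners |= members - seeds
    return partners


print(sorted(first_order_partners({1, 2}, LINKS)))
```

With seeds 1 and 2, only connectors 100 and 101 touch the seed set, so the skeletons still to be pulled are 3 and 4. The relation column is ignored here; a real script would probably keep it to distinguish upstream from downstream partners.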

In the past, I organized the pulled data locally in a nested dict that is JSON-serializable. This is where the "well-thought-through basis of data structures" will come in, and it makes sense that you want to keep the API access completely independent of this structure. Once this has matured with the reproduction of Casey's library, I will be able to conform to this with a refactored CAT (sic!). To reduce confusion, I'd like to remove the CAT repo from the catmaid organization for now - I hope that's fine.
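For reference, a minimal sketch of the kind of JSON-serializable nested dict mentioned above. The key names (`skeletons`, `connectors`, `node_tags`, etc.) and the `roundtrip` helper are hypothetical, chosen only for illustration; this is not the layout used in CAT or proposed for catpy.

```python
import json

# Hypothetical layout. IDs are stored as strings because JSON object keys
# are always strings, so integer keys would not survive a round trip unchanged.
circuit = {
    "skeletons": {
        "1": {
            "name": "neuron A",
            "annotations": ["example tag"],
            "node_tags": {"10": ["soma"]},
        },
        "2": {
            "name": "neuron B",
            "annotations": [],
            "node_tags": {},
        },
    },
    "connectors": {
        "100": {"pre": ["1"], "post": ["2"]},
    },
}


def roundtrip(data):
    """Serialize to a JSON string and parse it back."""
    return json.loads(json.dumps(data, sort_keys=True))


# String-keyed nested dicts of lists, strings and numbers come back unchanged.
assert roundtrip(circuit) == circuit

# Integer keys are silently converted to strings by json.dumps.
assert roundtrip({1: "a"}) == {"1": "a"}
```

The second assertion is the main pitfall with this approach: skeleton and connector IDs used as dict keys have to be converted back to integers after loading, or kept as strings throughout.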

@aschampion Looking forward to using the ndarray-friendly volumetric stack tile access. I hope it will provide arbitrary x,y,z and volume extent specification and synchronous access :)

catmaid / catpy

Scope of the Library #3