If BHL doesn't provide a proper index for each page (needs checking), it probably makes sense to add our own to get a level of abstraction. A 32-bit unsigned number should be plenty (roughly 84x space for growth).
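The internal index above could be as simple as a bidirectional mapping from BHL's external page identifiers to sequential 32-bit IDs. A minimal sketch (all names hypothetical; a real system would back this with a database rather than in-memory dicts):

```python
import itertools

class PageIndex:
    """Map external BHL page identifiers to compact 32-bit internal IDs."""
    MAX_ID = 2**32 - 1  # unsigned 32-bit ceiling

    def __init__(self):
        self._counter = itertools.count()
        self._by_external = {}   # external BHL page id -> internal id
        self._by_internal = {}   # internal id -> external BHL page id

    def get_or_assign(self, external_id):
        """Return the existing internal ID, or assign the next free one."""
        if external_id in self._by_external:
            return self._by_external[external_id]
        internal = next(self._counter)
        if internal > self.MAX_ID:
            raise OverflowError("32-bit ID space exhausted")
        self._by_external[external_id] = internal
        self._by_internal[internal] = external_id
        return internal

    def lookup(self, internal_id):
        """Resolve an internal ID back to the external identifier."""
        return self._by_internal[internal_id]
```

The abstraction layer means downstream tools only ever see the compact internal IDs, so BHL-side identifier changes stay contained in this one mapping.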
Would be nice to save annotations/calculations others have performed on the dataset and potentially make them accessible.
Providing an interface for this might be difficult; it would most likely be constrained to a few different types of data. Anything fancy or large would probably require direct collaboration with SFG.
Possibly a good way to tie the mining and viz systems together into a more comprehensive system.
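A constrained annotation store like the one described above might look like this sketch: annotations are keyed by the internal page ID and restricted to a small enum of types, with anything large pushed out of band. All type names and fields here are hypothetical illustrations, not a settled schema:

```python
from dataclasses import dataclass
from enum import Enum

class AnnotationKind(Enum):
    # Deliberately small, constrained set of types (hypothetical examples)
    TAG = "tag"                      # free-form label on a page
    TRANSCRIPTION = "transcription"  # corrected text for a page
    MEASUREMENT = "measurement"      # numeric result of some calculation

@dataclass
class Annotation:
    page_id: int           # internal 32-bit page ID
    kind: AnnotationKind
    author: str
    payload: str           # serialized content, kept small by design

_store: list[Annotation] = []  # stand-in for a real backing store

def save_annotation(ann: Annotation) -> None:
    """Persist an annotation someone performed on a page."""
    _store.append(ann)

def annotations_for(page_id: int) -> list[Annotation]:
    """Make saved annotations accessible to other users of the dataset."""
    return [a for a in _store if a.page_id == page_id]
```

Constraining the payload to a few small, well-known kinds keeps the public interface simple; anything outside this set is the "direct collaboration with SFG" case.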
Automation of updates: is it possible to make sure new or updated pages get processed, keeping the whole system up to date? (I'm sure some analyses are interdependent or would ideally be performed on the whole dataset rather than just the new pieces.) Not many projects support this kind of automation; it depends on whether any of it can be done with "spare" cycles.
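The distinction in the note above (analyses that can consume just the new pages vs. ones that need the whole dataset) could be made explicit in the update dispatcher. A minimal sketch, with all names hypothetical:

```python
# Analyses register themselves in one of two lists depending on whether
# they can run incrementally or must see the complete corpus.
incremental_analyses = []   # callables taking a batch of new page IDs
full_corpus_analyses = []   # callables taking the complete ID set

def process_update(new_page_ids, all_page_ids):
    """Run after each batch of new/updated pages arrives."""
    for analysis in incremental_analyses:
        analysis(new_page_ids)     # cheap: only touches the new pieces
    for analysis in full_corpus_analyses:
        analysis(all_page_ids)     # expensive: candidate for "spare" cycles
```

Splitting the registry this way makes it obvious which jobs are cheap enough to trigger on every update and which ones should be deferred to idle capacity.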
I believe lots of data scientists want testing and validation data; it would be nice to provide an interface to get a random sample.
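The sampling interface could be tiny, but seeding it matters so that a test/validation split is reproducible. A sketch (function name is hypothetical):

```python
import random

def random_sample(page_ids, n, seed=None):
    """Return a random sample of n page IDs.

    Passing a seed makes the split reproducible, which is what you want
    for a shared test/validation set.
    """
    rng = random.Random(seed)
    return rng.sample(list(page_ids), n)
```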
The ability to query data via some API, and to get the results back as a stream.
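In Python terms, "results as a stream" maps naturally onto a generator: the consumer can start processing (or stop early) without the server materializing the full result set. A sketch with a stand-in corpus; all names are hypothetical:

```python
from typing import Iterator

# Stand-in corpus: internal page ID -> page text
PAGES = {i: f"page text {i}" for i in range(1000)}

def query_stream(predicate) -> Iterator[tuple[int, str]]:
    """Yield (page_id, text) pairs matching the predicate, one at a time.

    Because this is a generator, results stream to the caller lazily
    instead of being collected into one big response.
    """
    for page_id, text in PAGES.items():
        if predicate(text):
            yield page_id, text
```

A caller that only needs the first match (e.g. `next(query_stream(...))`) never pays for a full corpus scan, which is the main argument for a streaming API over a batch one.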