Database assetstore - Githubissues

manthey commented 8 years ago

The database assetstore supports Mongo and sqlalchemy SQL types (provided dependencies are installed). For postgres, all non-volatile functions are allowed (as for db items). Each file in the assetstore references a specific table or collection. The download endpoint uses extraParameters to perform full selects on the database (with the same basic parameters as used by the item select endpoint). Default values for sort, fields, filters, limit, and format can be specified, and these can be overridden in the query.

To support range-requests, we preload the response to calculate its length. This is inefficient and a waste of memory.

The mongo database connector now caches field information. Since it inspects the entire collection to determine available fields, this is an expensive operation. This could still use improvement.

If _id is not explicitly included in a field list for Mongo, it is excluded, so as to only return the fields that were actually specified.

Tests are needed.

manthey commented 8 years ago

Our files do not have a known size (since it depends on the data in the database). I've stored a 0 for the size value, but this has some weird effects: the UI shows a size of 0, where it really shouldn't show anything. Range requests are ignored, as the cherrypy utility for calculating ranges is called with file['size']. Ideally, the file['size'] would have a special value (None?) indicating that it is a dynamic, unknown size, but some changes to Girder core will be required for this.

manthey commented 8 years ago

Currently, the database assetstore references a specific database (as well as the connection information for the database). One database assetstore can handle any number of tables, but you need a separate assetstore for each database. Instead, if the assetstore is just the database connector, then the files can reference both the table AND the database, and one assetstore can reference all databases and tables on that connector.

This will change the import function to list all databases and all tables in all databases (which will take longer to query). Databases will be added as folders, with the tables as currently done. This will allow top-level containers (collections and users) to be the appropriate parent for import as well.

aashish24 commented 8 years ago

@manthey looks like a great addition. Let's touch base on it some more tomorrow.

waxlamp commented 8 years ago

@manthey, did you and @aashish24 get a chance to talk about this? Did any new ideas or potential pitfalls with the current approach come up?

manthey commented 8 years ago

We did. Basically, I'll move the special item endpoints to be special file endpoints that will allow querying and changing the individual parameters stored in the files.

If the url for an assetstore includes the database, the behavior will be the same as currently on this branch. If you exclude the database in the assetstore url, then the import will list all databases and tables. If you pick a collection or user to add the tables to, it will create a folder with the name of the database.

On Mon, Jul 11, 2016 at 10:56 AM, Roni Choudhury notifications@github.com wrote:

@manthey https://github.com/manthey, did you and @aashish24 https://github.com/aashish24 get a chance to talk about this? Did any new ideas or potential pitfalls with the current approach come up?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenGeoscience/girder_db_items/pull/8#issuecomment-231759625, or mute the thread https://github.com/notifications/unsubscribe/AIX_R3Cd8CTH9LTxJRYRNnNuPmjUB-T8ks5qUlmigaJpZM4JEomK .

David Manthey R&D Engineer Kitware Inc. (518) 881-4439

waxlamp commented 8 years ago

Sounds great!

I forgot to mention, but @jeffbaumes had mentioned another reason multiple assetstores for a single logical database would be necessary: if the different tables had different permissions on them, you would need separate assetstores, each instantiated with different credentials.

Thanks for this work, it's very cool :sunglasses:

manthey commented 8 years ago

Left to do for this branch:

[x] Allow assetstores to not specify the database (and the import process to do so)
[x] Hide usernames and passwords in the display of the assetstore information
[x] Add more server tests (client tests will be a separate PR)
[x] Rename the plugin to database_assetstore
[x] Fix out-of-connections errors in tests. I think this is from not explicitly closing sqlalchemy sessions.

alex-r-bigelow commented 8 years ago

I realize this is still WIP, but in case it's useful, I'm getting this error from grunt (just tried it on a fresh girder installation):

Running "plugin:database_assetstore" (plugin) task
Verifying properties pluginDir, plugin.database_assetstore.tasks exist in config...ERROR
>> Unable to process task.
Warning: Required config property "plugin.database_assetstore.tasks" missing. Use --force to continue.

manthey commented 8 years ago

I've renamed the plugin to database_assetstore. This means that you have to name the plugin directory in Girder to database_assetstore rather than girder_db_items, or it throws this error. The plugin's Gruntfile.js could be changed to have both names in it, or I could make the grunt file figure out its local directory.

manthey commented 8 years ago

@ronichoudhury @aashish24 This PR is ready for review. There are server tests for all of the code, except for some of the MongoDB connector and one edge case in the sqlalchemy connector (98% coverage of server code). I'll add client tests next (in a separate PR, unless you feel it is necessary to approve this one or the review takes long enough).

manthey commented 8 years ago

Now working in Python 2.7 and 3.4.

alex-r-bigelow commented 8 years ago

It looks like the "plugin.database_assetstore.tasks" is missing problem has been fixed at this point. Thanks!

aashish24 commented 8 years ago

@manthey let's get it merged, I have some minor comments but we could talk about it when we meet next week.

zachmullen commented 8 years ago

Sorry for the delay, I'm QAing this branch now.

aashish24 commented 8 years ago

that's great @zachmullen thanks.

manthey commented 8 years ago

I've pushed a commit that should address your comments up to this point.

zachmullen commented 8 years ago

LGTM

OpenGeoscience / database_assetstore

Database assetstore #8