Closed manthey closed 8 years ago
Our files do not have a known size (since it depends on the data in the database). I've stored a 0 for the size value, but this has some weird effects: the UI shows a size of 0, where it really shouldn't show anything. Range requests are ignored, as the cherrypy utility for calculating ranges is called with file['size']. Ideally, the file['size'] would have a special value (None
?) indicating that it is a dynamic, unknown size, but some changes to Girder core will be required for this.
Currently, the database assetstore references a specific database (as well as the connection information for the database). One database assetstore can handle any number of tables, but you need a separate assetstore for each database. Instead, if the assetstore is just the database connector, then the files can reference both the table AND the database, and one assetstore can reference all databases and tables on that connector.
This will change the import function to list all databases and all tables in all databases (which will take longer to query). Databases will be added as folders, with the tables as currently done. This will allow top-level containers (collections and users) to be the appropriate parent for import as well.
@manthey looks like a great addition. Let's touch base on it some more tomorrow.
@manthey, did you and @aashish24 get a chance to talk about this? Did any new ideas or potential pitfalls with the current approach come up?
We did. Basically, I'll move the special item endpoints to be special file endpoints that will allow querying and changing the individual parameters stored in the files.
If the url for an assetstore includes the database, the behavior will be the same as currently on this branch. If you exclude the database in the assetstore url, then the import will list all databases and tables. If you pick a collection or user to add the tables to, it will create a folder with the name of the database.
On Mon, Jul 11, 2016 at 10:56 AM, Roni Choudhury notifications@github.com wrote:
@manthey https://github.com/manthey, did you and @aashish24 https://github.com/aashish24 get a chance to talk about this? Did any new ideas or potential pitfalls with the current approach come up?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenGeoscience/girder_db_items/pull/8#issuecomment-231759625, or mute the thread https://github.com/notifications/unsubscribe/AIX_R3Cd8CTH9LTxJRYRNnNuPmjUB-T8ks5qUlmigaJpZM4JEomK .
David Manthey R&D Engineer Kitware Inc. (518) 881-4439
Sounds great!
I forgot to mention, but @jeffbaumes had mentioned another reason multiple assetstores for a single logical database would be necessary: if the different tables had different permissions on them, you would need separate assetstores, each instantiated with different credentials.
Thanks for this work, it's very cool :sunglasses:
Left to do for this branch:
I realize this is still WIP, but in case it's useful, I'm getting this error from grunt (just tried it on a fresh girder installation):
Running "plugin:database_assetstore" (plugin) task
Verifying properties pluginDir, plugin.database_assetstore.tasks exist in config...ERROR
>> Unable to process task.
Warning: Required config property "plugin.database_assetstore.tasks" missing. Use --force to continue.
I've renamed the plugin to database_assetstore
. This means that you have to name the plugin directory in Girder to database_assetstore
rather than girder_db_items
, or it throws this error. The plugin's Gruntfile.js could be changed to have both names in it, or I could make the grunt file figure out its local directory.
@ronichoudhury @aashish24 This PR is ready for review. There are server tests for all of the code, except for some of the MongoDB connector and one edge case in the sqlalchemy connector (98% coverage of server code). I'll add client tests next (in a separate PR, unless you feel it is necessary to approve this one or the review takes long enough).
Now working in Python 2.7 and 3.4.
It looks like the "plugin.database_assetstore.tasks" is missing
problem has been fixed at this point. Thanks!
@manthey let's get it merged, I have some minor comments but we could talk about it when we meet next week.
Sorry for the delay, I'm QAing this branch now.
that's great @zachmullen thanks.
I've pushed a commit that should address your comments up to this point.
LGTM
The database assetstore supports Mongo and sqlalchemy SQL types (provided dependencies are installed). For postgres, all non-volatile functions are allowed (as for db items). Each file in the assetstore references a specific table or collection. The download endpoint uses extraParameters to perform full selects on the database (with the same basic parameters as used by the item select endpoint). Default values for sort, fields, filters, limit, and format can be specified, and these can be overridden in the query.
To support range-requests, we preload the response to calculate its length. This is inefficient and a waste of memory.
The mongo database connector now caches field information. Since it inspects the entire collection to determine available fields, this is an expensive operation. This could still use improvement.
If _id is not explicitly included in a field list for Mongo, it is excluded, so as to only return the fields that were actually specified.
Tests are needed.