VH-Lab / DID-matlab

Data Interface Database

sql database construction note #30

Closed: stevevanhooser closed this issue 2 years ago

stevevanhooser commented 2 years ago

Hi Yair -

I noticed a use case that might impact how you build the database.

Consider the search operation 'isa':

stim_pres = S.database_search(ndi.query('','isa','stimulus_presentation',''))

which, for the case I am working on, returns all documents of type 'stimulus_presentation' (11 of them):

stim_pres =
  1×11 cell array
  Columns 1 through 5
    {1×1 ndi.document}    {1×1 ndi.document}    {1×1 ndi.document}    {1×1 ndi.document}    {1×1 ndi.document}
  Columns 6 through 10
    {1×1 ndi.document}    {1×1 ndi.document}    {1×1 ndi.document}    {1×1 ndi.document}    {1×1 ndi.document}
  Column 11
    {1×1 ndi.document}

To make this easy in SQL, one might want an enumerated table of all document classes and, for each document in the document table, an array listing the classes to which the document belongs (its superclasses plus its own class), so that the query can run quickly.
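A minimal sketch of that idea, in Python with the standard sqlite3 module and under stated assumptions: the table names (docs, classes, doc_classes) and the helper functions are hypothetical, and because SQLite has no array column type, the per-document list of classes is stored as a junction table, the usual relational equivalent. An 'isa' search then becomes an indexed join on the class name rather than a string match.

```python
import sqlite3

# Hypothetical schema: the per-document "array of classes" is modelled as a
# junction table, since SQLite has no array column type.
SCHEMA = """
CREATE TABLE docs        (doc_idx INTEGER PRIMARY KEY, doc_id TEXT UNIQUE);
CREATE TABLE classes     (class_idx INTEGER PRIMARY KEY, class_name TEXT UNIQUE);
CREATE TABLE doc_classes (doc_idx INTEGER REFERENCES docs(doc_idx),
                          class_idx INTEGER REFERENCES classes(class_idx),
                          PRIMARY KEY (doc_idx, class_idx));
CREATE INDEX doc_classes_by_class ON doc_classes(class_idx);
"""

def add_document(conn, doc_id, own_class, superclasses):
    """Insert a document and record its own class plus all of its superclasses."""
    cur = conn.execute("INSERT INTO docs (doc_id) VALUES (?)", (doc_id,))
    doc_idx = cur.lastrowid
    for name in [own_class, *superclasses]:
        # Enumerate each class name once in the classes table.
        conn.execute("INSERT OR IGNORE INTO classes (class_name) VALUES (?)", (name,))
        (class_idx,) = conn.execute(
            "SELECT class_idx FROM classes WHERE class_name = ?", (name,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO doc_classes VALUES (?, ?)", (doc_idx, class_idx))

def isa(conn, class_name):
    """Return doc_ids of documents whose own class or any superclass is class_name."""
    rows = conn.execute(
        """SELECT docs.doc_id
             FROM docs
             JOIN doc_classes ON doc_classes.doc_idx = docs.doc_idx
             JOIN classes     ON classes.class_idx   = doc_classes.class_idx
            WHERE classes.class_name = ?
            ORDER BY docs.doc_id""",
        (class_name,))
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_document(conn, "d1", "stimulus_presentation", ["base"])
add_document(conn, "d2", "element", ["base"])
print(isa(conn, "stimulus_presentation"))  # ['d1']
print(isa(conn, "base"))                   # ['d1', 'd2']
```

Because a document's class list is written once, at insert time, the search itself never has to parse superclass strings.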

Hopefully this makes sense...

Thanks Steve

altmany commented 2 years ago

I believe that this is in theory already supported by the existing DB structure, since your 'isa','stimulus_presentation' query can be translated into something like this:

 -- DISTINCT lists each document once even if several of its rows match
 SELECT DISTINCT docs.doc_id
 FROM docs, doc_data, fields
 WHERE docs.doc_idx = doc_data.doc_idx
   AND fields.field_idx = doc_data.field_idx
   AND (
       (fields.field_name = 'meta.class'      AND doc_data.value = 'stimulus_presentation') OR
       (fields.field_name = 'meta.superclass' AND doc_data.value LIKE '%stimulus_presentation%')
   )
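To check that the query behaves as described, here is a small Python/sqlite3 harness. The column types, the sample documents, and the convention that meta.superclass holds a comma-separated list of class names are assumptions made for illustration; the query is the one above with its class name bound as a parameter.

```python
import sqlite3

# Hypothetical minimal tables matching the names used in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs     (doc_idx INTEGER PRIMARY KEY, doc_id TEXT);
CREATE TABLE fields   (field_idx INTEGER PRIMARY KEY, field_name TEXT UNIQUE);
CREATE TABLE doc_data (doc_idx INTEGER, field_idx INTEGER, value TEXT);
INSERT INTO fields VALUES (1, 'meta.class'), (2, 'meta.superclass');
INSERT INTO docs VALUES (1, 'd1'), (2, 'd2'), (3, 'd3');
-- d1 is a stimulus_presentation, d2 is a subclass of it, d3 is unrelated.
INSERT INTO doc_data VALUES
  (1, 1, 'stimulus_presentation'), (1, 2, 'base'),
  (2, 1, 'my_stim'),               (2, 2, 'base, stimulus_presentation'),
  (3, 1, 'element'),               (3, 2, 'base');
""")

# The query above, with the class name bound as a named parameter.
ISA_SQL = """
SELECT DISTINCT docs.doc_id
  FROM docs, doc_data, fields
 WHERE docs.doc_idx = doc_data.doc_idx
   AND fields.field_idx = doc_data.field_idx
   AND ((fields.field_name = 'meta.class'      AND doc_data.value = :cls) OR
        (fields.field_name = 'meta.superclass' AND doc_data.value LIKE '%' || :cls || '%'))
 ORDER BY docs.doc_id
"""
result = [row[0] for row in conn.execute(ISA_SQL, {"cls": "stimulus_presentation"})]
print(result)  # ['d1', 'd2']
```

Two caveats come from SQL's LIKE rather than from this schema: the substring match would also accept a superclass named, say, stimulus_presentation_v2, and '_' in a LIKE pattern is a single-character wildcard, so the class name is not matched literally. Storing one superclass per row, or an enumerated class table as suggested earlier in the thread, avoids both.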

I'm still not sure how you wish to convert the returned doc_id strings into document objects if the database stores neither the document objects nor any pointer/reference to a file on disk. Perhaps we should (at the very least) store a reference to the object. Let me know if we need to discuss this further or if you have a ready solution.

stevevanhooser commented 2 years ago

Hi Yair -

We should store the original JSON in a document column, and that is what is actually returned when the document is read. (That’s the only thing that needs to be read out. The database is just for search!)
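A sketch of that write/read split, assuming the docs/fields/doc_data layout from the query above plus a json_code column (the name used later in this thread). The flatten helper, the dotted field names, and taking the document id from base.id are illustrative assumptions. The JSON is stored once; the (field, value) rows exist only so that searches can find the document, and reading never consults them.

```python
import json
import sqlite3

def flatten(obj, prefix=""):
    """Yield (dotted.field.name, value) for every leaf of a nested JSON object."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            yield from flatten(val, f"{prefix}.{key}" if prefix else key)
    else:
        # Non-string leaves (numbers, lists, null) are stored as JSON text.
        yield prefix, (obj if isinstance(obj, str) else json.dumps(obj))

def add_document(conn, properties):
    """Store the original JSON once, plus searchable (field, value) rows derived from it."""
    cur = conn.execute("INSERT INTO docs (doc_id, json_code) VALUES (?, ?)",
                       (properties["base"]["id"], json.dumps(properties)))
    doc_idx = cur.lastrowid
    for name, value in flatten(properties):
        conn.execute("INSERT OR IGNORE INTO fields (field_name) VALUES (?)", (name,))
        (field_idx,) = conn.execute("SELECT field_idx FROM fields WHERE field_name = ?",
                                    (name,)).fetchone()
        conn.execute("INSERT INTO doc_data (doc_idx, field_idx, value) VALUES (?, ?, ?)",
                     (doc_idx, field_idx, value))

def read_document(conn, doc_id):
    """Reading a document only touches the stored JSON, never doc_data."""
    (code,) = conn.execute("SELECT json_code FROM docs WHERE doc_id = ?",
                           (doc_id,)).fetchone()
    return json.loads(code)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs     (doc_idx INTEGER PRIMARY KEY, doc_id TEXT UNIQUE, json_code TEXT);
CREATE TABLE fields   (field_idx INTEGER PRIMARY KEY, field_name TEXT UNIQUE);
CREATE TABLE doc_data (doc_idx INTEGER, field_idx INTEGER, value TEXT);
""")
props = {"base": {"id": "d1"}, "meta": {"class": "stimulus_presentation"}, "stimuli": [1, 2]}
add_document(conn, props)
print(read_document(conn, "d1") == props)  # True
```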

Thanks Steve

(In reply to Yair Altman's comment of Jun 10, 2022, 9:28 AM.)

altmany commented 2 years ago

In this case, instead of returning docs.doc_id, the SELECT query would return docs.json_code; the rest of the SQL query remains unchanged.
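Sketched in Python/sqlite3 with a hypothetical two-document table: the WHERE clause is the one from the earlier query, only the SELECT list changes to return the stored JSON, and each string is decoded on the client. Keeping doc_idx in the SELECT is a choice made here so that DISTINCT de-duplicates per document rather than per JSON string.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs     (doc_idx INTEGER PRIMARY KEY, doc_id TEXT, json_code TEXT);
CREATE TABLE fields   (field_idx INTEGER PRIMARY KEY, field_name TEXT UNIQUE);
CREATE TABLE doc_data (doc_idx INTEGER, field_idx INTEGER, value TEXT);
INSERT INTO fields VALUES (1, 'meta.class'), (2, 'meta.superclass');
INSERT INTO docs VALUES
  (1, 'd1', '{"base": {"id": "d1"}, "meta": {"class": "stimulus_presentation"}}'),
  (2, 'd2', '{"base": {"id": "d2"}, "meta": {"class": "element"}}');
INSERT INTO doc_data VALUES (1, 1, 'stimulus_presentation'), (2, 1, 'element');
""")

# Same WHERE clause as before; the SELECT list now returns the stored JSON.
rows = conn.execute("""
SELECT DISTINCT docs.doc_idx, docs.json_code
  FROM docs, doc_data, fields
 WHERE docs.doc_idx = doc_data.doc_idx
   AND fields.field_idx = doc_data.field_idx
   AND ((fields.field_name = 'meta.class'      AND doc_data.value = :cls) OR
        (fields.field_name = 'meta.superclass' AND doc_data.value LIKE '%' || :cls || '%'))
 ORDER BY docs.doc_idx
""", {"cls": "stimulus_presentation"})

# Decode each stored JSON string back into a nested dictionary.
documents = [json.loads(code) for _, code in rows]
print(documents[0]["meta"]["class"])  # stimulus_presentation
```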

altmany commented 2 years ago

How should the JSON field (docs.json_code) be populated? By which class method, and with which value?

stevevanhooser commented 2 years ago

It should be the output of jsonencode on the document_properties field of the did_object / ndi_object. We use the variant that allows NaN (JSON does not natively support NaN, but many readers and writers do, including Matlab). We have a wrapper function, vlt.data.jsonencodenan.m, that copes with Matlab's changing behavior over the last few years:

https://github.com/VH-Lab/vhlab-toolbox-matlab/blob/master/%2Bvlt/%2Bdata/jsonencodenan.m
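The encoding matters because whatever reads json_code back has to accept the same NaN extension. As an illustration only (this is Python's standard json module, not vlt.data.jsonencodenan, whose exact output is not shown in this thread), Python behaves analogously by default: it writes a non-finite float as the bare token NaN, parses that token back, and refuses when strict output is requested with allow_nan=False.

```python
import json
import math

props = {"sample_rate": 20000.0, "offset": float("nan")}

# Default encoder (allow_nan=True) emits the non-standard bare token NaN.
text = json.dumps(props)
print(text)  # {"sample_rate": 20000.0, "offset": NaN}

# The same module reads the token back as a float NaN.
decoded = json.loads(text)
print(math.isnan(decoded["offset"]))  # True

# Asking for strict JSON output raises instead of emitting NaN.
try:
    json.dumps(props, allow_nan=False)
except ValueError as err:
    print("strict encoder refused:", err)
```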

(In reply to Yair Altman's question of Jun 12, 2022, 4:02 PM.)

altmany commented 2 years ago

An initial version of did.implementations.sqlitedb has just been committed. It includes a search() method that accepts a did.query object.
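As a rough illustration of what such a search() has to do, here is a Python sketch that turns one query clause (field, operation, param) into a parameterized WHERE predicate over the docs/fields/doc_data join and returns doc_ids. The Query class, the three operations covered, and the use of SQLite's instr() in place of LIKE (so that '_' in a class name is matched literally) are assumptions for this example, not the committed MATLAB code.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Query:
    """Hypothetical stand-in for a single did.query clause."""
    field: str
    operation: str
    param: str

def to_sql(q):
    """Return (predicate, params) for one clause over the docs/fields/doc_data join."""
    if q.operation == "exact_string":
        return "(fields.field_name = ? AND doc_data.value = ?)", [q.field, q.param]
    if q.operation == "contains_string":
        return "(fields.field_name = ? AND instr(doc_data.value, ?) > 0)", [q.field, q.param]
    if q.operation == "isa":
        # The field argument is ignored, as in ndi.query('','isa',...).
        return ("((fields.field_name = 'meta.class' AND doc_data.value = ?) OR "
                "(fields.field_name = 'meta.superclass' AND instr(doc_data.value, ?) > 0))",
                [q.param, q.param])
    raise ValueError(f"unsupported operation: {q.operation}")

def search(conn, q):
    """Return matching doc_ids; values are always bound, never spliced into SQL."""
    predicate, params = to_sql(q)
    sql = ("SELECT DISTINCT docs.doc_id FROM docs "
           "JOIN doc_data ON docs.doc_idx = doc_data.doc_idx "
           "JOIN fields ON fields.field_idx = doc_data.field_idx "
           f"WHERE {predicate} ORDER BY docs.doc_id")
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs     (doc_idx INTEGER PRIMARY KEY, doc_id TEXT);
CREATE TABLE fields   (field_idx INTEGER PRIMARY KEY, field_name TEXT UNIQUE);
CREATE TABLE doc_data (doc_idx INTEGER, field_idx INTEGER, value TEXT);
INSERT INTO fields VALUES (1, 'meta.class'), (2, 'meta.superclass');
INSERT INTO docs VALUES (1, 'd1'), (2, 'd2'), (3, 'd3');
INSERT INTO doc_data VALUES
  (1, 1, 'stimulus_presentation'),
  (2, 1, 'my_stim'), (2, 2, 'base, stimulus_presentation'),
  (3, 1, 'element');
""")
print(search(conn, Query("", "isa", "stimulus_presentation")))       # ['d1', 'd2']
print(search(conn, Query("meta.class", "exact_string", "element")))  # ['d3']
```

Combining clauses with AND would need one subquery per clause (for example via INTERSECT) rather than joining the predicates on a single row, since in this layout each doc_data row holds only one field.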

stevevanhooser commented 2 years ago

Great, thanks! Would you like to handle the other search cases first, or have me test the existing code first? It would be helpful to have some test code to evaluate in +did/+test.

Thanks! Steve

altmany commented 2 years ago

All done and committed, as discussed earlier today. I decided to keep having the search function return doc_ids (not document objects) in a cell array; a new public method, doc_ids_to_objects, returns an array of did.document objects corresponding to the specified doc_id(s).
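A Python analogue of that two-step design, for illustration: a search yields ids, and a separate lookup decodes the stored JSON for a list of ids, in the order requested. The function name mirrors doc_ids_to_objects, but its signature, the error raised for unknown ids, and returning plain dictionaries instead of did.document objects are assumptions.

```python
import json
import sqlite3

def doc_ids_to_objects(conn, doc_ids):
    """Decode the stored JSON for each doc_id, preserving the order of the request."""
    if isinstance(doc_ids, str):          # accept a single id as well as a list
        doc_ids = [doc_ids]
    if not doc_ids:
        return []
    placeholders = ",".join("?" * len(doc_ids))
    rows = conn.execute(
        f"SELECT doc_id, json_code FROM docs WHERE doc_id IN ({placeholders})", doc_ids)
    by_id = {doc_id: json.loads(code) for doc_id, code in rows}
    missing = [d for d in doc_ids if d not in by_id]
    if missing:
        raise KeyError(f"unknown doc_id(s): {missing}")
    return [by_id[d] for d in doc_ids]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_idx INTEGER PRIMARY KEY, doc_id TEXT UNIQUE, json_code TEXT)")
conn.executemany("INSERT INTO docs (doc_id, json_code) VALUES (?, ?)",
                 [(i, json.dumps({"base": {"id": i}})) for i in ("d1", "d2", "d3")])
ids = ["d3", "d1"]                         # e.g. what a search returned
docs = doc_ids_to_objects(conn, ids)
print([d["base"]["id"] for d in docs])     # ['d3', 'd1']
```

Very long id lists can exceed SQLite's limit on bound parameters (999 before SQLite 3.32.0, 32766 since), so a fuller implementation would look ids up in chunks.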