datajoint / datajoint-matlab

Relational data pipelines for the science lab
MIT License
42 stars 39 forks source link

Add support for Unions #89

Open dimitri-yatsenko opened 7 years ago

dimitri-yatsenko commented 7 years ago

See https://github.com/datajoint/wiki/wiki/Union

SchwartzNU commented 3 years ago

Why has this issue been pushed back to release 4.0? I'm assuming that will be a long way in the future. It's not really a functional SQL wrapper without union being implemented. Can you suggest a workaround?

ixcat commented 3 years ago

Hello -

Currently the release process is prioritized based on our assessment of needs in developing data analysis workflows / pipelines which is the main focus of DataJoint - 3.4 contained external blob storage and we did not want to defer the release of external storage for the implementation of unions. While unions are indeed very useful and desired (which is why we have committed to adding them), for the base case of data analysis / workflow pipelines they are often not needed. For the moment, the best workaround is to combine data results in the client.

Also, have adjusted the repository description text to match datajoint-python - from High-level relational database interface for MATLAB to Relational data pipelines for the science lab to make it more clear that the intent is not to be an RDBMS interface but a pipeline/workflow system using the relational model that happens to use an SQL backend.

SchwartzNU commented 3 years ago

Thank you. These changes definitely help. I am building a DataJoint-MATLAB system for my lab's workflow that we intent to start using in the new year. So I will likely continue posting here as I run into problems that I can't figure out. So far so good though! I don't know if you have a place for user-contributed code, but I built a general-purpose GUI query builder as a MATLAB app that other might find useful.

SchwartzNU commented 1 year ago

Any more updates on support of Union? This would REALLY be helpful for my application. I don't really know of a workaround without fetching the tables and then concatenting them in MATLAB, which is slow and then takes away the functionality of the database table. Thanks, -Greg

dimitri-yatsenko commented 1 year ago

I would like to learn more about your use of union. We find that in most cases, queries are slower than alternative queries producing the same result.

SchwartzNU commented 1 year ago

I have a large e-phys dataset in my lab collected over many years. While there are common protocols used in these experiments, there has also been some drift in their parameters. My student, Zach, and I devised a way to deal with this by automatically creating new tables for parameter sets that are incompatible with existing tables. For example, our schema now contains tables named...

ExperimentProtSpotsMultiSizeV1bp ExperimentProtSpotsMultiSizeV2bp ...

This allowed us to successfully load all the old data. Now I was hoping to write a wrapper function that would project out unnecessary parameters, set some that do not exist to NULL, etc, so that I can combine these into a single table.

Here is a toy example of what I mean:

Table A::
epoch_id
---
intensity
curr_spot_size

Table B::
epoch_id
---
intensity
curr_spot_size
spot_shape

Target table A + B::
epoch_id
---
intensity
curr_spot_size
spot_shape # NULL for A

I know that I can loop through the sub-tables and fetch out the parameters I need individually as a workaround, but that is rather slow and inelegant. It would also prevent me from writing a query that searches for a common parameter across the sub-tables. Collectively, these sub-tables have >4K entries, so I am really trying to avoid a client-side fetch and concatenation.

Thanks for your help and your quick reply.

SchwartzNU commented 1 year ago

Any more thoughts on this? We really need this functionality, so our next planned step is to re-implement dj.U and the + operator in MATLAB based on your python code.