IntelLabs / vdms

VDMS: Your Favorite Visual Data Management System
MIT License
84 stars 31 forks source link

Issues with FindEntity #87

Closed prashastk closed 5 years ago

prashastk commented 5 years ago

Added an entity successfully with this:

# Query
[{'AddEntity': {'class': 'Cell',
                'properties': {'cell_id.number': 2674,
                               'cell_id.timestamp_ms': 1544069566053,
                               'num_images': 2,
                               'run_start_date': {'_date': '2018-12-06T04:04:01'},
                               'sample_id': '1134'}}}]
# Response
[{'AddEntity': {'info': '', 'status': 0}}]

This FindEntity works:

[{'FindEntity': {'class': 'Cell',
                 'constraints': {'cell_id.number': ['==', 2674]},
                 'results': {'count': ''}}}]
[{'FindEntity': {'count': 1, 'status': 0}}] 

These FindEntity give unexpected errors:

  1. For some reason cell_id.timestamp_ms constrain doesn't work:
    
    # Query
    [{'FindEntity': {'class': 'Cell',
                 'constraints': {'cell_id.number': ['==', 2674],
                                 'cell_id.timestamp_ms': ['==', 1544069566053]},
                 'results': {'count': ''}}}]

Response

[{'FailedCommand': 'Transaction', 'info': 'Failed PMGDTransaction', 'status': -1}]

Additional server error

{ "info" : "InvalidID: ", "status" : -1 }


2. A missing property `abc` in the constraint throws an error:

Query

[{'FindEntity': {'class': 'Cell', 'constraints': {'abc': ['==', '1134'], 'cell_id.number': ['==', 2674]}, 'results': {'count': ''}}}]

Response

[{'FailedCommand': 'Transaction', 'info': 'Failed PMGDTransaction', 'status': -1}] []

Additional server error

{ "info" : "ReadOnly: ", "status" : -1 }


3. Count of 0 throws an error (cell_id.number 12321312 does not exist):

Query

[{'FindEntity': {'_ref': 1, 'class': 'Cell', 'constraints': {'cell_id.number': ['==', 12321312]}, 'results': {'count': ''}}}]

Response

[{'FailedCommand': 'Transaction', 'info': 'Failed PMGDTransaction', 'status': -1}] []

Additional server error

{ "info" : "Null search iterator\n", "status" : -1 }

luisremis commented 5 years ago

Thanks for this input!

For 1: If you compiled PMGD with its default values, string for the properties name must be 16 char or less. "cell_id.timestamp_ms" is larger than 16, and that is probably the reason why the Find is failing. I am confused on why the Add is working, we will add tests cases for this and find a fix.

2 and 3 are related to #36, we will add fix for this.

prashastk commented 5 years ago

For 1: Any specific reason to restrict properties to 16? Can we increase that to say 256?

luisremis commented 5 years ago

The name of the property is 16 char, the value can be of any length. This is because internally we have a string table for the name of the properties and classes for faster access/search. It can be increase at when compiling.

The rationale behind this being 16 char is that properties should be a very concise description of the information that the property has. But this is not the first time we get the request of making them larger :), so we are considering change that default to a larger value.

prashastk commented 5 years ago

Thanks. Looking forward to an increased larger default value.

vishakha041 commented 5 years ago

The increase will cause hash time to go up for all property keys and class names though. Our most common case is class names and property keys that are smaller than 16 characters. Do you have a lot of cases of larger string lengths?

prashastk commented 5 years ago

I would imagine the time spent hashing class names and property keys would be insignificant compared to the rest of the query. Would be worth doing some benchmark for it.

A size of 16 is a pretty strict constraint which limits the way we can construct our data schema. Something as innocuous as cell_id.timestamp is 17 character long. We are designing our schema using protobufs and without maps we need to normalize the data which will generate longer names. The class of the entity tells me which protobuf Message to use and cell_id.timestamp tells me that the value for the timestamp field inside the cell_id field.

vishakha041 commented 5 years ago

For 2: the code behaves as expected. Throws an exception when a wrong constraint is provided as a search criteria (ignoring the error message that is so generic). Would it be preferable in such a situation to ignore the missing property? I can fix it to do that or just change the error message and I am not quite sure what the preferred option might be.

prashastk commented 5 years ago

2: The term missing property is not clear to me. What if that property is missing for only a subset of entities in that class? Will the error be thrown only if the property is missing in all the entities of that class? A possible approach could be to return some warning in info: [{'FindEntity': {'count': 0, 'status': 0, 'info': 'constraints abc property not found'}}]

vishakha041 commented 5 years ago

This error happens if that property key was never seen before and you use it in some constraint. If it was added for some entities and not for others, that's ok. So the question is, would you rather that the constraint be ignored or it is ok to get the error? The message could definitely be made more indicative so it is clear what is wrong.

prashastk commented 5 years ago

An error in that case sounds good.

vishakha041 commented 5 years ago

For 1: would you say 64 is long enough string for property keys and class names?

prashastk commented 5 years ago

Yes, that should be sufficient.

vishakha041 commented 5 years ago

Partly fixed in 45f651f. The first part is not an issue anymore. The longer string id length is something for PMGD. Will create an issue there.

luisremis commented 5 years ago

reopening for the missing bugs here.

luisremis commented 5 years ago

my bad, all covered.