Closed rafsun42 closed 8 months ago
I was looking at the get_label_name() function, and this idea of a junction table also works there. I would need to modify the function in two ways. First, instead of a graphid, it would take the entry_id as its argument. Second, it would use the entry_id to search the junction table and find the label_id that the vertex belongs to.
After that the function will continue its flow as normal, since the only difference is how we obtain the label_id.
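A minimal sketch of the lookup described above, assuming a junction table with the columns id and label_id and a label catalog that maps a label id to a name. It uses Python's sqlite3 as a stand-in for Postgres so it can run on its own. The table names vertex_label_junction and label_catalog and the helper get_label_name_by_entry_id are hypothetical and are not part of AGE.

```python
import sqlite3
from typing import Optional


def get_label_name_by_entry_id(conn: sqlite3.Connection, entry_id: int) -> Optional[str]:
    """Resolve a vertex's label name through the junction table.

    Hypothetical analogue of get_label_name(): the label_id comes from a
    junction-table lookup instead of being derived from the graphid.
    """
    row = conn.execute(
        """
        SELECT l.name
        FROM vertex_label_junction AS j
        JOIN label_catalog AS l ON l.id = j.label_id
        WHERE j.id = ?
        """,
        (entry_id,),
    ).fetchone()
    return row[0] if row else None


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two labels and two vertices."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE label_catalog (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE vertex_label_junction (
            id INTEGER PRIMARY KEY,
            label_id INTEGER NOT NULL REFERENCES label_catalog (id)
        );
        INSERT INTO label_catalog VALUES (3, 'Person'), (4, 'Title');
        INSERT INTO vertex_label_junction VALUES (1, 3), (2, 4);
        """
    )
    return conn
```

Once the label_id is found, the rest of the function can stay unchanged; only the first step (how the label_id is obtained) differs.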
@rafsun42 Should I go along with these changes by making a temporary table via SQL commands to test my code?
I'm in the process of creating a junction table to hold just the id and the label_id of a given vertex. The table is created at the creation of the graph under the <graph_name> schema.
I have successfully created the table with its columns, but from what I understand every vertex that is inserted in the graph should also be inserted in the junction table, and I'm having a little trouble with the inheritance system.
If I understand it correctly, the inheritance system is the reason that when a vertex is added to a label table, it's also added to the ag_label_vertex table.
Therefore I believe that the right course of action is for the junction table to have the ag_label_vertex table as a parent and then to alter the table through the Postgres API to remove the unnecessary columns.
The problem with that is that I don't have access to the ag_label_vertex table at the creation of the graph, so I can't pass it as a list for the parents argument to the function that creates the junction table. Is this the correct way to tackle this? How should I proceed?
@panosfol I am not sure that's how Postgres inheritance works. A row is not added to both the child and the parent table. Postgres has documentation on it.
@rafsun42 According to the documentation it doesn't, and I got a bit confused there. I can't pinpoint exactly where in the code the vertex is inserted in both the label_table and the ag_label_vertex at the same time; I assumed it had to do with inheritance.
@panosfol Postgres' FROM ONLY clause may give you some hints.
@rafsun42 ok thank you I will look into that!
@panosfol The documentation I was referring to before states:
Inheritance does not automatically propagate data from INSERT or COPY commands to other tables in the inheritance hierarchy.
That is the reason I was concerned if inheritance would work well or not.
So, we have to do this without using inheritance: create the junction table independently, with a foreign key constraint on both columns, and insert into both the vertex table and the junction table.
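As a rough sketch of that plan, the following shows a standalone junction table with a foreign key on each column and a helper that writes to both tables in one transaction. It uses Python's sqlite3 in place of Postgres so that it runs standalone; the table names (vertex, label_catalog, vertex_label_junction) and the insert_vertex helper are hypothetical, and the real change would live in AGE's C code rather than in SQL issued from a client.

```python
import sqlite3


def build_graph_db() -> sqlite3.Connection:
    """Create a vertex table plus an independent junction table (no inheritance)."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off by default; Postgres enforces it always.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE label_catalog (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE vertex (id INTEGER PRIMARY KEY, properties TEXT NOT NULL);
        CREATE TABLE vertex_label_junction (
            id INTEGER PRIMARY KEY REFERENCES vertex (id),
            label_id INTEGER NOT NULL REFERENCES label_catalog (id)
        );
        INSERT INTO label_catalog VALUES (3, 'Person');
        """
    )
    return conn


def insert_vertex(conn: sqlite3.Connection, vertex_id: int, label_id: int,
                  properties: str) -> None:
    """Insert the vertex and its junction row together, or neither."""
    with conn:  # commits on success, rolls back if either insert raises
        conn.execute(
            "INSERT INTO vertex (id, properties) VALUES (?, ?)",
            (vertex_id, properties),
        )
        conn.execute(
            "INSERT INTO vertex_label_junction (id, label_id) VALUES (?, ?)",
            (vertex_id, label_id),
        )
```

Wrapping both inserts in one transaction keeps the two tables consistent: if the junction insert violates a foreign key, the vertex insert is rolled back as well.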
Any pros and cons of not using inheritance?
@rafsun42 Well, a pretty obvious con is that we have to insert every entry in the junction table as well, so it's going to double the time of inserting and updating. Other than that I don't see another issue.
The benefit is that it would work with changing the get_label_name function (and maybe it would work better with other changes).
I'll push a commit fixing the creation of the junction table, and then I'll look into how we can also insert the vertices in the junction table. I'll look into the create_vertex function.
@rafsun42 So there are 2 issues that I'm facing now. First, there isn't a function in the cache that takes just the OID and returns the graph_name, only the other way around. For now I have hard-coded the graph name that I'm using so I can test things.
The other issue is that the server crashes when I try to insert the tuple in the junction table. I went to the create_vertex function in cypher_create.c:476, and using the function create_entity_result_rel_info() I'm successfully creating the ResultRelInfo for the junction table. After that I'm duplicating the insert_entity_tuple call, using the same elemTupleSlot and the ResultRelInfo from the junction table. But the server crashes because the elemTupleSlot was created with 3 columns in mind and the junction table only has 2. I tried copying the contents of the node->elemTupleSlot to a new variable, but I get a segmentation fault. It seems that the tuple being created for that specific table is causing problems.
I will look into it more, but it seems it's a bit more complicated than I thought. I'll try to have it ready by tomorrow.
@rafsun42 I have found a Postgres function that returns the namespace name given the OID, so one problem is solved. But I don't think it's the right direction to just insert the tuple twice with no prep. The tuple holds crucial info about the table that is being inserted into, so it's not as simple as just choosing to insert it in another table.
@rafsun42 OK, so on further inspection the match clause also won't work without some changes, because the pattern is different if I create a new vertex for the junction table. I need to know how to move forward. Should we create the new vertex and make the necessary changes, or try to just insert the tuple at the execution stage? Inserting the tuple by itself is complicated, and I'm not even sure it's doable, to be honest.
Approach 1 - Using a trimmed and indexed version of label table for join
The query
Cypher query that extracts label ID (see the QPT just below):
My goal is to replace the use of _extract_label_id in the Filter node. This line is filtering out edges where the end node is not a Title.
Building the solution that does not extract label ID
My solution is to create a trimmed and indexed version of the Title table, called Title_hash. It has only the ID column and is indexed by the hash method.
Query on the new solution
The SQL query that uses the new table: (It is equivalent to the above cypher query.)
Rationale
Because Title_hash has less data per row and is indexed, joining with it would be faster than joining with the original Title table. The costs of the two queries are almost the same.
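To illustrate why the two forms should return the same edges, here is a small sketch that compares filtering on an extracted label ID with joining against a trimmed ID-only table. It is not the query from this thread: it runs on Python's sqlite3 rather than Postgres, the edge and title_hash tables are hypothetical, and it assumes a graphid layout where the label ID sits in the bits above a 48-bit entry ID. SQLite has no hash index method, so the primary-key index on title_hash stands in for the hash index.

```python
import sqlite3
from typing import List

ENTRY_ID_BITS = 48  # assumed graphid layout: label id stored above the entry id


def make_graphid(label_id: int, entry_id: int) -> int:
    """Pack a label id and an entry id into one integer (assumed layout)."""
    return (label_id << ENTRY_ID_BITS) | entry_id


def build_demo_db(title_label_id: int, person_label_id: int) -> sqlite3.Connection:
    """Create three edges: two ending at Title vertices, one at a Person."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE edge (id INTEGER PRIMARY KEY, start_id INTEGER, end_id INTEGER);
        CREATE TABLE title_hash (id INTEGER PRIMARY KEY);  -- trimmed: ID column only
        """
    )
    titles = [make_graphid(title_label_id, n) for n in (1, 2)]
    persons = [make_graphid(person_label_id, n) for n in (1, 2)]
    conn.executemany("INSERT INTO title_hash (id) VALUES (?)", [(t,) for t in titles])
    conn.executemany(
        "INSERT INTO edge (start_id, end_id) VALUES (?, ?)",
        [(persons[0], titles[0]), (persons[1], titles[1]), (persons[0], persons[1])],
    )
    conn.commit()
    return conn


def edges_by_label_extraction(conn: sqlite3.Connection, label_id: int) -> List[int]:
    """Filter edges by extracting the label id from end_id (the form to replace)."""
    rows = conn.execute(
        "SELECT id FROM edge WHERE (end_id >> ?) = ? ORDER BY id",
        (ENTRY_ID_BITS, label_id),
    )
    return [r[0] for r in rows]


def edges_by_trimmed_join(conn: sqlite3.Connection) -> List[int]:
    """Filter edges by joining end_id against the trimmed, indexed ID table."""
    rows = conn.execute(
        "SELECT e.id FROM edge AS e JOIN title_hash AS t ON t.id = e.end_id ORDER BY e.id"
    )
    return [r[0] for r in rows]
```

The sketch only checks that both forms select the same edges; it says nothing about relative speed in Postgres, which would need to be measured with the actual plans.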