Closed susano0 closed 1 year ago
We should look at the other graph databases and have our naming strategy similar. I agree that we need to restrict what name is valid, especially for 'special' characters. I don't know about restricting a graph name to have an initial alphabetic character, though.
What naming strategy, if any, does Neo4j have?
Their naming convention seems similar to what @susano0 proposed. I was taught the same convention for naming variables in my introductory CS classes., so I'm assuming it's fairly popular.
Just be aware that there are naming conventions and actual restrictions. They may say it is good practice to do it this way, but still allow you to do it another.
This is a good introductory task. I would like for all of our current interns to work on this task together.
Below are the steps to follow -
Btw, others - non interns - are allowed to make productive comments and suggestions. We welcome everyone's input. The work for this task, however, is to be done by the interns collectively.
Started this task.
I think we can implement Cypher's exact naming rules and restrictions since we are using Cypher.
I am researching some open source implementation of Cypher to see how names are being validated. I found this list here containing many such projects. This project in particular seems relevant to me. I will be looking deeper in that project. Also, I would like to know what you guys think.
I noticed that the label names for the vertices and edges lack a naming scheme as well. I feel that's also something we should consider, although it can be less restrictive than the graph name naming convention.
I agree. Good catch!
If there isn't already an issue for each of those, could you create them, and link them here?
Created 2 separate issues for vertex (#334) and edge (#335) label naming conventions.
@jrgemignani As for the 3rd step of this task,
int validate_name(char* name)
function that returns whether the name
abides by the naming rules. create_graph
and rename_graph
functions in the graph_commands.c
file. I believe these are the only two places where a graph's name is set. @jrgemignani Also, a question. Do you think it is a good idea to use regex for name validation? I found a regex string and tested it with a list of label names. It seems to work fine.
Neo4j has these graph naming rules in addition to the standard naming rules mentioned earlier: https://neo4j.com/docs/cypher-manual/5/databases/#administration-databases-create-database (I assume graph is similar to a database?)
These rules are ignored when the name is escaped with backtick.
Regex should be okay to use. There is a regex PG library used by AGE for the ~= operator. If we use regex, we can use that library.
I believe that, in our case, graph is the same as database.
Lets go with the Neo4j naming rules, less the escaping part.
I think that for all names for labels, vertices, and edges, that we need to comply with Neo4j's standards.
Additionally, I think that it is a good idea to be a bit strict with valid names due to potential security issues by allowing escaped or control characters.
So, lets follow Neo4j but disallow escaped characters.
I implemented the naming rules and sent a pull request #349.
Here are the rules that I implemented. These rules are copied from Neo4j website. I added notes (in bold face) where I diverged slightly from Neo4j's convention. Please let me know if anything needs to be changed.
Naming rules for labels:
Naming rules for graphs:
Looks good @rafsun42! I agree with your changes and will review the pull request.
I've added some feedback to the PR.
Continuing the discussion here so as not to clutter the PR.
However, item 2, is due to potential security holes that allowing those characters could enable. For now, this restriction needs to stay in place.
I'm curious if these potential security holes are unique to AGE (because if not, it seems surprising that another platform intended for enterprise use would be willing to take such risks).
If it's a concern about a kind of injection attack: isn't that avoided by the constraint of enclosing these characters in backticks? (kinda like the use of CDATA in XML)
Totally possible that I'm missing something...
To address the issue regarding @TropicalPenguin's first point in the PR,
I researched and found a grammar specification at opencypher.org. It implements the syntax for labels this way:
UnescapedSymbolicName = IdentifierStart, { IdentifierPart } ;
(* Based on the unicode identifier and pattern syntax
* (http://www.unicode.org/reports/tr31/)
* And extended with a few characters.
*)IdentifierStart = ID_Start
| Pc
;
(* Based on the unicode identifier and pattern syntax
* (http://www.unicode.org/reports/tr31/)
* And extended with a few characters.
*)IdentifierPart = ID_Continue
| Sc
;
Here, ID_Start, ID_Continue, Pc, Sc are unicode character classes whose range you can see here. They seem to cover all unicode alphabets. The reference page provided in the grammar, describes these classes as:
ID_Start characters are derived from the Unicode General_Category of uppercase letters, lowercase letters, titlecase letters, modifier letters, other letters, letter numbers, plus Other_ID_Start, minus Pattern_Syntax and Pattern_White_Space code points.
In UnicodeSet notation: [\p{L}\p{Nl}\p{Other_ID_Start}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]
ID_Continue characters include ID_Start characters, plus characters having the Unicode General_Category of nonspacing marks, spacing combining marks, decimal number, connector punctuation, plus Other_ID_Continue, minus Pattern_Syntax and Pattern_White_Space code points.
In UnicodeSet notation: [\p{ID_Start}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Other_ID_Continue}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]
I tried using few characters both from the range and outside the range in Neo4J Aura. It seems they are following the above mentioned grammar. Unfortunately, in their documentation they don't mention the ranges explicitly.
I also want to mention that for database names they say they allow "Ascii alphabetic" character (does not mention non-english), however they actually allowed "unicode alphabetic" characters in Neo4J Aura.
So, I have two questions:
@TropicalPenguin Ty for pointing me back here. Too many places to look for this, cry. @rafsun42 yes, and yes. We can always adjust the rules going forward if necessary.
I found that there is no naming convention for creating a graph. Currently, we can create a new graph with names such as "." and "\n".
I suggest the following rules: