cozodb / cozo

A transactional, relational-graph-vector database that uses Datalog for query. The hippocampus for AI!
https://cozodb.org
Mozilla Public License 2.0

Cannot include `#` character in a "raw string" #234

Open aramallo opened 9 months ago

aramallo commented 9 months ago

The presence of a single # character in any raw string causes the query parser to fail.

The following example fails with the error "The query parser has encountered unexpected input / end of input at 17..17":

?[data] <- [[ ___"#"___]]

Just removing the # char makes it work.

I am using Cozo Rust library version 0.7.5

aramallo commented 9 months ago

While running more tests, I found that adding a newline after the hash character avoids the failure, which suggests that the parser is mistaking everything after the # for a LINE_COMMENT.

So, this fails:

?[data] <- [[ ___"#"___]]

But this doesn't fail. However, it returns an empty string as the result, which supports my assumption:

?[data] <- [[ ___"#\n"___]]

aramallo commented 9 months ago

Using https://pest.rs I tried to validate my assumption, but according to the latest pest file, even adding a newline should fail.

The case of an empty raw string

image

Now when adding #

image

Adding the newline does not change the tool result

image

Hope this helps. Unfortunately, I am not very good with Rust yet, and I am not familiar enough with pest to find a solution to contribute.

aramallo commented 9 months ago

So adding SOI to the LINE_COMMENT rule solves the problem (but breaks line comments), which means we are on the right track.

LINE_COMMENT = _{ SOI ~ "#" ~ (!"\n" ~ ANY)* }

image

aramallo commented 9 months ago

I've extracted the related rules into a fiddle that shows how this fails.

aramallo commented 9 months ago

So I managed to fix the issue at the PEG level. The change consists of making the raw_string_inner pest rule atomic, so that LINE_COMMENT no longer takes precedence over raw_string when # is present.

A fiddle here showing that it works.
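
To make the hypothesis in this thread concrete, here is a toy model in plain Rust. It is not pest and not the cozo parser: `scan_raw_string` and `skip_trivia` are hypothetical names, the delimiter is hard-coded to three underscores, and the sketch assumes that a non-atomic rule silently skips comments and whitespace between tokens while an atomic rule does not.

```rust
/// Skips whitespace and `#` line comments (toy model of implicit skipping).
/// A comment runs up to, but does not include, the next newline.
fn skip_trivia(mut s: &str) -> &str {
    loop {
        let trimmed = s.trim_start();
        let after = match trimmed.strip_prefix('#') {
            Some(comment) => match comment.find('\n') {
                Some(i) => &comment[i..],
                None => "", // the comment swallows the rest of the input
            },
            None => trimmed,
        };
        if after.len() == s.len() {
            return after; // nothing more to skip
        }
        s = after;
    }
}

/// Scans a raw string of the form ___"..."___ and returns its content.
/// With `atomic == false`, trivia is skipped before every step, so a `#`
/// inside the string is treated as the start of a comment.
fn scan_raw_string(input: &str, atomic: bool) -> Result<String, String> {
    const OPEN: &str = "___\"";
    const CLOSE: &str = "\"___";
    let mut rest = input
        .strip_prefix(OPEN)
        .ok_or("expected opening delimiter")?;
    let mut content = String::new();
    loop {
        if !atomic {
            rest = skip_trivia(rest);
        }
        if rest.starts_with(CLOSE) {
            return Ok(content);
        }
        let mut chars = rest.chars();
        match chars.next() {
            Some(c) => {
                content.push(c);
                rest = chars.as_str();
            }
            None => return Err("unexpected end of input".to_string()),
        }
    }
}
```

In this model, `scan_raw_string("___\"#\"___", false)` fails because the comment swallows the closing delimiter, `scan_raw_string("___\"#\n\"___", false)` succeeds with empty content, and `scan_raw_string("___\"#\"___", true)` returns `#`. That mirrors the three behaviours reported above, but it only illustrates the idea and proves nothing about the real grammar.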

I made the change in my fork. However, when I pull it from another project (my cozo binding for Erlang), I still get the same error when running ?[data] <- [[___"#"___]]

I checked that Rust is compiling my fork at the latest commit, as shown below:

Updating git repository `https://github.com/aramallo/cozo.git`
 Updating git submodule `https://github.com/facebook/rocksdb.git`
 Compiling cozorocks v0.1.7 (https://github.com/aramallo/cozo.git?branch=main#5d252699) <<<<<<<<
 Compiling cozo v0.7.6 (https://github.com/aramallo/cozo.git?branch=main#5d252699) <<<<<<<<

Could it be that the change to the pest file has not produced any change in the parser? I am new to Rust and pest, so I am not sure whether I need to run something to generate the Rust parser and then commit that file, or whether pest does this automatically when compiling.

@zh217 Any ideas here?

andrewbaxter commented 8 months ago

Sorry, I'm not entirely sure, but that's included here: https://github.com/cozodb/cozo/blob/8b1b60cbf64f2b0ed2a14078cbd0c7838727df2a/cozo-core/src/parse/mod.rs#L39

#[derive(pest_derive::Parser)]
#[grammar = "cozoscript.pest"]
pub(crate) struct CozoScriptParser;

It's a derive macro, which gets run automatically during normal compilation. It looks like pest_derive also accounts for external files changing (per https://github.com/pest-parser/pest/issues/789). So basically there should be no extra work required aside from changing that file.

And that log looks pretty clear, but you might be able to use cargo tree -i to confirm which version of cozo is being pulled in by the dependent project.

(And thanks to this issue for teaching me that cozo supports comments! It didn't appear to be documented when I looked.)

creatorrr commented 7 months ago

The only solution in the meantime is to pass the values separately and not interpolate anything. But still, this needs fixing.
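
A minimal sketch of that workaround, using only the standard library. The helper names `interpolated_query` and `parameterized_query` are hypothetical, and the sketch assumes the value is referenced in the script as the named parameter `$data`. In a real program the map would hold the library's own value type and be handed to the query call together with the script; that call is not shown here.

```rust
use std::collections::BTreeMap;

/// Problematic approach: the value is spliced into a raw string,
/// so a `#` in the value ends up inside the script text.
fn interpolated_query(value: &str) -> String {
    format!("?[data] <- [[___\"{}\"___]]", value)
}

/// Workaround: the script text is fixed and the value travels
/// separately in a parameter map (hypothetical helper).
fn parameterized_query(value: &str) -> (&'static str, BTreeMap<String, String>) {
    let mut params = BTreeMap::new();
    params.insert("data".to_string(), value.to_string());
    ("?[data] <- [[$data]]", params)
}
```

With the second helper the script never contains the `#`, so the parser only ever sees the fixed text, whatever the value is.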