Closed hydrolarus closed 3 years ago
This is interesting, thanks for the really clean implementation. I just added some docs. I have a question about your code - specifically, I understand why the output relations shouldn't have lifetimes if the inputs don't. However, the same for intermediate relations seems less clear since they're not being returned. Do you think something like the following can be ok?
use crepe::crepe;
use maplit::hashset;
crepe! {
@input
struct Word(String);
struct Suffix<'a>(&'a str);
@output
struct SuffixAndPrefix(String);
Suffix(&s) <- Word(s);
Suffix(&s[1..]) <- Suffix(s), (!s.is_empty());
SuffixAndPrefix(s.to_string()) <-
Suffix(s),
Word(w),
(w.starts_with(s));
}
fn main() {
let mut runtime = Crepe::new();
runtime.extend(&[Word("abracadabra")]);
let (output,) = runtime.run();
let expected = hashset![
SuffixAndPrefix("a".to_string()),
SuffixAndPrefix("abra".to_string()),
SuffixAndPrefix("abracadabra".to_string()),
];
assert_eq!(output, expected);
}
Thanks for the feedback!
I would have expected that intermediate relations could refer to the input relations as well, but currently the input relations are stored in a regular HashSet
that is also mutated. The mutation only happens once at the start to fill in the input data from the Crepe
struct, but the HashSet
itself still has the .extend(&updates)
part in the code. That prevents intermediates from creating and holding onto references from input relations.
If the input relations are handled in a different way then this could work I think! So if the input relation storages are placed before all the main loops, then it should be possible to keep references around.
For taking the reference to other intermediate relations and not input relations the problem still stands sadly.
If a reference to a field in an intermediate relation A
is taken and stored in B
and then another A
is added, the HashSet
might move the first relation in memory. So without something like only allowing references when a relation storage reached a fixed point this would be unsafe and not possible.
I implemented binding variables by reference, which allows intermediate relations to refer to input data by reference.
To do that the code is iterating over values by ref
instead (So for ref __crepe_var in
instead of for &__crepe_var in
) and creating a copy for expressions expecting copies. When binding a variable with A(ref x)
then x
will not be a copy but a reference instead.
The code is on my intermediate-lifetimes
branch and it passes all the tests, however I would like to do benchmarking first before committing to this change.
This example now compiles and runs as expected (see test):
crepe! {
@input
struct Input([usize; 4]);
struct Value<'a>(&'a usize);
@output
struct Output(usize);
Value(&x[0]) <- Input(ref x);
Value(&x[2]) <- Input(ref x);
Output(*x) <- Value(x);
}
I'll set up some benchmarking and then compare the performance.
I am now more confident in the changes I made:
A(ref x)
instead of A(x)
then x
will be a reference to the field and not a copyThe benchmarking showed me:
master
+ benches (#11)graph_walk
benchmark, for n=256
and n=512
graph_walk
with n=128
of about 4.8% run time, favouring master
(might be noise but could also be the hashing function or something else)Great work. Thanks especially for adding the benchmarks! I just made a couple minor (non-functionality) changes and renamed S
to CrepeHasher
to reduce the chance of name conflicts. I also removed the for ref __crepe_var_ref in ...
indirection - don't believe it's necessary? But please correct me if I'm wrong about that.
However, I don't understand how you initialize the input relations now. Can you help me understand why this is necessary? Right now, the code generates empty strata for the input relations and also doesn't initialize the indices properly. For example, changing test_transitive_closure to reverse the order of the clauses to Tc(x, z) <- Tc(y, z), Edge(x, y);
leads to a test failure because indices on Edge
are not initialized.
---- test_transitive_closure stdout ----
thread 'test_transitive_closure' panicked at 'assertion failed: `(left == right)`
left: `[(1, 2), (2, 3), (2, 5), (3, 4)]`,
right: `[(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4)]`', tests/test_transitive_closure.rs:42:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Is it possible to keep how the inputs were initialized before, i.e., in a stratum with the __crepe_first_iteration
and __relation_update
semantics? This should initialize the indices properly. If it's not possible because of how lifetimes work, that's okay too - intermediate lifetimes maybe aren't essential given that everything is Copy
, and I think correctness is more important here.
Thanks for the review and the corrections!
However, I don't understand how you initialize the input relations now.
It was my understanding so far that input relations are, once set, immutable. So there is no point in updating them. That's why I removed the _update
sets for them. I'm not super sure how all that generated code works and why, so it's totally possible I misunderstood and that's an incorrect change to make. In that case it should be reverted of course!
Can you help me understand why this is necessary?
I think this is because if intermediate relations keep references to data inside of the storage of an input relation then that storage must not change any more. If the storage gets modified via .extend(input_updates)
then all elements might be re-hashed and all the references would be invalid.
It could be that with the copy/ref split now that's less of an issue, but at least it gave compile errors before, I think because of that. I will investigate that, unless you say references inside intermediates are not worth the trouble, of course.
For example, changing test_transitive_closure to reverse the order of the clauses to
Tc(x, z) <- Tc(y, z), Edge(x, y);
leads to a test failure because indices onEdge
are not initialized.
Ah, yes I probably forgot about the index then and wrongfully removed that :see_no_evil:
Is it possible to keep how the inputs were initialized before, i.e., in a stratum with the
__crepe_first_iteration
and__relation_update
semantics? This should initialize the indices properly. If it's not possible because of how lifetimes work, that's okay too - intermediate lifetimes maybe aren't essential given that everything isCopy
, and I think correctness is more important here.
Definitely agree about correctness, I'll add this reverse-order of the test_transitive_closure as a regression test. About reverting the initialisation, I'll give it a spin!
Seems like I just did something wrong in the beginning and the initialisation worked with lifetimes all along. Thanks for pointing that out to me!
I also added a test to make sure swapping the order of clauses won't change the result again.
EDIT: Also just to mention it, with the latest changes the benchmarks show that performance is unchanged (for using run()
without Fnv
to have a fair comparison) between this change and master
Thanks, this looks great!
closes #2
Input relations, output relations and intermediate relations can all have one or more lifetime parameters.
If an input relation has a lifetime parameter, the whole
Crepe
type will have a single lifetime parameter and all lifetimes of input relations will use that lifetime (e.g.Type<'a, 'b>
will only be used asType<'a, 'a>
).