TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License

The problem of implementation examples #63

Open yueyuanwendy opened 7 years ago

yueyuanwendy commented 7 years ago

Hi, I am new to Rust. I am trying to build "arrange.rs", which is the first example in the differential dataflow examples. When I run cargo build, which I think just downloads and updates the dependencies in Cargo.toml, it fails with this error: "the function takes 1 parameter but 0 parameters were supplied"

134----ErrVirtualAlloc(i32),--defined here

324-- 0=>Err(ErrVirtualAlloc()),---expected 1 parameter

I don't know how to solve this problem. Could you give me some advice?

frankmcsherry commented 7 years ago

Hi,

Can you say what platform you are on? It looks like windows, and perhaps that mmap is broken on windows (it has historically worked, but perhaps you have a surprising installation?). I can point you at the repo to ask them what is up, but it looks like it has been stable for a while (https://github.com/rbranson/rust-mmap). Alternately, we could ask on #rust on irc.

I'm on the road at the moment, so I'm not sure I can track this down for you right now, but if I get a moment (or if you have time) either of the above seem like the right path to follow.
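
For context, the quoted error is what the Rust compiler reports when a tuple-style enum variant that carries one field is constructed with no arguments. A minimal std-only sketch of that situation follows; the names `MapError` and `check` are hypothetical, loosely modeled on the two quoted lines, and are not the mmap crate's actual source.

```rust
// Hypothetical names, loosely modeled on the two quoted lines;
// this is not the mmap crate's real source.
#[derive(Debug, PartialEq)]
enum MapError {
    // A tuple variant with one field: its constructor takes one argument.
    ErrVirtualAlloc(i32),
}

// Maps a raw return value to a Result, passing an OS error code along.
fn check(ret: usize, os_code: i32) -> Result<usize, MapError> {
    match ret {
        // Writing `MapError::ErrVirtualAlloc()` here would not compile:
        // the variant expects one `i32` and none would be supplied.
        0 => Err(MapError::ErrVirtualAlloc(os_code)),
        addr => Ok(addr),
    }
}

fn main() {
    println!("{:?}", check(0, 8));
    println!("{:?}", check(4096, 0));
}
```

So the failure is inside the dependency's platform-specific code rather than in anything the example itself does.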

yueyuanwendy commented 7 years ago

Yes, it is Windows. I just installed Rust last week, and it can update some other dependencies. I will try installing it again and ask them too. Thanks a lot.

frankmcsherry commented 7 years ago

If it remains broken, another option is to remove the dependence from your local copy of the repo; it (mmap) is only used for examples that load graphs, and we can snip those out (I'll explain how, but maybe in a day+ when I get an hour).

frankmcsherry commented 7 years ago

Thank you for the report, by the way; I suspect there aren't too many windows users, and good to get information on why. :)

yueyuanwendy commented 7 years ago

Hi, I asked the author, and he answered: "My advice is to not use mmap because it is abandoned and unmaintained. You should use https://github.com/danburkert/memmap-rs instead." T-T

frankmcsherry commented 7 years ago

Cool, I'll look into porting the relevant code over. It may take a day or so. Sorry about the crap ecosystem. :)

frankmcsherry commented 7 years ago

Hello! I've updated the graph_map dependence to use memmap rather than mmap. It builds for me, but I'm not on windows so I wouldn't see if anything is failing in either case. But, I think you should be able to try it out and see if the compilation gets further along. Most of the examples don't use the dependence (deals.rs does, weaver.rs does, cc.rs does) and I haven't checked to see that they still do what they used to do (i.e. that memmap has the same behavior as mmap).

Edit: You'll want to do cargo update to grab the updated dependence, and then give it a cargo build to see what happens.

yueyuanwendy commented 7 years ago

Hi, I ran cargo update and cargo build. Now I have another problem: "can't find crate of 'differential_dataflow'." I don't know why. Maybe there is a problem with my compilation steps? Could you tell me the general steps for compiling? I just pasted your example code and Cargo.toml into an Eclipse Rust project, and I'm not sure whether that is right. T-T Thanks a lot.

frankmcsherry commented 7 years ago

Here is what I would start with:

Echidnatron% git clone https://github.com/frankmcsherry/differential-dataflow
Cloning into 'differential-dataflow'...
remote: Counting objects: 12189, done.
remote: Compressing objects: 100% (775/775), done.
remote: Total 12189 (delta 963), reused 310 (delta 259), pack-reused 11154
Receiving objects: 100% (12189/12189), 7.42 MiB | 6.98 MiB/s, done.
Resolving deltas: 100% (9363/9363), done.
Echidnatron% cd differential-dataflow
Echidnatron% cargo build
    Updating git repository `https://github.com/frankmcsherry/timely-dataflow`
    Updating registry `https://github.com/rust-lang/crates.io-index`
    Updating git repository `https://github.com/frankmcsherry/graph-map.git`
   Compiling timely_sort v0.1.6
   Compiling fnv v1.0.5
   Compiling libc v0.2.23
   Compiling byteorder v0.4.2
   Compiling getopts v0.2.14
   Compiling abomonation v0.4.5
   Compiling memmap v0.5.2
   Compiling time v0.1.37
   Compiling timely_communication v0.1.5
   Compiling graph_map v0.1.0 (https://github.com/frankmcsherry/graph-map.git#952c3266)
   Compiling timely v0.2.0 (https://github.com/frankmcsherry/timely-dataflow#311f8dae)
   Compiling differential-dataflow v0.1.1 (file:///Users/mcsherry/Projects/test/differential-dataflow)
    Finished dev [unoptimized + debuginfo] target(s) in 13.8 secs
Echidnatron%

Once this works, you can try out:

Echidnatron% cargo build --example arrange
   Compiling either v1.1.0
   Compiling rand v0.3.15
   Compiling itertools v0.6.0
   Compiling differential-dataflow v0.1.1 (file:///Users/mcsherry/Projects/test/differential-dataflow)
    Finished dev [unoptimized + debuginfo] target(s) in 22.46 secs
Echidnatron%

Both of these just worked for me, just now. It could be that cargo is different in an unfortunate way under windows (perhaps path names are interpreted differently? I'm not sure at the moment). If you can report what you have done that produces the error, or how the above fail to work for you, that would help me.

Thanks!

yueyuanwendy commented 7 years ago

I also directly compiled the examples that you said don't use the dependence. They have the same problem.

frankmcsherry commented 7 years ago

I'm sorry, I don't understand what you mean. If you type the above into a shell, do you get the same sort of output as I reported, or do you get something different with errors that you can share with me? :)

yueyuanwendy commented 7 years ago

1 2

frankmcsherry commented 7 years ago

My best guess is that you are trying to put together a new binary project which will link against differential dataflow. This is different from the examples in the differential dataflow repository, in that you need to add a reference to the differential dataflow crate (which is implicit for all of the examples that come with differential dataflow).

If you have not, I recommend adding the following to your Cargo.toml:

[dependencies.differential_dataflow]
git="https://github.com/frankmcsherry/differential-dataflow.git"

This will inform your new project that it should grab the differential dataflow crate from github.

Note: if you are copy/paste-ing from the repository, be aware that some of the examples get changed when the underlying repository changes. For example, you will probably want to re-pull examples/arrange.rs as it has recently changed (the inputs get flushed correctly now, whereas they previously did not; it should run correctly now, vs stall previously).

frankmcsherry commented 7 years ago

Also, as a general comment: if you are taking a dependence on differential dataflow (like, for a course project or something), I totally recommend either (i) getting in touch via email, (ii) joining the gitter.im timely channel, and maybe (iii) pinning to a specific revision of the repository, so that we can avoid making breaking changes that will make you unhappy for using differential. :)
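
As a sketch of option (iii), Cargo's `rev` key pins a git dependency to one commit. The hash below is a placeholder and not a real revision of the repository; substitute a commit you have actually built and tested against.

```toml
[dependencies.differential_dataflow]
git = "https://github.com/frankmcsherry/differential-dataflow.git"
# Placeholder: replace with the commit hash you have tested against.
rev = "<commit-hash>"
```

With a pin like this, cargo update should leave the dependency at that commit until the `rev` value is changed by hand.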

yueyuanwendy commented 7 years ago

OK, If I have another problem, I will send email. :)

frankmcsherry commented 7 years ago

Posting here is fine; I didn't mean to discourage that. More that, as the project is under development, we might have plans to break things that we think of as minor, but might cause lots of problems for you. It's good to have a sense for what things people want/need to be stable in the short term, to avoid annoying them (you).

yueyuanwendy commented 6 years ago

Hi, could I ask whether you have implemented an operator such as a right join in differential dataflow? I can only find join operators like inner join. Also, do you have a method to read and save the final data flow in each round, other than inspect?

Thanks

frankmcsherry commented 6 years ago

Hi,

  1. Let me show you how I've implemented something like a left join in differential dataflow. Depending on what you need, it may be good enough.

The join operator's implementation strongly relies on the bilinearity of join, which (to my understanding) left and right joins break. So instead of implementing either as an operator, we implement them as a slightly larger dataflow fragment.

Left joins are often used as part of an aggregation (because otherwise you may end up with null fields). If that is what you need, then a similar pattern may also work for you.

For example, TPCH Query 13 does a left outer join followed by a count, and you can check out its full implementation in differential dataflow here. Informally, it first makes sure each key is present in the to-be-accumulated collection, and then corrects the accumulation (subtracting one in this case).

    collections
        .customers()
        .map(|c| c.cust_key)
        .concat(&orders)
        .count_total()
        .map(|(_cust_key, count)| (count-1) as usize)
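
To make the "add every key once, count, then subtract one" idea concrete, here is a plain-Rust sketch that does not use differential dataflow at all. The function name and the slices standing in for collections are hypothetical, and it assumes customer keys are unique and every order refers to a known customer.

```rust
use std::collections::BTreeMap;

// Counts orders per customer, including customers with zero orders.
// `customers` holds customer keys; `orders` holds the customer key of each order.
// Assumes keys in `customers` are unique and every order key appears there.
fn orders_per_customer(customers: &[u32], orders: &[u32]) -> BTreeMap<u32, usize> {
    let mut counts: BTreeMap<u32, usize> = BTreeMap::new();
    // "concat": one record per customer, plus one record per order.
    for &key in customers.iter().chain(orders.iter()) {
        *counts.entry(key).or_insert(0) += 1;
    }
    // Correct the accumulation: remove the one record the customer contributed.
    counts.into_iter().map(|(key, count)| (key, count - 1)).collect()
}

fn main() {
    let per_customer = orders_per_customer(&[1, 2, 3], &[1, 1, 3]);
    println!("{:?}", per_customer);
}
```

Customer 2 has no orders but still shows up with a count of zero, which is the row an inner join followed by a count would have dropped.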

At the moment if you'd like the full expressive power of a left or right join, you'll need to build the logic out of existing parts. For example, if you have collections left and right, one can determine which keys in left are missing from right using

    let left_keys = left.map(|(key, _)| key).distinct();
    let right_keys = right.map(|(key, _)| key).distinct();
    let missing = left_keys.map(|key| (key, ())).antijoin(&right_keys);
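
Once the missing keys are known, a left join is roughly the inner join results together with the unmatched left records paired with a null. The following plain-Rust sketch shows that assembly on slices rather than collections; `left_outer_join` is a hypothetical helper, not an operator the library provides.

```rust
use std::collections::BTreeSet;

// Builds a left outer join of `left` and `right` on the key.
// Matched keys produce one row per matching pair; unmatched left rows get `None`.
fn left_outer_join<'a>(
    left: &[(u32, &'a str)],
    right: &[(u32, &'a str)],
) -> Vec<(u32, &'a str, Option<&'a str>)> {
    // The distinct keys present on the right side.
    let right_keys: BTreeSet<u32> = right.iter().map(|&(key, _)| key).collect();
    let mut out = Vec::new();
    for &(lk, lv) in left {
        if right_keys.contains(&lk) {
            // Inner-join part: pair the left row with every matching right row.
            for &(rk, rv) in right {
                if rk == lk {
                    out.push((lk, lv, Some(rv)));
                }
            }
        } else {
            // Antijoin part: the key is missing from the right side.
            out.push((lk, lv, None));
        }
    }
    out
}

fn main() {
    let left = [(1, "a"), (2, "b")];
    let right = [(1, "x"), (1, "y"), (3, "z")];
    for row in left_outer_join(&left, &right) {
        println!("{:?}", row);
    }
}
```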

  2. For saving data, I would either recommend inspect_batch, or capture_into, both of which are timely dataflow methods. I see that you said "not inspect", but inspect_batch is probably the right way to do this (could you explain your constraint, if it doesn't work for you?).

For inspect_batch, you can do whatever you like with a batch of records, and this can be writing them out to the console, or to a file, or whatever you like. You can open a file as std::io::Write, move it into the inspect closure, and then call write repeatedly however you want to serialize the data.
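
The "move a writer into the closure" part can be sketched with the standard library alone. The closure below takes a timestamp and a slice of records, which is the shape I understand inspect_batch to hand to its callback; the `u64` timestamp, the `(String, isize)` record type, and the `batch_writer` helper are all assumptions made for illustration.

```rust
use std::io::Write;

// Returns a closure that owns `sink` and writes one tab-separated line per record.
// The argument shape (timestamp, slice of records) is assumed for illustration.
fn batch_writer<W: Write>(mut sink: W) -> impl FnMut(&u64, &[(String, isize)]) {
    move |time, batch| {
        for (record, diff) in batch {
            writeln!(sink, "{}\t{}\t{}", time, record, diff).expect("write failed");
        }
    }
}

fn main() {
    // A Vec<u8> stands in for a file; std::fs::File implements Write as well.
    let mut buffer: Vec<u8> = Vec::new();
    {
        let mut on_batch = batch_writer(&mut buffer);
        on_batch(&0, &[("a".to_string(), 1)]);
        on_batch(&1, &[("a".to_string(), -1), ("b".to_string(), 1)]);
    }
    print!("{}", String::from_utf8(buffer).unwrap());
}
```

In a real dataflow the closure generally has to own what it captures, so an owned `std::fs::File` (perhaps wrapped in a `BufWriter`) is the more likely thing to move in than a borrowed buffer.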

For capture_into, this is how timely dataflow serializes its streams so that they can be played back again. It can serialize into arbitrary W: Write types, but it uses timely's serialization mechanisms (Abomonation) rather than whatever your favorite serialization happens to be. This means that it is fine for saving timely streams to replay them, but less good for reading them back into .csv or something like that.

Let me know if these help, and, if anything is unclear, what further info you would need!

isubasinghe commented 2 years ago

Hey @frankmcsherry I gotta say, I really appreciate the level of detail you've provided in these issues. I am currently working on my thesis (incremental louvain modularity with DD) and your documentation on these various issues has been immensely helpful.

I might blog about my approach after I publish (hopefully) to help out the project, I think it is a really interesting project that deserves more attention. I suspect the barrier of entry to DD is what is keeping it from getting more popular.

Just wanted to say thanks :-)