apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0

design and implementation of dry-run #1619

Open objmagic opened 7 years ago

objmagic commented 7 years ago

Basic idea: --dry-run when added to either submit or update subcommands, should display topology packing plan.

objmagic commented 7 years ago

#1571 and #1618 refactored the control flow of the submitter and runtime manager in a way that enables us to easily implement dry-run --- we propagate the packing plan information all the way up using an exception. The Java side (submitter and runtime manager) then propagates the info back to the Python side (heron client) via stdout, with a special return code associated (for example, 200).

Rendering of the packing plan should be done on the Java side.
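As a rough sketch of the flow described above: the Python client runs the Java process, and a sentinel return code tells it that stdout carries a rendered packing plan rather than a normal submit result. The code `200` and the function name here are assumptions taken from this comment, not Heron's actual implementation.

```python
import subprocess

# Hypothetical sentinel code agreed between the Java and Python sides,
# as suggested in the comment above.
DRY_RUN_RETURN_CODE = 200

def submit(cmd):
    """Run the Java submitter and classify the result.

    Returns a (status, text) pair where status is one of
    "dry-run", "submitted", or "error".
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == DRY_RUN_RETURN_CODE:
        # Dry-run: the Java side rendered the packing plan to stdout.
        return ("dry-run", proc.stdout)
    if proc.returncode == 0:
        return ("submitted", proc.stdout)
    return ("error", proc.stderr)
```

The appeal of this design is that the Java side needs no extra IPC channel: the packing plan rides on stdout, and the return code disambiguates it from a real submission.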

objmagic commented 7 years ago

[screenshot: dry-run prototype output]

A first glance at the prototype. It still needs lots of polish, of course.

kramasamy commented 7 years ago

@objmagic - Wondering if you could make the packing plan more readable. Also, it would be good to include information about the total number of containers, the total number of instances for each component, the total memory and CPU per container, and the aggregate memory and CPU for the entire topology.
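The aggregates requested here are straightforward to compute once the plan is in hand. This sketch assumes a packing plan shaped as a dict of container id to a list of `(component, cpu, ram_mb)` tuples; that shape is a stand-in for illustration, not Heron's actual `PackingPlan` protobuf.

```python
from collections import Counter

def summarize(plan):
    """Compute the topology-level stats requested above from a
    hypothetical plan dict: {container_id: [(component, cpu, ram_mb), ...]}."""
    components = Counter()
    total_cpu = 0.0
    total_ram_mb = 0
    for instances in plan.values():
        for component, cpu, ram_mb in instances:
            components[component] += 1  # instances per component
            total_cpu += cpu
            total_ram_mb += ram_mb
    return {
        "containers": len(plan),
        "instances_per_component": dict(components),
        "total_cpu": total_cpu,
        "total_ram_mb": total_ram_mb,
    }
```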

objmagic commented 7 years ago

This is the raw renderer. Other renderers could print the info in a much better way. Will update soon.

mycFelix commented 7 years ago

👍🏻

objmagic commented 7 years ago

Preview of the table formatter's output:

[screenshot: table formatter output]

billonahill commented 7 years ago

That's awesome, the table formatter looks good. It's effectively a summary table view. We could envision a detailed table view that shows the instances on each container. One suggestion would be to add line breaks to the table at container boundaries. We'd need to add container id and instance id columns and remove the parallelism column in that context.
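The detailed view suggested here, one row per instance with a break at each container boundary, could look like the sketch below. The plan shape (`{container_id: [(instance_id, component), ...]}`) is hypothetical, chosen just to illustrate the layout.

```python
def render_detailed(plan):
    """Render one row per instance, with a separator line at every
    container boundary, as suggested in the comment above."""
    header = f"{'container':>9} | {'instance':>12} | {'component':>10}"
    rows = [header]
    for container_id in sorted(plan):
        rows.append("-" * len(header))  # break at container boundary
        for instance_id, component in plan[container_id]:
            rows.append(f"{container_id:>9} | {instance_id:>12} | {component:>10}")
    return "\n".join(rows)

print(render_detailed({
    1: [("spout_1", "spout"), ("bolt_2", "bolt")],
    2: [("bolt_3", "bolt")],
}))
```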

objmagic commented 7 years ago

@billonahill that's a good idea. But I'm wondering what that's going to look like for a topology with thousands of instances running. It is not a problem to display such a big table in a terminal (we can use ncurses to enable scrolling). However, the amount of information will be overwhelming. What useful information do we expect the user to get immediately from this detailed mode?

And another question: since the user updates a topology's "component parallelism", is there any reason they need to reason about how each instance runs in a container?

billonahill commented 7 years ago

Some views will be practical on screen for smaller topologies but not larger ones. Larger ones might be piped to a file and grepped, or imported into a spreadsheet for analysis, for example.

What I would find useful would be to visually see the distribution of instances (by component type) on each container, to get a sense for how different packing algorithms change the packing density, balance, etc. - similar to what we show in the UI. Maybe that's a different view entirely, with an ascii-grid of sorts perhaps.

For updates the user wants to see which containers will be modified or added, so being able to easily see that would be great - even if it involves piping to a grep statement to keep things simple.

objmagic commented 7 years ago

From @ajorgensen:

One feature that we added to our fork of heron was a way to output the result of the packing plan in json format. This allowed us to do some intelligent scheduling for availability zones in AWS

Yes, you'll have a JSON renderer.
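A JSON renderer along the lines @ajorgensen describes could be as simple as serializing the plan so external tooling (such as an availability-zone-aware scheduler) can consume it. The plan shape here is the same hypothetical dict used above, not Heron's real data model.

```python
import json

def render_json(plan):
    """Serialize a hypothetical plan dict
    {container_id: [(component, cpu, ram_mb), ...]} to JSON so it can
    be consumed by external tooling (e.g. an AWS AZ-aware scheduler)."""
    return json.dumps(
        {
            str(cid): [
                {"component": c, "cpu": cpu, "ram_mb": ram}
                for c, cpu, ram in instances
            ]
            for cid, instances in plan.items()
        },
        indent=2,
        sort_keys=True,
    )
```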

objmagic commented 7 years ago

And @billonahill, what does the field cpu mean here? And why does it have type double?

ajorgensen commented 7 years ago

@objmagic CPU (at least in the Aurora case) is not measured in physical cores. In Aurora's case, resource isolation is the amount of CPU time you get per 100ms. So if you ask for 4.0 CPU you get 400ms of actual CPU time for every 100ms cycle. You could also ask for 1.5 CPU, which would mean you get 150ms of CPU time per 100ms cycle. I believe this ultimately trickles down into cgroups and how they measure CPU time, but that is why it is a double value and not something like an integer.
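The arithmetic behind this explanation is just a multiplication of the fractional CPU share by the scheduling period, which is why a double is the natural type:

```python
def cpu_time_per_cycle(cpu_shares, cycle_ms=100):
    """CPU time granted per scheduling cycle under time-share isolation:
    4.0 CPU -> 400 ms of CPU time per 100 ms cycle,
    1.5 CPU -> 150 ms per cycle. Fractional shares are why cpu is a double."""
    return cpu_shares * cycle_ms
```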

objmagic commented 7 years ago

Updated. The table now does colorful, styled formatting:

[screenshot: colorized table output]

objmagic commented 7 years ago

More screenshots: [screenshot]

billonahill commented 7 years ago

In the enlarged containers, which instances are added? Can you visually represent that?

I like the rendering, but do you think it would simplify things to condense the states enlarged, reduced, and modified into just modified? Then you could see which instances were added and removed in each of the modified containers.

objmagic commented 7 years ago

On Jan 9, 2017, at 09:38, Bill Graham notifications@github.com wrote:

> In the enlarged containers which instances are added? Can you visually represent that?

Yes, I have just improved the visualization of this.

> I like the rendering but do you think it would simplify things by condensing states enlarged, reduced and modified all into just modified? Then you could see which instances were added and removed in each of the modified.

Yes, I have realized that enlarged and reduced are better generalized to "modified".
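Collapsing "enlarged" and "reduced" into a single "modified" state turns the diff into a simple four-way classification of containers between the old and new plans. A minimal sketch, again over the hypothetical plan-as-dict shape rather than Heron's actual types:

```python
def classify_containers(old_plan, new_plan):
    """Classify each container as NEW, REMOVED, MODIFIED, or UNCHANGED,
    collapsing 'enlarged'/'reduced' into a single MODIFIED state as
    agreed in the discussion above."""
    states = {}
    for cid in set(old_plan) | set(new_plan):
        if cid not in old_plan:
            states[cid] = "NEW"
        elif cid not in new_plan:
            states[cid] = "REMOVED"
        elif old_plan[cid] != new_plan[cid]:
            states[cid] = "MODIFIED"  # grew, shrank, or changed contents
        else:
            states[cid] = "UNCHANGED"
    return states
```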

objmagic commented 7 years ago

Updated:

[screenshot: updated dry-run output]

billonahill commented 7 years ago

Looks good. We should also highlight what's modified in containers 1 and 2. At the very top we should output some overall info, like the total number of containers and the max container size.

kramasamy commented 7 years ago

@objmagic - can you include the command as well? Also, you need to add documentation to the website.

objmagic commented 7 years ago

@billonahill no instance resources are modified in container 1 and container 2. They are marked modified only because the requiredResource was modified. And yes, I will add the container count and max container info at the top (update: holding off on this, as discussed with @billonahill). And feel free to give me more suggestions.

kramasamy commented 7 years ago

Nice work @objmagic, and thanks to @billonahill for guiding.

objmagic commented 7 years ago

Resolved via #1571 #1618 #1629 #1675 #1676

follow-up: