excamera / mu

Framework to Run General-Purpose Parallel Computations on AWS Lambda

Error in xenc_server.py #7

Open anuragkh opened 6 years ago

anuragkh commented 6 years ago

While running the example in mu/README_xc-enc.md, the xcenc_server.py script errors out with the following trace:

Traceback (most recent call last):
  File "/tmp/mu_example/mu/src/lambdaize/xcenc_server.py", line 280, in <module>
    main()
  File "/tmp/mu_example/mu/src/lambdaize/xcenc_server.py", line 277, in main
    run()
  File "/tmp/mu_example/mu/src/lambdaize/xcenc_server.py", line 255, in run
    server.server_main_loop(ServerInfo.states, XCEncSettingsState, ServerInfo)
  File "/tmp/mu_example/mu/src/lambdaize/libmu/server.py", line 265, in server_main_loop
    rnext = r.do_read()
  File "/tmp/mu_example/mu/src/lambdaize/libmu/machine_state.py", line 108, in do_read
    return self.do_handle()
  File "/tmp/mu_example/mu/src/lambdaize/libmu/machine_state.py", line 85, in do_handle
    state = state.transition(msg)
  File "/tmp/mu_example/mu/src/lambdaize/libmu/machine_state.py", line 165, in transition
    return self.post_transition()
  File "/tmp/mu_example/mu/src/lambdaize/libmu/machine_state.py", line 279, in post_transition
    return self.loopState(self)
  File "/tmp/mu_example/mu/src/lambdaize/xcenc_server.py", line 202, in __init__
    dist_from_end = (kfDist if kfDist is not None else ServerInfo.num_passes) - effActNum
TypeError: unsupported operand type(s) for -: 'tuple' and 'int'

On the culprit line, ServerInfo.num_passes is a tuple while effActNum is an integer, so the subtraction fails whenever kfDist is None.
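For readers following along, here is a minimal sketch that reproduces the failure outside the server. The function names are hypothetical and the expression is adapted from the traceback. The guarded variant ASSUMES the tuple holds per-phase pass counts whose sum is the intended total; the thread does not confirm this, so treat it as a debugging aid rather than the fix.

```python
def dist_from_end(kf_dist, num_passes, eff_act_num):
    """Same shape as the failing expression in the traceback (names adapted)."""
    total = kf_dist if kf_dist is not None else num_passes
    return total - eff_act_num  # TypeError if total is a tuple


def dist_from_end_guarded(kf_dist, num_passes, eff_act_num):
    """Hypothetical workaround, not the maintainers' fix."""
    total = kf_dist if kf_dist is not None else num_passes
    if isinstance(total, tuple):
        # ASSUMPTION: the tuple lists per-phase pass counts and the
        # total number of passes is their sum.
        total = sum(total)
    return total - eff_act_num


if __name__ == "__main__":
    try:
        dist_from_end(None, (1, 0, 0, 0), 0)
    except TypeError as exc:
        print("reproduced:", exc)
    print(dist_from_end_guarded(None, (2, 3), 1))
```

Note that the error only appears when kf_dist is None, because that is the only path on which the tuple reaches the subtraction.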

anuragkh commented 6 years ago

Tagging on a few more issues with xcenc_server.py here:

        # figure out what the IP address of the interface talking to AWS is
        # NOTE if you have different interfaces routing to different regions
        #      this won't work. I'm assuming that's unlikely.
        testsock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        testsock.connect(("lambda." + server_info.regions[0] + ".amazonaws.com", 443))
        event['addr'] = testsock.getsockname()[0]
        testsock.close()

If I replace this with a simple:

        import requests
        event['addr'] = requests.get("http://169.254.169.254/latest/meta-data/public-ipv4").content

to get the public IP of the host (on EC2), things work fine.
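The two approaches can be compared side by side with the sketch below, written for Python 3 using only the standard library. The function names are hypothetical. The first function is the UDP-connect trick from the original snippet: it reports the local interface address, which on EC2 is presumably a private address that workers cannot reach. The second queries the metadata URL from the replacement snippet and ASSUMES the instance answers unauthenticated metadata requests; it returns None anywhere else.

```python
import socket
import urllib.request

# URL taken from the replacement snippet above.
METADATA_URL = "http://169.254.169.254/latest/meta-data/public-ipv4"


def outbound_interface_addr(host, port=443):
    """Return the local address the OS would use to reach host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a route.
        sock.connect((host, port))
        return sock.getsockname()[0]
    finally:
        sock.close()


def ec2_public_ipv4(timeout=1.0):
    """Return the instance's public IPv4 as a string, or None if unavailable."""
    try:
        with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
            return resp.read().decode("ascii").strip()
    except OSError:
        # Covers timeouts, unreachable hosts, and HTTP errors (not on EC2,
        # or metadata access is restricted).
        return None


if __name__ == "__main__":
    print("interface address:", outbound_interface_addr("127.0.0.1"))
    print("EC2 public address:", ec2_public_ipv4())
```

If the two values differ on the host running the server, that would be consistent with the behavior described here.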

keithw commented 6 years ago

Hello Anurag,

This is an active research codebase (between Stanford and UC San Diego) and some parts are obviously more in flux than others. I don't think these examples are covered by our continuous integration and it sounds like they have bitrotted from the version used in our ExCamera evaluation (probably 48d2a2c9621c3a27c406b33185a2d143568d7f39 for mu, dc294a6ad621c4dfe286ef83f2d9a24a90f194ca for Alfalfa).

Could you clarify your interest a little bit so we can best help you? E.g. are you seeking to contribute to ongoing development, replicate the ExCamera paper from NSDI 2017, do a comparative evaluation, or something else?

Thanks, Keith

anuragkh commented 6 years ago

Hi Keith,

I am currently trying to replicate the NSDI'17 ExCamera pipeline to better understand the lambda-to-lambda communications via the rendezvous server. I'm in the exploratory stage myself right now, so I'll probably want to do a few modifications to the codebase looking ahead. :)

I'll take a look at the hashes you pointed to and see if I'm able to redo the NSDI'17 ExCamera runs.

Thanks! Anurag

gmporter commented 6 years ago

I'd like to CC Lixiang Ao, a PhD student here at UCSD who has been working on a number of improvements and additions to the mu codebase.

George

(Emailed reply to Anurag Khandelwal's comment of Tue, Feb 20, 2018, 5:52 PM, which appears above.)

sadjad commented 6 years ago

The camera-ready results were generated using excamera/alfalfa@14e88fa6dff567b11ceda1c368ebb65fc27532ee and excamera/mu@48d2a2c9621c3a27c406b33185a2d143568d7f39.

mziwisky commented 5 years ago

@sadjad I think those commit SHAs must be wrong. https://github.com/excamera/mu/commit/48d2a2c9621c3a27c406b33185a2d143568d7f39 makes calls to xc-enc with --reencode-first-frame, but that option is not included in https://github.com/excamera/alfalfa/commit/14e88fa6dff567b11ceda1c368ebb65fc27532ee.