Open kevinburke opened 11 years ago
What do you mean by "I thought I could specify S3 urls in the args list, turns out you can't"? I'm working with mrjob 0.3.5, I've got some extensive tooling that handles everything within Python code, and my jobs pass 20 or more S3 URLs as the input for any one job. Assuming you're not using 0.4dev and it's not a regression there, can you retry and report back with whatever issue you encounter?
Changing this ticket to checking the mr_*.py script for the string run() and issuing a warning if it's missing, which would have caught your mistake.
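For illustration, the check proposed here could look roughly like the sketch below. It is not mrjob code: the helper name check_for_run_call and the warning text are made up for this example, and it does nothing more than the plain substring test described above.

```python
import warnings


def check_for_run_call(script_source, script_name="<script>"):
    """Return True if the script source contains the string 'run()'.

    Hypothetical helper, not part of mrjob. It only does a substring
    check, so it can be fooled by comments or unrelated run() calls.
    """
    if "run()" in script_source:
        return True
    # Warn rather than fail: the script may be launched some other way.
    warnings.warn(
        "%s does not appear to call run(); the job may have no steps "
        "when it is launched as a script" % script_name
    )
    return False
```

A job file ending in Bagcheck.run() passes the check; one that only defines the class triggers the warning.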
Hey, just wanted to share an experience I had trying to run an MRJob from another script.
The main.py script looked something like this:
Then the actual job looks something like this:
First the script just waited forever for input, until (I think) I remembered to echo an S3 URL and pipe it to python.
Then I kept getting a "step description is empty!" message. I tried redefining steps() in the Bagcheck class, but that didn't do anything. Eventually I realized I was missing the lines at the bottom of bagcheck.py that call run().
What's the lesson or the improvement to be made? I'm not sure. I wanted to run the mrjob from another Python script to avoid piping over stdout to a separate script, but it appears MRJob is set up much better for the 'streaming-over-stdout' use case.
It also appears that running MRJob from a separate script swallows the usual stderr output from MRJob, which is why calling main.py without S3 URLs just waited forever without doing or echoing anything. I'm now trying to figure out how to add a verbose flag to the separate script runner.
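One possible way to get that output back, sketched with only the standard library: this assumes mrjob emits its messages through Python's logging module under loggers named "mrjob" or "mrjob.*", which is worth confirming against your mrjob version. The function name enable_mrjob_logging is made up for this example.

```python
import logging
import sys


def enable_mrjob_logging(verbose=False, stream=None):
    """Send records from the 'mrjob' logger hierarchy to a stream.

    Hypothetical helper. Assumes mrjob logs through the standard
    logging module under the 'mrjob' namespace.
    """
    logger = logging.getLogger("mrjob")
    # DEBUG when verbose is requested, INFO otherwise.
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling enable_mrjob_logging(verbose=True) near the top of main.py, before the job is created, should then surface whatever the runner logs.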