Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

Is it possible to prevent decompression and/or splitting in local or inline mode? #2205

Open anjackson opened 3 years ago

anjackson commented 3 years ago

I'm dealing with some block-gzipped files that I would like to pass directly into my MyJob code, without the runner decompressing or splitting them. I've got this working under the Hadoop runner, but the local and inline runners always force decompression and always force file splitting (defined in SimMRJobRunner, I think).

Is there a configuration option that would make it possible to prevent splitting, and ideally also another that prevents decompression?
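
To illustrate why the file needs to arrive intact, here is a minimal sketch of reading a block-gzipped file member by member, assuming "block-gzipped" means several independent gzip members concatenated into one file. It uses only the Python standard library; `iter_gzip_members` is a hypothetical helper name and not part of mrjob.

```python
import gzip
import io
import zlib


def iter_gzip_members(stream, chunk_size=64 * 1024):
    """Yield the decompressed bytes of each gzip member in a binary stream.

    Hypothetical helper for illustration only; not an mrjob API.
    """
    buf = b""
    while True:
        if not buf:
            buf = stream.read(chunk_size)
            if not buf:
                return  # clean end of stream, no more members
        # wbits=31 tells zlib to expect a gzip header and trailer
        decomp = zlib.decompressobj(31)
        parts = []
        while True:
            parts.append(decomp.decompress(buf))
            if decomp.eof:
                # bytes after this member belong to the next one
                buf = decomp.unused_data
                break
            buf = stream.read(chunk_size)
            if not buf:
                raise EOFError("stream ended inside a gzip member")
        yield b"".join(parts)


# Build a small two-member file in memory and read it back.
records = [b"first record\n", b"second record\n"]
blob = b"".join(gzip.compress(r) for r in records)
members = list(iter_gzip_members(io.BytesIO(blob)))
```

Each member only decodes correctly if it is read from its own first byte, so a runner that decompresses the whole file into one stream, or splits it at arbitrary offsets, loses the member boundaries that this kind of code relies on.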