GlobalFishingWatch / vessel-scoring


Disable extra fields #54

Open geowurster opened 8 years ago

geowurster commented 8 years ago

@bitsofbits @redhog: Vessel scoring adds a bunch of extra information to the output messages that the pipeline doesn't need and increases the output file size by 50%. Can we get an option to turn it off, or ideally leave it off by default?

Here is the same day at 3 different points in the pipeline:

6.99 GiB    gs://benthos-pipeline/data-production/normalize-pipeline/2015-01-01
7.41 GiB    gs://benthos-pipeline/data-production/measures-pipeline/spatial-measures/2015-01-01
12.09 GiB   gs://benthos-pipeline/data-production/classify-pipeline/classify-logistic/2015-01-01
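
As a rough check on the figures above, the relative growth between stages can be computed directly. This is a minimal sketch that assumes the three listings cover the same set of messages and differ only in the fields attached to each one; the stage names used as dictionary keys are just labels taken from the paths.

```python
# Sizes in GiB, copied from the listing above.
sizes_gib = {
    "normalize": 6.99,
    "spatial-measures": 7.41,
    "classify-logistic": 12.09,
}


def pct_increase(before, after):
    """Return the percentage growth from `before` to `after`."""
    return (after - before) / before * 100.0


# Growth added by the classify stage relative to its input stage.
classify_growth = pct_increase(
    sizes_gib["spatial-measures"], sizes_gib["classify-logistic"]
)
# Growth added by the spatial-measures stage, for comparison.
measures_growth = pct_increase(
    sizes_gib["normalize"], sizes_gib["spatial-measures"]
)

print(f"classify stage: +{classify_growth:.0f}%")
print(f"spatial-measures stage: +{measures_growth:.0f}%")
```

On these numbers the classify stage grows its input by a bit over 60%, somewhat more than the 50% quoted above.
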

bitsofbits commented 8 years ago

@redhog, I expect that these are all the values that get used in computing the fishing score. I don't see any reason they need to be output at the end of that stage. Is there a way to strip them at the output, or to insert them only in a copy of the pipeline data so they don't get output at the end of the stage?

redhog commented 8 years ago

I don't think the vessel-scoring library should remove them, but feel free to strip anything starting with measure_* apart from measure_new_score in an iterator filter in the pipeline itself. This would be the place: https://github.com/SkyTruth/benthos-pipeline/blob/master/benthosp/pipe_classify.py#L59
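
An iterator filter of the kind described here might look roughly like the sketch below. It assumes each pipeline message is a plain dict; the function name `strip_measures` and the sample field `measure_speed` are hypothetical, not taken from benthos-pipeline or vessel-scoring.

```python
def strip_measures(messages, prefix="measure_", keep=("measure_new_score",)):
    """Lazily yield copies of each message without the extra measure fields.

    Any key starting with `prefix` is dropped unless it is listed in `keep`.
    The input messages are not modified.
    """
    keep = set(keep)
    for msg in messages:
        yield {
            key: value
            for key, value in msg.items()
            if not key.startswith(prefix) or key in keep
        }


# Hypothetical scored messages; only the score should survive the filter.
scored = [
    {"mmsi": 1, "measure_speed": 0.3, "measure_new_score": 0.8},
    {"mmsi": 2, "measure_new_score": 0.1},
]
filtered = list(strip_measures(scored))
print(filtered)
```

Because this is a generator, it can wrap whatever iterator the classify stage already produces without holding the whole day of messages in memory.
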

pwoods25443 commented 8 years ago

@redhog I disagree. It is not a question of whether the scoring library removes those fields, but rather a question of whether it adds those fields. I think a parameter to control this is a fine idea.
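
A parameter along these lines could be sketched as follows. Everything here is hypothetical and is not the vessel-scoring API: the `add_score` function, the `include_features` flag, and the toy feature and scoring functions only illustrate computing the intermediate values on the side and attaching them to the output on request.

```python
def add_score(message, compute_features, score_fn, include_features=False):
    """Return a copy of `message` with the score attached.

    Intermediate features are kept in a separate dict and are merged into
    the output only when `include_features` is True, so they are off by
    default.
    """
    features = compute_features(message)
    out = dict(message)
    out["measure_new_score"] = score_fn(features)
    if include_features:
        out.update(features)
    return out


# Toy stand-ins for the real feature extraction and model (assumptions).
def toy_features(message):
    return {"measure_speed": min(message["speed"] / 20.0, 1.0)}


def toy_score(features):
    return 1.0 - features["measure_speed"]


msg = {"mmsi": 1, "speed": 5.0}
lean = add_score(msg, toy_features, toy_score)
full = add_score(msg, toy_features, toy_score, include_features=True)
print(lean)
print(full)
```

With the flag left at its default, the output carries only the original fields plus the score, so no downstream stripping step is needed.
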

On Mon, Aug 1, 2016, 18:36 Egil Möller notifications@github.com wrote:

I don't think the vessel-scoring library should remove them, but feel free to strip anything starting with measure_* apart from measure_new_score in an iterator filter in the pipeline itself. This would be the place: https://github.com/SkyTruth/benthos-pipeline/blob/master/benthosp/pipe_classify.py#L59


geowurster commented 8 years ago

@redhog I agree with @pwoods25443 from a design standpoint.