ga4gh / task-execution-schemas

Apache License 2.0

Maximum message size #44

Closed. buchanae closed this issue 7 years ago.

buchanae commented 7 years ago

We need to specify some maximum size for most fields, especially fields with large strings like logs and input contents.

Would it be easier/better to specify a maximum message size?

Alternatively, do we need to document maximum sizes for "string", "repeated", and "map" fields?
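To make the two options in this question concrete, here is a minimal Python sketch of each, assuming a Task is serialized as JSON. Every limit below is hypothetical; the spec defines none of them. `check_message_size` applies one cap to the whole message, while `check_field_sizes` walks the message and applies separate caps to strings, lists (repeated fields), and objects (map fields). The field names in the example task are illustrative only.

```python
import json

# Hypothetical limits for illustration only; the spec defines none of these.
MAX_MESSAGE_BYTES = 1_000_000  # option 1: one cap on the whole serialized Task
MAX_STRING_BYTES = 64_000      # option 2: cap on any single "string" field
MAX_REPEATED_ITEMS = 10_000    # option 2: cap on any "repeated" (list) field
MAX_MAP_ENTRIES = 1_000        # option 2: cap on any "map" (object) field


def check_message_size(task: dict) -> list:
    """Option 1: measure the whole Task message once, as UTF-8 JSON."""
    size = len(json.dumps(task).encode("utf-8"))
    if size > MAX_MESSAGE_BYTES:
        return [f"task: message is {size} bytes, limit is {MAX_MESSAGE_BYTES}"]
    return []


def check_field_sizes(value, path: str = "task") -> list:
    """Option 2: apply a separate limit to each string, list, and object, recursively."""
    errors = []
    if isinstance(value, str):
        n = len(value.encode("utf-8"))
        if n > MAX_STRING_BYTES:
            errors.append(f"{path}: string is {n} bytes, limit is {MAX_STRING_BYTES}")
    elif isinstance(value, list):
        if len(value) > MAX_REPEATED_ITEMS:
            errors.append(f"{path}: {len(value)} items, limit is {MAX_REPEATED_ITEMS}")
        for i, item in enumerate(value):
            errors.extend(check_field_sizes(item, f"{path}[{i}]"))
    elif isinstance(value, dict):
        # Plain JSON cannot tell a "map" field from a nested message, so this
        # sketch applies the map limit to every object it finds.
        if len(value) > MAX_MAP_ENTRIES:
            errors.append(f"{path}: {len(value)} entries, limit is {MAX_MAP_ENTRIES}")
        for key, item in value.items():
            errors.extend(check_field_sizes(item, f"{path}.{key}"))
    return errors


# Thirty 50,000-character inputs: each field is under its own limit,
# but together they exceed the whole-message limit.
task = {
    "name": "example",
    "inputs": [{"path": f"/data/{i}", "content": "a" * 50_000} for i in range(30)],
}
print(check_field_sizes(task))   # []
print(check_message_size(task))  # one error: the message is over 1,000,000 bytes
```

The example shows that the two approaches can disagree: per-field limits alone do not bound the total size, and a single message limit alone does not say which field is too large.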

geoffjentry commented 7 years ago

My only comment on this topic whenever it comes up is that the API must provide some mechanism for me to obtain the full thing. If that's a pointer to a file or something like that, that's AOK. Other than that I don't really care.
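One way an implementation could honor this while still capping field sizes is to return a truncated copy inline together with a pointer to the complete content. Below is a minimal Python sketch, assuming a hypothetical `full_log_url` field and a hypothetical 10,000-byte inline limit; neither is part of the spec, and `make_log_field` is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LOG_BYTES = 10_000  # hypothetical inline limit; not defined by the spec


@dataclass
class LogField:
    text: str                    # log contents returned inline, possibly truncated
    truncated: bool              # True when text holds only the end of the log
    full_log_url: Optional[str]  # hypothetical pointer to the complete log


def make_log_field(full_log: str, full_log_url: str, limit: int = MAX_LOG_BYTES) -> LogField:
    """Return the log inline if it fits; otherwise keep its tail and point to the full copy."""
    encoded = full_log.encode("utf-8")
    if len(encoded) <= limit:
        return LogField(text=full_log, truncated=False, full_log_url=None)
    # Keep the last `limit` bytes; errors="ignore" drops a multi-byte
    # character that the cut may have split.
    tail = encoded[-limit:].decode("utf-8", errors="ignore")
    return LogField(text=tail, truncated=True, full_log_url=full_log_url)


# The URL is a placeholder for wherever an implementation stores full logs.
field = make_log_field("x" * 20_000 + "\nERROR: out of memory\n",
                       "gs://example-bucket/logs/stdout")
print(field.truncated, field.full_log_url)
```

Keeping the tail rather than the head is a design choice: the end of a log is often where the failure is reported.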

buchanae commented 7 years ago

@geoffjentry I realize now that I was specifically thinking of the maximum message size of a task submitted to RunTask. So, the maximum size of the input message.

geoffjentry commented 7 years ago

Oh, I totally misunderstood.

I like the idea on paper, but this feels like it could run afoul of the "640K should be enough for anyone" kind of thing. Is this for a single input datum or for the entire thing?

buchanae commented 7 years ago

Ya, maybe it's better to just let implementations decide what size they support instead of specifying it in the spec?

I was thinking of the whole Task message, but I'm leaning towards closing this now.

geoffjentry commented 7 years ago

Yeah, if it is the whole thing, I was picturing a case where there were, e.g., 100k input files, each with a long bucket path. It can add up quickly.
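A rough back-of-the-envelope estimate of this scenario, as a Python sketch. It assumes (hypothetically) 100,000 inputs, each with a 200-character bucket URL and a short container path, and assumes the Task is serialized as JSON. The bucket name and the `url`/`path` field names are placeholders.

```python
import json

NUM_INPUTS = 100_000  # hypothetical: the "100k files" case
# Hypothetical 200-character bucket URL; the bucket name is a placeholder.
LONG_URL = "gs://example-bucket/" + "a" * 180


def estimate_inputs_bytes(num_inputs: int, url: str) -> int:
    """Size in bytes of an `inputs` list serialized as UTF-8 JSON."""
    inputs = [{"url": url, "path": f"/data/input_{i}"} for i in range(num_inputs)]
    return len(json.dumps({"inputs": inputs}).encode("utf-8"))


size = estimate_inputs_bytes(NUM_INPUTS, LONG_URL)
print(f"{size / 1_000_000:.1f} MB")  # 24.2 MB
```

Under these assumptions the `inputs` list alone is about 24 MB before any executors or outputs are added, which illustrates how hard it is to pick one fixed number that suits every workload.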

OTOH I could see one saying "please don't do that, find a different way" to my situation above :)

buchanae commented 7 years ago

I think this is probably best left up to implementations, at least for now. I'm not sure we can pick a reasonable number that covers all implementations, nor am I sure there's much value to that. So, I'm going to close this.