flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0
86 stars 40 forks source link

Use of signed 64bit integer for jobid #938

Open dongahn opened 2 years ago

dongahn commented 2 years ago

From https://github.com/flux-framework/flux-sched/pull/937#issuecomment-1121351522:

There are a few places in the fluxion code that return -1 or jobid to check error conditions and we have assumed jobid won't exceed the max of int64_t. The ticket is opened in case this can be a problem.

First action: find the place where the max of int64_t is assumed.

grondo commented 2 years ago

This might not be a practical problem, though it requires more study. I think a flux_jobid_t will not overflow INT64_MAX until after ~17 years, since the highest 40 bits of a FLUID is the time in milliseconds since the FLUID generator epoch. (See RFC 19 - Flux Locally Unique ID). (I'd have to double check the exact format of 64bit integers to be sure though)

dongahn commented 2 years ago

until after ~17 years,

:-) Uptime of 17 years .... Sounds good on more studies

trws commented 1 month ago

@grondo do you consider this a problem going forward?

grondo commented 1 month ago

Good question. We should probably revisit and ensure my analysis above is correct.

trws commented 1 month ago

My bigger worry, occurred to me randomly this morning, is if we rely on it being a 64-bit int and sending it as a json number. JSON encodes all numbers as doubles, so we will get errors if we use more than a 54bit signed integer can encode and send them as numbers rather than strings.


Sent from Workspace ONE Boxerhttps://whatisworkspaceone.com/boxer On July 31, 2024 at 12:38:02 PM PDT, Mark Grondona @.***> wrote:

Good question. We should probably revisit and ensure my analysis above is correct.

— Reply to this email directly, view it on GitHubhttps://urldefense.us/v3/__https://github.com/flux-framework/flux-sched/issues/938*issuecomment-2261291540__;Iw!!G2kpM7uM-TzIFchu!xf8Qa-UIwqAiA-5uahodZDmazYa8CYzku5jr5OE4EghF1XT6BlaNSi7QWY4Sbme42qLcyWMFMp9ksIwRTJ6mt-QMpYk$, or unsubscribehttps://urldefense.us/v3/__https://github.com/notifications/unsubscribe-auth/AAFBFNPSQQ4OVZ2T635NWITZPE4JLAVCNFSM6AAAAABLXNNIMOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRRGI4TCNJUGA__;!!G2kpM7uM-TzIFchu!xf8Qa-UIwqAiA-5uahodZDmazYa8CYzku5jr5OE4EghF1XT6BlaNSi7QWY4Sbme42qLcyWMFMp9ksIwRTJ6mJ8oGtBg$. You are receiving this because you commented.Message ID: @.***>

grondo commented 1 month ago

Jansson has distinct types for integer and real numbers, so we don't have this problem in flux-core. I'd be surprised if it is an issue in whatever Json library you are using in fluxion. 🤷

On Wed, Jul 31, 2024, 12:41 PM Tom Scogland @.***> wrote:

My bigger worry, occurred to me randomly this morning, is if we rely on it being a 64-bit int and sending it as a json number. JSON encodes all numbers as doubles, so we will get errors if we use more than a 54bit signed integer can encode and send them as numbers rather than strings.


Sent from Workspace ONE Boxerhttps://whatisworkspaceone.com/boxer On July 31, 2024 at 12:38:02 PM PDT, Mark Grondona @.***> wrote:

Good question. We should probably revisit and ensure my analysis above is correct.

— Reply to this email directly, view it on GitHub< https://urldefense.us/v3/__https://github.com/flux-framework/flux-sched/issues/938*issuecomment-2261291540__;Iw!!G2kpM7uM-TzIFchu!xf8Qa-UIwqAiA-5uahodZDmazYa8CYzku5jr5OE4EghF1XT6BlaNSi7QWY4Sbme42qLcyWMFMp9ksIwRTJ6mt-QMpYk$>, or unsubscribe< https://urldefense.us/v3/__https://github.com/notifications/unsubscribe-auth/AAFBFNPSQQ4OVZ2T635NWITZPE4JLAVCNFSM6AAAAABLXNNIMOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRRGI4TCNJUGA__;!!G2kpM7uM-TzIFchu!xf8Qa-UIwqAiA-5uahodZDmazYa8CYzku5jr5OE4EghF1XT6BlaNSi7QWY4Sbme42qLcyWMFMp9ksIwRTJ6mJ8oGtBg$>.

You are receiving this because you commented.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/flux-framework/flux-sched/issues/938#issuecomment-2261300970, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFVEUXNWQLYKWQ74EYUQVTZPE4X7AVCNFSM6AAAAABLXNNIMOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRRGMYDAOJXGA . You are receiving this because you were mentioned.Message ID: @.***>

trws commented 1 month ago

We use jansson in sched currently, though we’ll probably use boost json or similar in some places in the future which does also support it so that’s not the part I’m most worried about. That said json itself does not, and that includes the json libraries we use in python and lua. Python’s built in main lib dies if you try to give it a 64bit int from numpy or ctypes, and some of our supported lua versions can’t represent a 64 bit int natively to begin with because they also only support double (all below 5.3). Unless we want to play whack-a-mole with all the language integrations and possible json encoding issues, we should really limit ourselves to numbers precisely representable as doubles or send them as strings.


Sent from Workspace ONE Boxerhttps://whatisworkspaceone.com/boxer On July 31, 2024 at 1:08:58 PM PDT, Mark Grondona @.***> wrote:

Jansson has distinct types for integer and real numbers, so we don't have this problem in flux-core. I'd be surprised if it is an issue in whatever Json library you are using in fluxion. 🤷

On Wed, Jul 31, 2024, 12:41 PM Tom Scogland @.***> wrote:

My bigger worry, occurred to me randomly this morning, is if we rely on it being a 64-bit int and sending it as a json number. JSON encodes all numbers as doubles, so we will get errors if we use more than a 54bit signed integer can encode and send them as numbers rather than strings.


Sent from Workspace ONE Boxerhttps://whatisworkspaceone.com/boxerhttps://urldefense.us/v2/url?u=https-3A__whatisworkspaceone.com_boxer-253E&d=DwQFaQ&c=pKoAVQro6qDbLoK0T8588B4mZJhJhC4e6QXJy0XnJec&r=whC-hqpzwRHxKu-QIyx5J03uW-zSOfP0omgZBpITmPA&m=prCTS6e-B4WSDDE3jrL5mFuLb8VR9rVyPlpAU7GKMVO87bwXkqhtplh2iUb8p0Jz&s=RFs8AZxOJfLAphAMyTsG7j8Wh-E7srN1E-ov-jfaDAY&e= On July 31, 2024 at 12:38:02 PM PDT, Mark Grondona @.***> wrote:

Good question. We should probably revisit and ensure my analysis above is correct.

— Reply to this email directly, view it on GitHub< https://urldefense.us/v3/__https://github.com/flux-framework/flux-sched/issues/938*issuecomment-2261291540__;Iw!!G2kpM7uM-TzIFchu!xf8Qa-UIwqAiA-5uahodZDmazYa8CYzku5jr5OE4EghF1XT6BlaNSi7QWY4Sbme42qLcyWMFMp9ksIwRTJ6mt-QMpYk$%3E, or unsubscribe< https://urldefense.us/v3/__https://github.com/notifications/unsubscribe-auth/AAFBFNPSQQ4OVZ2T635NWITZPE4JLAVCNFSM6AAAAABLXNNIMOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRRGI4TCNJUGA__;!!G2kpM7uM-TzIFchu!xf8Qa-UIwqAiA-5uahodZDmazYa8CYzku5jr5OE4EghF1XT6BlaNSi7QWY4Sbme42qLcyWMFMp9ksIwRTJ6mJ8oGtBg$%3E.

You are receiving this because you commented.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/flux-framework/flux-sched/issues/938#issuecomment-2261300970https://urldefense.us/v2/url?u=https-3A__github.com_flux-2Dframework_flux-2Dsched_issues_938-23issuecomment-2D2261300970-253E&d=DwQFaQ&c=pKoAVQro6qDbLoK0T8588B4mZJhJhC4e6QXJy0XnJec&r=whC-hqpzwRHxKu-QIyx5J03uW-zSOfP0omgZBpITmPA&m=prCTS6e-B4WSDDE3jrL5mFuLb8VR9rVyPlpAU7GKMVO87bwXkqhtplh2iUb8p0Jz&s=dL251c29gnozg5fd86-gNjE0UZLJamkYxcA-qkuEznc&e=, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFVEUXNWQLYKWQ74EYUQVTZPE4X7AVCNFSM6AAAAABLXNNIMOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRRGMYDAOJXGAhttps://urldefense.us/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AAFVEUXNWQLYKWQ74EYUQVTZPE4X7AVCNFSM6AAAAABLXNNIMOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRRGMYDAOJXGA-253E&d=DwQFaQ&c=pKoAVQro6qDbLoK0T8588B4mZJhJhC4e6QXJy0XnJec&r=whC-hqpzwRHxKu-QIyx5J03uW-zSOfP0omgZBpITmPA&m=prCTS6e-B4WSDDE3jrL5mFuLb8VR9rVyPlpAU7GKMVO87bwXkqhtplh2iUb8p0Jz&s=XqWFqWmoeerEa8rbZhfnKrjkxpjgtcQh82eHVK6xcdY&e= . You are receiving this because you were mentioned.Message ID: @.***>

— Reply to this email directly, view it on GitHubhttps://urldefense.us/v2/url?u=https-3A__github.com_flux-2Dframework_flux-2Dsched_issues_938-23issuecomment-2D2261357311&d=DwMFaQ&c=pKoAVQro6qDbLoK0T8588B4mZJhJhC4e6QXJy0XnJec&r=whC-hqpzwRHxKu-QIyx5J03uW-zSOfP0omgZBpITmPA&m=prCTS6e-B4WSDDE3jrL5mFuLb8VR9rVyPlpAU7GKMVO87bwXkqhtplh2iUb8p0Jz&s=oNwAOBROiFGRI5VJNQ3GKnjoYqqYyJGml3MOLwLbXsw&e=, or unsubscribehttps://urldefense.us/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AAFBFNITCMZWR65FARGGSQLZPE745AVCNFSM6AAAAABLXNNIMOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRRGM2TOMZRGE&d=DwMFaQ&c=pKoAVQro6qDbLoK0T8588B4mZJhJhC4e6QXJy0XnJec&r=whC-hqpzwRHxKu-QIyx5J03uW-zSOfP0omgZBpITmPA&m=prCTS6e-B4WSDDE3jrL5mFuLb8VR9rVyPlpAU7GKMVO87bwXkqhtplh2iUb8p0Jz&s=S5NQb2mIe4-eW6Z41TkBG8-8yaU_iNCffr-nrhJtb20&e=. You are receiving this because you commented.Message ID: @.***>

garlick commented 1 month ago

Ugh, I guess that could be a problem then.

RFC 8259 apparently does not specify precision or range of numbers. Because it doesn't, the RFC suggests assume double precision IEEE 754

This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

Note that when such software is used, numbers that are integers and are in the range [-(253)+1, (253)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.