PrairieLearn / PrairieLearn

Online problem-driven learning system
http://prairielearn.readthedocs.io/

Stop dataframe digits converting to Scientific Notation #10066

Open ZacWarham opened 3 months ago

ZacWarham commented 3 months ago

Is there a way to stop dataframes from converting numbers to scientific notation? The digits attribute alone does not seem to control this.

<pl-dataframe params-name="df" show-index="false" show-dimensions="false" digits="6" display-language="python"
        show-python="false" show-dtype="false"></pl-dataframe>

image

eliotwrobson commented 3 months ago

This has to do with the format code used in the display logic: https://github.com/PrairieLearn/PrairieLearn/blob/master/apps/prairielearn/elements/pl-dataframe/pl-dataframe.py#L132

Is there a reason to avoid using this? The format code is designed to make the numbers easier to read (avoids displaying excessive zeros).

ZacWarham commented 3 months ago

> This has to do with the format code used in the display logic: https://github.com/PrairieLearn/PrairieLearn/blob/master/apps/prairielearn/elements/pl-dataframe/pl-dataframe.py#L132
>
> Is there a reason to avoid using this? The format code is designed to make the numbers easier to read (avoids displaying excessive zeros).

"Easier to read" is very subjective. It can be harder for people (like myself) to convert these numbers in their heads for comparisons and equations, particularly when writing on paper.

echuber2 commented 3 months ago

So, in theory it seems like it should already be possible to control this, or in the worst case you could fork the element for your course, but I ran into an issue trying it out.

If you omit the digits attribute on pl-dataframe (so that num_digits = None on the Python side), that code path appears to be skipped and large values are no longer displayed in scientific notation. However, when I tried it, very small values (e.g. several zeroes after the decimal point followed by some digits) were simply truncated to 0, and for some reason a fixed precision of 6 decimal places was still being displayed. Looking around briefly, I couldn't tell where exactly the truncation and the 6-digit default are being set. Pandas' to_dict does seem to preserve float64.

For illustration, I messed with the input data for the example course element/dataframe question:

pandas-weirdness

The red value is 0.000000001184, which is being truncated. The green value was entered as 84740000000000000000.00 and preserved without scientific notation or truncation. The six decimal places (orange) are still being enforced somewhere, though.

echuber2 commented 3 months ago

I should add that setting digits="4" does result in showing 1.184e-09 and 8.474e+19, so it looks like the truncation is not happening during the parsing stage but somewhere later in the formatting.
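
For reference, the behavior described above can be reproduced with Python's built-in format codes alone. This is only a sketch: it assumes the element applies a "g" (general) format with the digits value as the precision, and that the six-decimal display corresponds to a fixed "f" format with precision 6. The variable names are made up for illustration.

```python
# Values taken from the example in this thread.
large = 84740000000000000000.00
small = 0.000000001184

# "g" (general) format: the precision is the number of significant digits,
# and Python switches to scientific notation when the exponent is below -4
# or not less than the precision.
g_large = format(large, ".4g")
g_small = format(small, ".4g")
g_six = format(1234567.0, ".6g")  # seven integer digits with digits="6"

# "f" (fixed) format: the precision is the number of decimal places,
# and scientific notation is never used.
f_large = format(large, ".6f")
f_small = format(small, ".6f")

print(g_large)  # 8.474e+19
print(g_small)  # 1.184e-09
print(g_six)    # 1.23457e+06
print(f_large)  # 84740000000000000000.000000
print(f_small)  # 0.000000
```

The last line shows why the small value appears as 0 under a fixed precision of 6: the first nonzero digit sits in the ninth decimal place.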

echuber2 commented 3 months ago

Okay, it looks like the precision of 6 is just the default built into the Pandas styler. ~(However, changing pd.options.display.precision doesn't seem to influence it as described. It would still have to be set as a format on the instantiated style here.)~ [Edit: @tdy pointed out that the correct global setting would be pd.options.styler.format.precision; I misinterpreted the doc somehow.] This would only affect the unstyled columns though (the float columns when digits is omitted).

[Since the precision was not the original point raised...] It seems like you'd have to just fork the element in order to change the "g" general formatter to "f" for fixed precision.

Or, maybe you'd like to PR a new feature for this.

eliotwrobson commented 3 days ago

@echuber2 Is the consensus here just that adding an option to use fixed precision formatting would resolve this issue? That's a pretty easy change to make, happy to open a PR that does this.

echuber2 commented 3 days ago

@eliotwrobson If I'm recalling the details correctly then it seems that feature would help, and also, we'd need to make sure the digits attribute is still being respected.
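
A minimal sketch of what such an option might look like, in plain Python. The helper name build_format_template and its fixed flag are hypothetical and are not part of pl-dataframe; the sketch only assumes that digits is turned into a format template and that omitting digits leaves the values unformatted.

```python
from typing import Optional


def build_format_template(num_digits: Optional[int], fixed: bool = False) -> Optional[str]:
    """Hypothetical helper: build a str.format template from the digits setting.

    Returns None when digits is omitted, so the caller can skip formatting.
    """
    if num_digits is None:
        return None
    # "g" keeps num_digits significant digits and may use scientific notation;
    # "f" keeps num_digits decimal places and never uses scientific notation.
    code = "f" if fixed else "g"
    return f"{{:.{num_digits}{code}}}"


general = build_format_template(4)
fixed = build_format_template(2, fixed=True)

print(general)                                 # {:.4g}
print(general.format(84740000000000000000.0))  # 8.474e+19
print(fixed)                                   # {:.2f}
print(fixed.format(84740000000000000000.0))    # 84740000000000000000.00
print(build_format_template(None))             # None
```

One thing to settle in a real change: with "g" the digits value counts significant digits, while with "f" it counts decimal places, so the same digits setting would mean something different under each option.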