Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

export csv double/float columns with decimal notation #2434

Open mklechan opened 6 years ago

mklechan commented 6 years ago

Given a column with values like: DT = data.table(a = c(1.3,1.0,1.0,1.0,1.0))

the exported CSV looks like this:

a
1.3
1
1
1
1

For h2o-3 parsing interoperability, we should have 1.0, not 1. Could we add decimal notation to all values in these columns?
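A possible workaround in the meantime, as a minimal sketch only: convert the double columns to text before writing, so whole numbers keep a trailing ".0". It assumes the export goes through data.table's fwrite. The helper name with_decimal is hypothetical and not part of data.table, and as.character may round values that need more than 15 significant digits.

```r
library(data.table)

DT <- data.table(a = c(1.3, 1.0, 1.0, 1.0, 1.0))

# Hypothetical helper (not part of data.table): render whole-number doubles
# with a trailing ".0" and leave every other value as as.character() gives it.
with_decimal <- function(x) {
  out <- as.character(x)
  whole <- is.finite(x) & x == floor(x)
  out[whole] <- sprintf("%.1f", x[whole])
  out
}

# Plain double columns only; classed doubles such as Date are skipped.
is_plain_double <- function(col) is.double(col) && !is.object(col)
dbl_cols <- names(DT)[vapply(DT, is_plain_double, logical(1))]

# Work on a copy so the numeric columns in DT stay numeric.
out <- copy(DT)
if (length(dbl_cols) > 0) {
  out[, (dbl_cols) := lapply(.SD, with_decimal), .SDcols = dbl_cols]
}

csv_path <- tempfile(fileext = ".csv")
fwrite(out, csv_path, quote = FALSE)  # quote = FALSE: write the text as-is
lines <- readLines(csv_path)
print(lines)
```

The cost is the one raised in this thread: every whole-number value gains two characters in the file.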

MichaelChirico commented 6 years ago

Hmm. My first reaction is that this seems like a way to blow up file sizes unnecessarily?

mklechan commented 6 years ago

Yes, we sacrifice file size in return for better parsing and type detection of numeric columns. The parser in h2o-3 scans the first few lines of a file to determine the type of each column. If the notation is consistent throughout the column, it will lead to fewer errors.
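To illustrate the concern, here is a toy sketch of a sampling-based type guesser. It is not h2o-3's parser. The function guess_type and the sample size of 3 are assumptions made up for this example; the point is only that a guess made from the first few rows can differ from what the full column needs.

```r
# Hypothetical type guesser for one column of raw CSV fields.
guess_type <- function(fields) {
  if (all(grepl("^-?[0-9]+$", fields))) {
    "integer"
  } else if (!anyNA(suppressWarnings(as.numeric(fields)))) {
    "real"
  } else {
    "string"
  }
}

# Same data written two ways, with the non-integer value arriving late.
mixed_notation   <- c("1", "1", "1", "1", "1.3")
decimal_notation <- c("1.0", "1.0", "1.0", "1.0", "1.3")

sample_rows <- 3  # assumed sample size, for illustration only

guess_mixed_sample   <- guess_type(head(mixed_notation, sample_rows))
guess_mixed_full     <- guess_type(mixed_notation)
guess_decimal_sample <- guess_type(head(decimal_notation, sample_rows))

print(c(guess_mixed_sample, guess_mixed_full, guess_decimal_sample))
```

With mixed notation the sampled rows look like an integer column although the full column is real. With a decimal point on every value, the sample and the full column agree.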