DillonHammill / DataEditR

An Interactive R Package for Viewing, Entering, Filtering and Editing Data
https://dillonhammill.github.io/DataEditR/

type.convert converts numerics to integer in certain cases, so that only integers are allowed when editing #48

Open aalucaci opened 2 years ago

aalucaci commented 2 years ago

Hello, thank you for your package, which I started using recently.

I noticed the following problem: numeric columns of a data.frame are converted to integer when none of their values has a fractional part. A little digging showed that this is due to the use of type.convert:

lst1 <- list(a = 4.0)
str(lst1)
# List of 1
# $ a: num 4

lst2 <- utils::type.convert(lst1)
str(lst2)
# List of 1
# $ a: int 4

For mtcars, for example, this means that several columns that are initially of type numeric are converted to type integer:

str(mtcars)
# 'data.frame': 32 obs. of  11 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
# $ disp: num  160 160 108 258 360 ...
# $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
# $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
# $ qsec: num  16.5 17 18.6 19.4 17 ...
# $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
# $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
# $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
# $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

str(type.convert(mtcars))
# 'data.frame': 32 obs. of  11 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : int  6 6 4 6 8 6 8 4 4 6 ...
# $ disp: num  160 160 108 258 360 ...
# $ hp  : int  110 110 93 110 175 105 245 62 95 123 ...
# $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
# $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
# $ qsec: num  16.5 17 18.6 19.4 17 ...
# $ vs  : int  0 0 1 1 0 1 0 1 1 1 ...
# $ am  : int  1 1 1 0 0 0 0 0 0 0 ...
# $ gear: int  4 4 4 3 3 3 3 4 4 4 ...
# $ carb: int  4 4 1 1 2 1 4 2 2 4 ...

As a consequence, for example when the cyl column is edited with DataEditR, only integer values are accepted.
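
The affected columns can also be listed programmatically instead of being read off the two str() listings. This is only a minimal sketch in base R; the variable names before, after and changed are made up for illustration, and as.is = TRUE is passed explicitly because the columns are numeric and no factor conversion is involved.

```r
# Column classes before and after utils::type.convert()
before <- sapply(mtcars, class)
after  <- sapply(utils::type.convert(mtcars, as.is = TRUE), class)

# Names of the columns whose class was changed by the conversion
changed <- names(before)[before != after]
print(changed)
```

The result should match the columns shown as int in the second listing above.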

I fixed this locally by replacing the following lines of code from helpers.R:

    # MATRIX - SAME COLUMN CLASS
    if("matrix" %in% data_class) {
      data <- as.matrix(data)
    # DATA.FRAME - DIFFERENT COLUMN CLASSES
    } else {
      for (z in colnames(data)) {
        data[, z] <- type.convert(data[, z], as.is = !col_factor)
      }
    }

with

    # MATRIX - SAME COLUMN CLASS
    if("matrix" %in% data_class) {
      data <- as.matrix(data)
    # DATA.FRAME - DIFFERENT COLUMN CLASSES
    } else {
      for (z in colnames(data)) {
        # do not convert columns that are already numeric
        if (!is.numeric(data[, z])) {
          data[, z] <- type.convert(data[, z], as.is = !col_factor)
        }
      }
    }
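
To try the idea outside the package, the guarded loop can be wrapped in a small standalone function. This is a sketch under assumptions: convert_non_numeric() and the toy data frame df are hypothetical names and not part of DataEditR; only the is.numeric() check and the type.convert() call are taken from the snippet above.

```r
# Hypothetical helper (not part of DataEditR): apply type.convert() only to
# columns that are not already numeric, so whole-valued doubles stay doubles.
convert_non_numeric <- function(data, col_factor = FALSE) {
  for (z in colnames(data)) {
    if (!is.numeric(data[[z]])) {
      data[[z]] <- utils::type.convert(data[[z]], as.is = !col_factor)
    }
  }
  data
}

# Toy data: a whole-valued numeric column, numbers stored as text, plain text
df <- data.frame(
  n = c(1, 2, 3),
  s = c("4", "5", "6"),
  g = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

out <- convert_non_numeric(df)
print(sapply(out, class))

out_factor <- convert_non_numeric(df, col_factor = TRUE)
print(sapply(out_factor, class))
```

In this sketch the numeric column n is left alone, while the text columns are still passed through type.convert(), so text that looks like numbers is still converted.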

(I apologize, for technical reasons I'm not able to create a pull request.)

Would it be possible for you to consider this workaround, or do you see some possible issues with it?

I also looked through the list of R bugs to check whether anybody had posted something about type.convert. I found this: https://bugs.r-project.org/show_bug.cgi?id=17979, which describes some questionable conversions of dates, timestamps and tseries. I am not sure what the impact is for rhandsontable; maybe adding a word of caution to the description would help users.

Kind regards, Angela

ShinyFabio commented 1 year ago

I'm facing the same problem. I think this should be corrected, since even factor columns are converted to numeric.

DillonHammill commented 1 year ago

@ShinyFabio, do you have any examples of the factor conversion issues you are seeing? I am having a look at this and I would like to make changes that will address all issues (if possible).

DillonHammill commented 1 year ago

I just pushed @aalucaci's suggestion to data_format(), which now returns the following for mtcars:

> str(DataEditR:::data_format(mtcars))
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

In the case of converting characters to factors, the default retains characters:

> str(DataEditR:::data_format(iris))
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
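
This default is consistent with the as.is = !col_factor argument in the loop quoted earlier in the thread. A minimal base R sketch of the two settings, using a made-up character vector x rather than the package itself:

```r
# A character vector standing in for a text column
x <- c("setosa", "versicolor", "virginica", "setosa")

# as.is = TRUE corresponds to col_factor = FALSE: text stays character
kept <- utils::type.convert(x, as.is = TRUE)

# as.is = FALSE corresponds to col_factor = TRUE: text becomes a factor
fact <- utils::type.convert(x, as.is = FALSE)

print(class(kept))
print(class(fact))
print(levels(fact))
```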

Setting col_factor = TRUE correctly converts the character column to a factor:

> str(DataEditR:::data_format(iris, col_factor = TRUE))
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

This factor column looks the same as the one in the original iris dataset:

> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

ShinyFabio commented 1 year ago

Yes, sure. On Monday I'll try to send you an example.