LeonieHagitte / ShareSEM


problem with df5 #22

Closed brandmaier closed 5 months ago

brandmaier commented 5 months ago

I identified a problem with the data in df5.csv. Apparently, some columns use a comma as the decimal separator and others use a dot. This throws off your data loading code, so that many data points become NA and/or are rounded to integers. For example, df5$cA1 uses dots and df5$cC1 uses commas.

@LeonieHagitte, please fix the preprocessing of df5.csv (or maybe I only have an outdated version of it?)
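
One way to check which columns are affected is to read every column as text and look for each separator. The following is only a sketch in base R: the inline `csv_text` is a made-up stand-in for df5.csv (only the column names cA1 and cC1 come from this issue), and the semicolon delimiter is assumed from the loading code in this thread.

```r
# Hypothetical two-column sample standing in for df5.csv;
# the real file contents are not shown in this issue.
csv_text <- "cA1;cC1\n1.5;2,5\n3.25;4,75\n"

# Read everything as character so no value is coerced, rounded, or turned into NA.
raw <- read.csv(text = csv_text, sep = ";", colClasses = "character")

# For each column, check whether any value contains a comma or a dot.
uses_comma <- vapply(raw, function(col) any(grepl(",", col, fixed = TRUE)), logical(1))
uses_dot   <- vapply(raw, function(col) any(grepl(".", col, fixed = TRUE)), logical(1))

# One row per column, showing which separator(s) appear in it.
print(data.frame(column = names(raw), uses_comma, uses_dot, row.names = NULL))
```

For the real file, `text = csv_text` would be replaced by the path to df5.csv. A column flagged for both separators would need a closer look, since one of the two may be a grouping mark.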

LeonieHagitte commented 5 months ago

Ah I see - I will take a look at it and report back!

LeonieHagitte commented 5 months ago

I am not sure whether I am just not finding the problem or whether we are doing something differently. If I load df5.csv it works for me, but maybe I changed something that I no longer remember. If you run the Dataprep.rmd and then load df5 like this:

df5 <- read_delim("df5.csv", delim = ";", 
    escape_double = FALSE, col_types = cols(...1 = col_skip(), 
        mergeid = col_skip(), yrbirth = col_skip()), 
    trim_ws = TRUE)

does that reproduce the issue for you?

LeonieHagitte commented 5 months ago

Ah, I think I found the issue. It seems to occur when I try to convert the character columns to numeric: NAs are created where there were values before. I will try to get rid of that.
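
The NA behaviour described here can be reproduced in isolation. This is a minimal sketch with made-up values, not the actual df5 data: base R's `as.numeric()` only accepts a dot as the decimal mark, so strings that use a comma become NA.

```r
# Made-up values mixing both decimal marks, as in the affected columns.
x <- c("1.5", "2,5", "3")

# as.numeric() expects a dot; "2,5" cannot be parsed and becomes NA
# (the coercion warning is suppressed here for readability).
converted <- suppressWarnings(as.numeric(x))
print(converted)

# Replacing the comma with a dot before converting keeps the value.
fixed <- as.numeric(sub(",", ".", x, fixed = TRUE))
print(fixed)
```

This simple substitution assumes the comma is never used as a grouping mark (as in "1,000.5"); such values would need separate handling.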

LeonieHagitte commented 5 months ago

@brandmaier I am running out of ideas right now. It seems that although I specify the decimal mark and delimiter when reading the file, those settings are only applied to some columns and not to the rest, which is why we end up with a data frame with mixed decimal marks. But I don't know why that happens or how I can fix it.
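
A possible workaround, sketched here under assumptions and not necessarily what the later commits do, is to read all columns as character and normalize the decimal mark per column before converting. `normalize_decimals` is a hypothetical helper name, and the inline sample data (including the `id` column) is made up for illustration.

```r
# Hypothetical helper: convert character columns to numeric after
# replacing a decimal comma with a dot. Columns that are not fully
# numeric (e.g. IDs) are left unchanged.
normalize_decimals <- function(df) {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)
    # Treat empty strings as missing so they do not block the conversion.
    col[!is.na(col) & col == ""] <- NA
    candidate <- suppressWarnings(as.numeric(sub(",", ".", col, fixed = TRUE)))
    # Keep the original column if conversion would introduce new NAs.
    if (any(is.na(candidate) & !is.na(col))) col else candidate
  })
  df
}

# Made-up sample with one text column and two numeric columns
# that use different decimal marks.
raw <- read.csv(text = "id;cA1;cC1\na;1.5;2,5\nb;3.25;4,75\n",
                sep = ";", colClasses = "character")
clean <- normalize_decimals(raw)
str(clean)
```

The design choice here is to be conservative: a column is only converted if every non-missing value parses as a number, so text columns are never silently replaced by NA. It assumes commas are used only as decimal marks, not as grouping marks.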

LeonieHagitte commented 5 months ago

library(readr)
df5 <- read_delim("df5.csv", delim = ";", 
    escape_double = FALSE, locale = locale(decimal_mark = ".", grouping_mark = ","), trim_ws = TRUE)

brandmaier commented 5 months ago

Solved with latest commits.