IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Detect list of Datasets in Open from Library dialog #5649

Open Patowhiz opened 4 years ago

Patowhiz commented 4 years ago

Describe the bug: As mentioned by @rdstern in issue #5587:

More difficult (I assume) is to investigate reading more of the datasets, i.e. some that are more complicated than a single data frame. I think quite a lot are a list of multiple data frames. Here is an example I could read

# Code generated by the dialog, Open Dataset from Library
utils::data(package="agricolae", X=DC)
DC <- DC
data_book$import_data(data_tables=list(DC=DC))
rm(DC)

This is the R code, and it gives an error. If I add [[2]] to it, then it reads fine, importing the second data frame from the list of 3, i.e.:

# Code generated by the dialog, Open Dataset from Library

utils::data(package="agricolae", X=DC)
DC <- DC
data_book$import_data(data_tables=list(DC=DC[[2]]))

rm(DC)

Could we extend the reading for these examples by detecting it is a list and reading all data frames into separate data frames?

I know there will be other examples that still can't be read, but this would be useful.

Patowhiz commented 4 years ago

I think we may need to have the dialog detect whether the selected dataset is a list of data frames and call the appropriate R commands.
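A minimal sketch of what such a detection step could look like in R. The helper names `is_list_of_data_frames` and `as_data_tables` are hypothetical and are not part of R-Instat; the toy list only stands in for a library dataset such as DC.

```r
# Hypothetical helper (not part of R-Instat): TRUE when `x` is a plain,
# non-empty list whose elements are all data frames.
is_list_of_data_frames <- function(x) {
  is.list(x) && !is.data.frame(x) && length(x) > 0 &&
    all(vapply(x, is.data.frame, logical(1)))
}

# Hypothetical helper: return a named list of data frames, whether the
# dataset is a single data frame or a list of them.
as_data_tables <- function(x, name) {
  if (is.data.frame(x)) {
    return(stats::setNames(list(x), name))
  }
  if (is_list_of_data_frames(x)) {
    # Give unnamed elements a fallback name derived from the dataset name.
    if (is.null(names(x))) names(x) <- rep("", length(x))
    missing_names <- is.na(names(x)) | names(x) == ""
    names(x)[missing_names] <- paste0(name, "_", seq_along(x))[missing_names]
    return(x)
  }
  stop("'", name, "' is neither a data frame nor a list of data frames")
}

# Toy stand-in for a library dataset that is a list of data frames.
toy <- list(first = data.frame(a = 1:3), second = data.frame(b = letters[1:2]))
tables <- as_data_tables(toy, "toy")
print(names(tables))
```

The result is a named list of data frames, which is the shape passed as `data_tables` in the generated code quoted in this thread.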

dannyparsons commented 4 years ago

The code below would also work to import all data frames together. This would require detection in the dialog by running internal R code.

# Code generated by the dialog, Open Dataset from Library
utils::data(package="agricolae", X=DC)
DC <- DC
data_book$import_data(data_tables=DC)
rm(DC)

rdstern commented 4 years ago

This will help more of the datasets from the library to be read. @Patowhiz could you add the code to be able to read more of these difficult sets? One other type I found is (I think) a zoo object. That's quite important, because it is a type of time series.

There is one set of data in the hydroGOF package that doesn't read. Then quite a number in hydroTSM. Some of these are shape files and that might be more complicated, though we can read them elsewhere. One is called Maquehue Temuco.

For that file the key line of code is:

data_book$import_data(data_tables=list(MaquehueTemuco=MaquehueTemuco))

Here, if you add as.ts or as.data.frame, e.g. as.ts(MaquehueTemuco), then it reads nicely. The zoo package suggests fortify.zoo to transform it into a data frame?

This zoo structure is particularly important to be able to read in, because we are dealing with time series in our climatic work.
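As a rough illustration of the conversions mentioned above, here is a sketch that turns a time series into a data frame with an explicit time column. The helper `ts_to_data_frame` and the toy series are assumptions for illustration only; the zoo part uses `zoo::fortify.zoo()` and runs only if the zoo package is installed.

```r
# Hypothetical helper (not R-Instat code): turn a univariate base R `ts`
# into a data frame, keeping the time index as an explicit column.
ts_to_data_frame <- function(x) {
  stopifnot(stats::is.ts(x), is.null(dim(x)))
  data.frame(time = as.numeric(stats::time(x)), value = as.numeric(x))
}

# Toy yearly series standing in for a library dataset.
flow <- stats::ts(c(10.2, 11.5, 9.8, 12.1), start = 2001, frequency = 1)
flow_df <- ts_to_data_frame(flow)
print(flow_df)

# For zoo objects, fortify.zoo() returns a data frame with an "Index" column.
# This part only runs when the zoo package is installed.
if (requireNamespace("zoo", quietly = TRUE)) {
  z <- zoo::zoo(c(1.5, 2.0, 2.5), as.Date("2020-01-01") + 0:2)
  z_df <- zoo::fortify.zoo(z)
  print(z_df)
}
```

Keeping the index as a column means the dates or times survive the import instead of being left behind as an attribute of the series.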

Patowhiz commented 4 years ago

@dannyparsons I have been looking at different ways in which this could be solved. This prompted me to have a look at the instat_object_R6 and data_object_R6 files.

I'm struggling to understand why the "type", or rather "class", of the data passed is not being checked at that level so that the appropriate functions can be called to convert it to a correct data frame (near line 96 in the file). I'm asking because the more I examine the different datasets, the more I realise how many different structures they have, each requiring a different recommended function to convert it to a data frame. For example, as outlined above by @rdstern, the zoo package recommends fortify.zoo() for the zoo structure.

Patowhiz commented 4 years ago

@dannyparsons please disregard my question above. I just realised why that is not necessary.

Patowhiz commented 4 years ago

@rdstern when you used the data.frame() command (instead of as.ts()) to convert the hydroGOF package time series data EgaEnEstellaQts, did you get weirdly arranged data in the Data View, as in the image below? I'm surprised that viewing the data displays it correctly. @dannyparsons what could be the reason for this difference?

# Code generated by the dialog, Open Dataset from Library

utils::data(package="hydroGOF", X=EgaEnEstellaQts)
EgaEnEstellaQts <- EgaEnEstellaQts
data_book$import_data(data_tables=list(EgaEnEstellaQts=data.frame(EgaEnEstellaQts)))

rm(EgaEnEstellaQts) 

[image: dataview]

This led me to think that the data is not being coerced correctly, but upon viewing it I realised that is not so.

[image: view]

Patowhiz commented 4 years ago

@rdstern I'm now finalizing this issue. For the datasets that are lists of data frames, how would you like them to be imported? Would you want just the first data frame or all of the data frames?

The downside of importing all the data frames in the dataset is that the user loses the ability to name them. Are you okay with that functionality?

Patowhiz commented 4 years ago

@rdstern R matrices (in the Matrix and xts packages) are proving elusive to me. I'm struggling to find the best commands for coercing data of type matrix to a data frame. The data.frame() command loses some of the matrix's data (columns). You can check this in my PR by comparing what is in the Data View window with what is in the View window.
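For reference, a minimal sketch of one way to coerce a base R matrix to a data frame while keeping every column and the row names. It does not diagnose the column loss described above, and the helper `matrix_to_data_frame` and its `rowname_col` argument are hypothetical. It also assumes that objects from packages such as Matrix can first be densified with `as.matrix()`.

```r
# Hypothetical helper (not R-Instat code): coerce a matrix to a data frame,
# keeping all columns and moving row names into an ordinary column.
matrix_to_data_frame <- function(m, rowname_col = "row_name") {
  m <- as.matrix(m)
  # Unnamed columns get default names so none are silently merged or dropped.
  if (is.null(colnames(m))) colnames(m) <- paste0("V", seq_len(ncol(m)))
  df <- as.data.frame(m, stringsAsFactors = FALSE)
  if (!is.null(rownames(m))) {
    names_df <- stats::setNames(
      data.frame(rownames(m), stringsAsFactors = FALSE), rowname_col
    )
    df <- cbind(names_df, df)
    rownames(df) <- NULL
  }
  df
}

# Toy matrix standing in for a library dataset.
m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("x", "y", "z")))
df <- matrix_to_data_frame(m)
print(df)
```

Comparing `ncol(df)` with `ncol(m)` after conversion is a quick way to confirm that no columns were lost at this step.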

In addition, there is one dataset in the Matrix package, KNex, that comes in as a list. It has 2 items, a matrix and a numeric vector. I can't find a good way to read such a dataset in a general way without being very specific, i.e. having a special case for just this package. (So I'm torn between leaving this as it is or implementing it that way for now.)
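One possible general approach, sketched under assumptions: convert each list element on its own and skip anything that has no obvious data frame form. The helper names are hypothetical, and the toy list only imitates the matrix-plus-vector shape described above; it is not the KNex data.

```r
# Hypothetical helper (not R-Instat code): convert one list element to a
# data frame, or return NULL when no simple conversion applies.
element_to_data_frame <- function(x) {
  if (is.data.frame(x)) return(x)
  if (is.matrix(x)) return(as.data.frame(x))
  if (is.atomic(x) && is.null(dim(x))) return(data.frame(value = x))
  NULL
}

# Hypothetical helper: convert every element of a list and keep only the
# ones that could be turned into a data frame.
list_to_data_tables <- function(x) {
  converted <- lapply(x, element_to_data_frame)
  Filter(Negate(is.null), converted)
}

# Toy list: a matrix, a numeric vector, and an element that cannot be converted.
toy <- list(mm = matrix(1:4, nrow = 2), y = c(0.5, 1.5), f = function() NULL)
tables <- list_to_data_tables(toy)
print(names(tables))
```

Each element becomes its own data frame, so the matrix and the vector do not need to have matching lengths, and no package-specific special case is needed.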

@shadrackkibet could you please check my PR #5836 on this issue and help me with any ideas regarding the correct R commands for coercing a matrix?

@rdstern you can check the script generated by the dialog to see the commands I'm using to solve this issue.

Patowhiz commented 4 years ago

As suggested by @lloyddewit I'll create a separate issue explaining the possible extra features from which this conversation thread can continue. So for now, the PR solves importing of datasets with multiple data frames. I'll leave the code I added for the extra features though.

shadrackkibet commented 3 years ago

The controls in this dialog look messed up when importing multiple files from a directory. I first selected a .txt file and got the following. [image]

Patowhiz commented 3 years ago

@shadrackkibet I'm unable to reproduce the above bug. The PR addressing this issue didn't touch the import dialog. Did you get the above bug after going to the import from library bug?