IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Copying many variables into the multiple receiver - wide data frames again. #8223

Open rdstern opened 1 year ago

rdstern commented 1 year ago

We have a new ambitious plan for handling wide data frames, see #8215. But it will still be necessary to (sometimes) read many variables in a dialogue, from the data selector, into a multiple receiver.

@ChrisMarsh82 I suggest the first example dataset from the extRemes package is a good example, with the Prepare > Column: Reshape > Transpose as a simple dialogue.

a) Of course it takes ages to read into R-Instat. I understand @lilyclements has located the blocking point in the slow importing of the data and may already have checked with @dannyparsons?

b) Then I was pleasantly surprised how quickly the dialogue opened, with the 12 thousand variables in the data selector. I think @Patowhiz did something about this point?

c) However, it then takes ages if I do an Add All to put them into the multiple receiver. Perhaps 100 seconds (on a fast machine), and with no message, so it appears to be frozen for that time.

So please could this code be checked? I suspect there is an easy fix?

d) Then it does the Transpose pretty quickly. About 15 seconds. Great.

e) However, when I return to the dialogue and change the data frame so I can transpose back, then I get stuck. I changed the data frame and then tried the Reset key. R-Instat was then not responding. I checked in the Task Manager and found it was working hard. I closed it down after 20 minutes.

f) I was hoping to transpose back, but, of course, never got that far.

rdstern commented 1 year ago

@lilyclements just to confirm that the time needed to handle a wide dataset is not just associated with the initial importing of the data. It is presumably all to do with producing and storing the new sheet in a databook.

Here is the offending file, already transposed, so it now has 4 variables and 12,000 rows. Of course that reads easily into R-Instat. CarcassonneHeat_transposed.zip

(I have still included the rockart data, but that can be ignored.)

We now transpose back into the wide format, see below:

image

This works - eventually. There are also clearly 2 stages in the production of the wide sheet. The first took about 8 minutes on my machine, with the usual message that it was taking a long time.

Then the message disappeared, but it still took over 2 further minutes before the result appeared.

I hope this helps?

Patowhiz commented 1 year ago

@rdstern an import to the data book has to happen every time a new data sheet is produced, so that explains the time issue. With regard to the multiple receiver, I suspect it's an event issue similar to what the selector was previously doing. It may of course be a bigger issue, but it should be very solvable at the control level.
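
To illustrate the kind of event issue suspected here, the following is a minimal sketch in Python, not R-Instat's actual code. The `MultipleReceiver` class, its methods, and the `refresh_count` counter are all hypothetical stand-ins. It assumes the slowdown comes from a change handler running once per added variable, and shows how suspending that handler during an Add All reduces 12,000 refreshes to one.

```python
class MultipleReceiver:
    """Hypothetical stand-in for a multiple receiver control."""

    def __init__(self):
        self.items = []
        self.refresh_count = 0   # counts how often the costly handler runs
        self._suspended = False

    def _on_changed(self):
        # Stand-in for expensive work triggered by a change event
        # (for example, rebuilding a parameter list or redrawing the control).
        if not self._suspended:
            self.refresh_count += 1

    def add(self, name):
        self.items.append(name)
        self._on_changed()

    def add_all_naive(self, names):
        # Fires the change handler once per variable.
        for name in names:
            self.add(name)

    def add_all_batched(self, names):
        # Suspends the handler, adds everything, then fires it once.
        self._suspended = True
        try:
            self.items.extend(names)
        finally:
            self._suspended = False
        self._on_changed()


# Roughly the number of variables mentioned in this issue.
names = [f"V{i}" for i in range(12000)]

naive = MultipleReceiver()
naive.add_all_naive(names)

batched = MultipleReceiver()
batched.add_all_batched(names)

print(naive.refresh_count, batched.refresh_count)  # prints: 12000 1
```

Both approaches leave the receiver holding the same items. Only the number of handler calls differs, which is why a fix of this kind could plausibly live entirely in the control.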