IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Add options to the Prepare > Data Frame > Rename Column dialogue #6877

Open rdstern opened 2 years ago

rdstern commented 2 years ago

A trivial point to include in the next update on frmMain is to change Rename Column... to Rename Columns... It is now also for multiple columns.

We can have Variable Labels as well as Variable Names. R itself isn't very good on Variable Labels, but I would like to encourage their use more in R-Instat. I like what SPSS and other packages encourage, namely relatively short variable names plus variable labels when needed. In some of our examples the variable names are very long (up to 10 words) and no use is made of the variable labels.

There is always a space for them in the column metadata. I have been asking about the possibility of pasting there. But anyway I suggest we also add facilities to the Rename dialogue, possibly as follows:

In the Single option I suggest we have a group box with the label Column (Variable) Label:, placed a bit to the left of the current label. Inside there are 3 radio buttons:

O Enter (this is the initial default)
O Copy From Current Name
O Copy From New Name

Then there is the field for the label, which is made wider than the name fields.

Now trickier is what to do on Multiple? It is possible we need another top button, or we add to the options here. Perhaps there are 2 ordinary radio buttons for Names Only and Include Labels. Or there could be a checkbox, default unchecked, saying Include Variable Labels. The control then needs 2 columns, one for the names and the other for the labels. Then it will be easy to paste into the variable labels. And there could be an option Copy names to Labels here.

I suggest the operations on the names (and there will be more) could be separate from those on Variable labels. So, when the Variable Labels option is used, then the options relate just to the labels.

I am suggesting Shadrack should check on a possible structure. Then perhaps @N-thony could do the work, once agreed?

rdstern commented 2 years ago

There is a bug, or two, in the current rename dialogue. It seems to be in our own function data_book$rename_column_in_data.
a) When I use it in the single rename it adds a new bit to the command, as follows:

# Code generated by the dialog, Rename Columns
data_book$rename_column_in_data(data_name="mtcars", column_name="mpg", new_val="mpg", label="Miles/(US) gallon", .fn=tolower)

b) It adds a variable label correctly. I wondered if it could work just adding the variable label and not changing the name. But it needs the new name, even if it is the same as the old one. I suppose that does no harm.

c) But I then put in an empty string to get rid of the variable label. It ignores that and keeps the old one. I can replace it by a space, but I should also be able to delete a variable label here.

d) Maybe the command can be extended for multiple variables? See below for this part.
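For point c), here is a minimal base R sketch of the behaviour being asked for, where an empty string deletes the label instead of being ignored. It is not the R-Instat implementation: set_variable_label is a hypothetical helper, and it assumes the variable label is stored in the column's "label" attribute.

```r
# Hypothetical helper, not the actual data_book method.
# Assumption: the variable label lives in the "label" attribute of the column.
set_variable_label <- function(x, label = NULL) {
  if (is.null(label) || !nzchar(label)) {
    attr(x, "label") <- NULL    # empty or missing label: delete it
  } else {
    attr(x, "label") <- label   # otherwise set or overwrite it
  }
  x
}

mpg <- mtcars$mpg
mpg <- set_variable_label(mpg, "Miles/(US) gallon")
labelled <- attr(mpg, "label")   # label after setting it

mpg <- set_variable_label(mpg, "")
cleared <- attr(mpg, "label")    # label after passing an empty string
```

With this rule the name is untouched, so a label can be added, changed or deleted without renaming the column.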

rdstern commented 2 years ago

Here is the current dialogue. image

There should now be 3 buttons at the top. I wonder if they could be split so the first has the label Single Variable. Then there is a label (not in the buttons) Multiple Variables: followed by 2 more buttons, the first labelled Name and the second labelled Label. If that doesn't look good, then perhaps keep the three buttons together with a label Variables: before them. The 3 buttons are then Single, Multiple: Name, Multiple: Label.

The button to add is for the Multiple: Label option. It is a grid with 2 columns. The first is a receiver for the names and the second is another for the labels, which is wider than the names one. The labels column can be edited, and one can paste into it, etc. I assume the names cannot be edited here?

Ivanluv commented 2 years ago

@dannyparsons @shadrackkibet how can we rename multiple columns?

N-thony commented 2 years ago

@shadrackkibet how is the work on the structure going?

N-thony commented 2 years ago

@shadrackkibet could we attack this?

shadrackkibet commented 2 years ago

Yes, we can do it depending on its priority. You might want to first check the R code then we can see how we can modify it.

rdstern commented 2 years ago

So here are some more points on this dialogue.

a) I suggest the new options on the Single, suggested at the top, still apply.

b) I now propose 3 buttons at the top: Single and Multiple, as we have now, and a third that I am calling Rename With.

c) I suggest the current Multiple is doing operations that should move to the Rename With button. The (new) Multiple would have a multiple receiver and a grid for the existing and new names. There could also be a checkbox (default unchecked) with the label "Include Variable Labels". The grid could perhaps be within the dialogue, but it is perhaps better to have a Rename button that opens a separate window with the grid. This could have just 2 columns if just names, for Name and New-name. The default is for the new names to be the same as the current names. It will look a bit like the grid for the Prepare > Column: Factor > Recode Factor.

d) Perhaps it could, like the Recode Factor grid, also have a number as the first column, because each variable has an associated number.

e) If Include Variable Labels is checked, then you don't have the new names, but you do have the existing names and variable labels, if they exist, and then New Labels. The default is for the new labels to be the same as the original.

f) You can paste into the grid. One possibility is to copy the variable names into the labels. (In R the names are sometimes very wide, partly because labels are not used as much as they should be.)

The Rename With option is described below.
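As a rough illustration of what the proposed grid for Multiple could translate to in R, the sketch below applies a two-column table of current and new names to a data frame. The grid object and its column names (name, new_name) are made up for the example and the code is plain base R, not the existing R-Instat function.

```r
# Hypothetical grid contents: one row per column in the multiple receiver.
# New names default to the current names; here only two were edited.
grid <- data.frame(
  name     = c("mpg", "cyl", "disp"),
  new_name = c("mpg", "cylinders", "displacement"),
  stringsAsFactors = FALSE
)

df <- head(mtcars)

# Find the position of each selected column and check that all of them exist
idx <- match(grid$name, names(df))
stopifnot(!anyNA(idx))

# Apply the new names; columns not in the grid keep their names
names(df)[idx] <- grid$new_name
```

Rows left at their default change nothing, so the whole grid can be applied in one step.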

rdstern commented 2 years ago

The rename_with is essentially a find/replace option using the variable names. In the examples I have seen on Stack Overflow it uses stringr (which @N-thony has just updated). So I assume that, as with Prepare > Column: Text > Find/Replace, the default would be a simple search, but we would have a regex option, just as there.

@shadrackkibet your Select stuff is essentially a Find, while here we have Find/Replace. (Your select is a find on variable names and, I hope, soon to be extended to possibly include variable labels.) Similarly, here we could have find/replace on either names or labels!

I give 2 examples of operations that it should include.

image

Most of these variables start with "data.questions." so it would be good to delete that part of the name wherever it occurs. stringr actually has a remove function. That would be quite a common thing.
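A small sketch of removing the "data.questions." prefix, using base R only so that it runs without extra packages. The column names are invented in the style of the screenshot. It shows both a simple (literal) search and a regex version, since the dots in the prefix are wildcards in a regex unless escaped. The same functions could be supplied as the .fn argument of dplyr::rename_with, and stringr::str_remove is the stringr counterpart of the sub call.

```r
# Made-up names in the style described above
old_names <- c("data.questions.age", "data.questions.sex", "data.id")

# Simple search: fixed = TRUE treats the pattern literally,
# so "." means a dot and not "any character"
simple <- sub("data.questions.", "", old_names, fixed = TRUE)

# Regex search: dots escaped and the match anchored to the start of the name
regex <- sub("^data\\.questions\\.", "", old_names)
```

Names without the prefix, such as data.id here, are left unchanged by both versions.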

I even wonder here, whether we need the multiple receiver for this option. If we are looking for a pattern in the names, then we can just have the data frame. If necessary we do a select first!

Here is a second example from the old climatic guide - yundum

image

The names are y61...y99, y00, y01 (so from 1961 to 2001). I'd like to replace these by y1961, y1962, etc., but only up to y1999. So here is an example where I don't want all the columns.
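One possible way to do this without selecting columns first is to let the pattern itself pick out the names to change. The sketch below uses base R sub with a capture group. It assumes the only two-digit years starting with 6 to 9 are the 1900s ones, which holds for the names listed here.

```r
# A few of the yundum-style names, including the two that should stay as they are
yundum_names <- c("y61", "y62", "y99", "y00", "y01")

# Match "y" followed by two digits in the range 60-99 and insert "19".
# Names that do not match the whole pattern (y00, y01) are returned unchanged.
new_names <- sub("^y([6-9][0-9])$", "y19\\1", yundum_names)
```

Because sub leaves non-matching names alone, the same call can be applied to every column of the data frame.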

Once we have these working, then it would be good to be able to also have find/replace on the variable labels. I guess we don't need an extra button. Just a checkbox on the rename_with button to choose whether we apply it to the names or the labels.

rdstern commented 2 years ago

@N-thony I suggest the changes needed for the rename_with are simpler than I thought. You may wish to merge the current work first, or include these options as you proceed.

a) The dialogue is fine as it is. Perhaps swap the order of the 2 options there now.

b) There are perhaps just 2 further options to add. The first is a further radio button called Abbreviate. This simply uses the abbreviate function from base R. It is just what we will often want. We will often have long names that, to me, are better used as variable labels. So we copy them into the variable labels. Then we shorten the variable names to, say, 8 characters. That's the abbreviate command. If chosen, then there is an up-down control with the label Max: It is perhaps from 1 upwards, with default 8. And a checkbox with the label With dot. So, abbreviate will be the function in the rename_with.

c) The more general function is a radio button labelled Pattern. This opens a control to type the pattern. If you check this, then you get some of the same controls as in Prepare > Column: Text > Find/Replace. So there is then also Replace By. There will also be the checkbox Use Regex, and the Show Regex keyboard. (The special options are not needed, because you are just finding and replacing a pattern in a single word, i.e. the variable name.)
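The Abbreviate workflow (copy the long names into the labels, then shorten the names) might look like the sketch below. The data frame is invented, and storing labels in the "label" attribute is an assumption. One point to be aware of: the minlength argument of abbreviate is a target, not a hard maximum. By default abbreviations can come out longer when that is needed to keep the names unique, and strict = TRUE enforces the length at the cost of possible duplicates. The With dot checkbox would correspond to the dot argument.

```r
# Invented example with long variable names
df <- data.frame(
  total_rainfall_in_millimetres = c(12.1, 0, 3.4),
  maximum_daily_temperature     = c(31, 33, 29)
)
long_names <- names(df)

# Step 1: keep each long name as the variable label
# (assumption: labels are stored in the "label" attribute)
for (nm in long_names) {
  attr(df[[nm]], "label") <- nm
}

# Step 2: shorten the names; unname() drops the names that abbreviate()
# attaches to its result
names(df) <- unname(abbreviate(long_names, minlength = 8))

# Labels still carry the original long names
labels <- unname(vapply(df, attr, character(1), which = "label"))
```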

This last option (pattern) could usefully also be discussed while @shadrackkibet is with @N-thony. Could it/should it have many of the same options for the pattern that he has in the Select? That might be very powerful, and possibly easy to implement?

N-thony commented 2 years ago

@rdstern I'm adding this in PR #7208, so that the review will be done once.

rdstern commented 2 years ago

There is still a further option that we need for the Rename with part of this dialogue. Perhaps @lily could help to puzzle out what is needed, because I am not sure. Then maybe @N-thony could implement.

Some examples are above and I give a further one here. In the Prepare > Column: Numeric > Permute/Sample Rows I use the following example: image

It uses the aircondit data from the boot package. There are then 100 variables called permute, permute1, up to permute99. How would I change them to be (say) x, x1, x2, ..., x99 in an efficient way?
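One way to do this is a single find/replace on the common prefix. The sketch below builds the 100 names directly, without running the Permute dialogue, and uses base R sub. The same function could be passed to dplyr::rename_with as .fn.

```r
# The 100 names produced in the example: permute, permute1, ..., permute99
permute_names <- c("permute", paste0("permute", 1:99))

# Replace the prefix at the start of each name; the trailing number is kept
new_names <- sub("^permute", "x", permute_names)
```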

rdstern commented 2 years ago

@shadrackkibet are you ok to specify the R commands for the remaining task(s) here? It is "simply" the code for a find/replace on the variable names that is needed, possibly with the addition of the variable labels too. I am not sure whether this will be an additional radio button, or just extra options in the existing one. If not you, then the R needed is a @lilyclements or @dannyparsons question.
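As a starting point for that discussion, here is a hedged sketch of a single find/replace function that works on either the names or the labels, with a switch between simple search and regex. find_replace_columns is a hypothetical name, the example data is invented, and labels are again assumed to live in the "label" attribute. It is not the agreed R-Instat code.

```r
# Hypothetical helper: find/replace on variable names or on variable labels
find_replace_columns <- function(df, pattern, replacement,
                                 on = c("names", "labels"), regex = FALSE) {
  on <- match.arg(on)
  if (on == "names") {
    # fixed = !regex gives a simple literal search unless regex is requested
    names(df) <- gsub(pattern, replacement, names(df), fixed = !regex)
  } else {
    for (i in seq_along(df)) {
      lab <- attr(df[[i]], "label")
      if (!is.null(lab)) {   # columns without a label are skipped
        attr(df[[i]], "label") <- gsub(pattern, replacement, lab, fixed = !regex)
      }
    }
  }
  df
}

df <- data.frame(q.age = c(21, 35), q.sex = c("F", "M"))
attr(df$q.age, "label") <- "Age of respondent"

by_name  <- find_replace_columns(df, "q.", "", on = "names")
by_label <- find_replace_columns(df, "respondent", "farmer", on = "labels")
```

Keeping names and labels behind one argument mirrors the idea of a checkbox on the same button instead of a separate option.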