IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Adding comments to a data frame #2300

Open rdstern opened 7 years ago

rdstern commented 7 years ago

We currently have the Instat object as a set of linked data frames. There is control information at the data frame level and also at the column-within-data-frame level. They each have a grid in the View menu to see the meta-data (not sure about the links yet; I assume more dialogues are needed). I also suggest:

1) (This is not needed for the 0.2 milestone.) We also have meta-data that can be added about each object. That might include the Instat Object itself, though that might be dealt with as a special case. But we have calculations, filters, graphs and models to manage. I think they are all within a data frame. What additional information is needed?

2) For version 0.2, the main extra thing, after discussion with David, is roughly the equivalent of being able to comment on a particular cell in Excel. This is different, because I don't want to have additional information on every cell, just on particular ones. I would also like to be able to comment on particular rows.

One problem is that we are not using row names much in R-Instat. Here we need to be able to specify a row. The way we will do this is to only allow this feature for data frames that have a key column (or set of columns).

Among the uses I see for this are the following:

1) In quality control we will have rows (or cells) where data look odd. We could comment on how they seem odd.
2) We could (like Genstat) have a system of temporary missing values. We could then put the actual value in the "comment" and hence be able to re-instate it later.
3) We might change values we think we are "correcting", but could also then keep the original value.
4) Or we might simply want to make a comment.
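
Use 2) can be sketched as follows. This is only an illustration in Python, not R-Instat code: the function names, the list-of-dicts layout and the rainfall figures are all made up for the example, and it assumes a single key column.

```python
# Hypothetical sketch: none of these names are part of R-Instat.
MISSING = None


def set_temporarily_missing(data, comments, key_col, key_value, column):
    """Blank one cell and keep its original value in the comment table."""
    for row in data:
        if row[key_col] == key_value:
            comments.append({
                "key": key_value,
                "column": column,
                "value": row[column],  # the actual value, kept for later
                "comment": "temporarily set to missing",
            })
            row[column] = MISSING
            return
    raise KeyError(f"no row with {key_col} == {key_value!r}")


def reinstate(data, comments, key_col, key_value, column):
    """Restore the most recently stored value for one cell."""
    for note in reversed(comments):
        if note["key"] == key_value and note["column"] == column:
            for row in data:
                if row[key_col] == key_value:
                    row[column] = note["value"]
                    return
    raise KeyError("no stored value for that cell")


# Made-up daily rainfall with one value that looks odd.
rain = [
    {"date": "2011-06-13", "rain": 0.0},
    {"date": "2011-06-14", "rain": 999.0},
]
notes = []
set_temporarily_missing(rain, notes, "date", "2011-06-14", "rain")
missing_after_set = rain[1]["rain"] is MISSING
reinstate(rain, notes, "date", "2011-06-14", "rain")
```

The comment row is kept after re-instating, so there is still a record that the value was questioned.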

The obvious layout for our set of comments is (of course) a data frame! This is a "Comment" data frame linked to the data in the main data frame.

The fields (columns) in this data frame could be as follows:

1) Key field(s), which provide the link to the main data frame.
2) Column name - blank (missing) if a row is selected.
3) Column value - blank (missing) if a row is selected.
4) Comment - text field.
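
A minimal sketch of one record with these four fields, written in Python purely for illustration. The class and function names are hypothetical, the rainfall value is invented, and a tuple is assumed for the key so that one or several key columns are handled the same way.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Comment:
    key: Tuple[Any, ...]   # value(s) of the key field(s) linking to the main data frame
    column: Optional[str]  # column name; None (missing) for a row comment
    value: Optional[Any]   # cell value when the comment was made; None for a row comment
    comment: str           # free text


def make_comment(row, key_cols, text, column=None):
    """Build a row comment (column=None) or a cell comment (column given)."""
    key = tuple(row[k] for k in key_cols)
    value = row[column] if column is not None else None
    return Comment(key, column, value, text)


# Illustrative row; the key columns are assumed to be station and date.
row = {"station": "Agadez", "date": "14/06/2011", "rain": 250.0}
row_note = make_comment(row, ["station", "date"], "Whole record looks odd")
cell_note = make_comment(row, ["station", "date"], "Suspiciously high", column="rain")
```

The only difference between a row comment and a cell comment here is whether the column name and value are filled in.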

The corresponding dialogue will be in the Keys and Links menu. The main way (possibly the only way initially) to add a comment is through the right-click. So we add "Add Comment" to the right-click menu when in a cell in the grid, and also when on the left-hand side at a particular row.

rdstern commented 7 years ago

After discussion with David today, we think this should be considered as soon as possible. This is because it will be used in our quality control for the climatic data and it will be good to be able to explain the proposed "system".

One other feature that is similar to the links stuff is that we will sometimes want to add our own comments manually, and on other occasions they would come indirectly, from the corresponding quality control checks.

rdstern commented 7 years ago

We now have a simple system that fits with other parts of R-Instat. You can only have row or cell notes if the data frame with the data has a key field.

The main Organise menu remains with Organise > Keys and Links. There is an additional dialogue called Add Notes in that menu. (We don't need to view or delete notes.)

Adding a row or cell Note makes (and adds to) a linked data frame. It is a one-to-many link, because you could have more than one link from a single row of data (or from one or more cells in a given row).
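
The one-to-many link can be pictured as grouping the notes by the key of the row they refer to. The sketch below is Python for illustration only; the function name, the column names and the note texts are invented, and a single key column is assumed.

```python
from collections import defaultdict


def index_notes(notes):
    """Group notes by the key of the data row they refer to."""
    by_key = defaultdict(list)
    for note in notes:
        by_key[note["key"]].append(note)
    return by_key


# Made-up notes: one row note and one cell note on the same day,
# plus a cell note on another day.
notes = [
    {"key": "2012-03-12", "column": None, "comment": "check this day"},
    {"key": "2012-03-12", "column": "tmax", "comment": "tmax below tmin"},
    {"key": "2012-03-13", "column": "rain", "comment": "possible typing error"},
]
by_key = index_notes(notes)
notes_for_12th = by_key["2012-03-12"]  # two notes for one data row
```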

You right-click in a cell or in a row to add a note. This takes you to a simple dialogue with the row and column already completed. It might even be a grid showing that row (or cell)? There would then be another column in the grid for the Comment.

You could perhaps select multiple rows and then right-click? Then it shows the set of rows, though you would then need a simple system so the same note could be added to all these rows.

Having a linked data frame in this way is general and neat. Once we have the quality control there will be other dialogues that add notes to rows or cells.

One difference from (say) the summarise dialogue is that summarise usually adds columns to an existing linked data frame. Here we are more going to be adding (or managing) rows.

The dialogue could have the data frames - it doesn't need the columns. Ideally it is only those data frames with key columns.

On the right there is a label with a multiple receiver called "Key Column(s)". Once a data frame is selected this is filled automatically.

The next fields are a small grid. The first column(s) are for the key variables. The next column is for the name of the selected column, and the row shows the value. (This column in the little grid is omitted if only rows are selected.)

These are completed automatically if the action started with the right click. Ideally, if the action starts with the dialogue, then (like Excel) you could choose a place in the (full) grid from the open dialogue.

The last field is the comment itself. Ideally it would "remember" the previous comments, so these could be copied if necessary.

That's it.

rdstern commented 7 years ago

One aspect that could be good to consider is how a data frame with comments will be displayed. Excel has this feature as a mark in the top right-hand corner. I wonder whether something could be suggested to the ReoGrid people? Perhaps "we" will have to do it?

Added in 2021. We ignore this feature - at least for now. @volloholic says if this is included, then the issue becomes more complicated. It isn't needed.

rdstern commented 7 years ago

After discussion with Danny we now have the structure of the Add Comment dialogue.
1) The blank dialogue is already there. It is in the Prepare > Add Keys and Links menu.
2) The right-click on the rows or the cells also already goes to this dialogue, so that's great.
3) At the top are (David's favourite) radio buttons. The first is O Row and the second is O Cell.
4) If Row is selected, then there is a selector for just the data frame. If Cell is selected, then there is the usual selector showing data frame and columns for that data frame.
5) The only data frames visible are those where a key has been defined. (If that isn't easy now, then we'll add that condition later.)
6) There is a receiver on the rhs with the label Row. It shows the names of the key columns, and then the selected row. For example, if the key column is Date, it might contain:

Date
12/3/2012

If the key columns are Year, Month, Day then it shows:

Year Month Day
2012 3 12

(If there are 2 sets of key columns in a data frame (rare) then it just uses one of them.) This is completed automatically if the right-click is used.
7) If Cell is selected then there is another receiver with the label Column, and that is filled automatically if the right-click was used.
8) Finally there is a multiline textbox with the label Comment. OK is enabled when this is not empty.
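
The rules in 4), 5) and 8) could be sketched like this. It is Python for illustration only, not the dialogue code, and every name is hypothetical. Two things go beyond what is written above and are assumptions: OK also waits for the data frame, row and (for Cell) column to be filled, and a comment of only spaces counts as empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AddCommentState:
    mode: str                                 # "Row" or "Cell" radio button
    data_frame: Optional[str] = None
    row_key: Dict[str, object] = field(default_factory=dict)  # key column -> value
    column: Optional[str] = None              # only used when mode is "Cell"
    comment: str = ""


def selectable_data_frames(keys_by_frame: Dict[str, List[str]]) -> List[str]:
    """Offer only the data frames where a key has been defined."""
    return [name for name, keys in keys_by_frame.items() if keys]


def ok_enabled(state: AddCommentState) -> bool:
    """OK needs a non-empty comment (plus, by assumption, a complete selection)."""
    if state.data_frame is None or not state.row_key:
        return False
    if state.mode == "Cell" and state.column is None:
        return False
    return bool(state.comment.strip())


# Made-up data book: "daily" has a key, "scratch" does not.
frames = selectable_data_frames({"daily": ["Year", "Month", "Day"], "scratch": []})

state = AddCommentState(
    mode="Cell",
    data_frame="daily",
    row_key={"Year": 2012, "Month": 3, "Day": 12},
    column="rain",
)
before = ok_enabled(state)   # comment still empty
state.comment = "Value looks odd"
after = ok_enabled(state)
```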

lilyclements commented 7 years ago

@Lunalo I believe you were working on this. How is it coming on?

dannyparsons commented 7 years ago

Where are things on this?

volloholic commented 7 years ago

@Lunalo please respond with an update even if it is just there is no progress and we should postpone this issue to 0.9!

Lunalo commented 7 years ago

I finished the design of this dialog and Danny had agreed to write a method for it. I think he just forgot to. I can discuss with him the possibilities of implementing it, since it is not huge.

maxwellfundi commented 7 years ago

@Lunalo has this reached a stage that can be taken further? What is the state of the method that is needed here, @dannyparsons? We would like to have this feature in the next release.

Lunalo commented 7 years ago

@maxwellfundi

I still need to discuss with @dannyparsons how this should be implemented, and then I can start writing R code.

dannyparsons commented 7 years ago

This is waiting for me to create the R method needed. I will attempt to do this soon unless more urgent 0.4.1 issues take priority.

shadrackkibet commented 7 years ago

@dannyparsons did you create an R method for this?

maxwellfundi commented 7 years ago

@dannyparsons How about the method for this dialog? How far has it gone, or is it yet to be created?

dannyparsons commented 7 years ago

Not yet created, I don't think I have time for this yet, I will move it to a later milestone. Thanks for the reminder.

maxwellfundi commented 6 years ago

@dannyparsons how is this going? Would you have some time to have this done by the end of the year? Once you create the method the rest would be easy for us to do.

dannyparsons commented 6 years ago

Not sure but let's optimistically keep it for the next release.

rdstern commented 6 years ago

R code is implemented for this so next step is to implement in grid and dialogs.

rdstern commented 3 years ago

@N-thony this has been dormant for 4 years! But it hasn't gone away!

I hope you can take it up and actually complete it. You will need help from @volloholic and/or @dannyparsons. Start with @volloholic

N-thony commented 3 years ago

@volloholic could you please clarify the current stage of this issue? Thanks

rdstern commented 3 years ago

Please check above for any further ideas. The dialogue is there already; see Prepare > Keys and Links > Add Comment. The default (as now) is Cell, then (as now) Row, then Column, then Data Frame. We accept limitations in the initial "system" for Comments.

a) We don't have anything special in the originating data frame that shows there is a comment. (That's the red triangle in Excel. It isn't so important for us as you don't even see all of a data frame in R-Instat.)

b) We add an entry to the current Column right-click menu. It is a single entry, possibly in a line by itself and perhaps just below the Levels-Labels. If a column is selected then it is a column comment. If a cell is selected, then it is for that cell. There are probably buttons at the top of the dialogue with Cell, Column, Row, Data Frame as the four options.

c) If Data Frame is selected, then just the Data Frame control is shown. In addition there is a new Comment: control. This is a multiline field, similar to the field in PICSA > PICSA Options, see below:

image

d) If Column is selected, then the "usual" data selector control with Data Frame and the Variables is shown, plus a single receiver. If right-click was used, then that variable is selected and put into the receiver. We have a number of right-click dialogues that do this, for example right-click > Duplicate Column.

image

e) If Row (or Cell) is chosen, then we have a new section with the label Row and perhaps 4 controls inside. The first has the label Key, then Value 1, Value 2 and possibly Value 3. If there is no key, then OK can't be enabled. (Our problem is that row numbers are not great for us. I would be happy with row numbers for now, instead, if we can get away with them (ask @volloholic). But we assume that these comments need key fields. So I am suggesting the row controls show the values of the key fields.)

This will bring in another limitation of the initial system. I explain with the important initial use of the Comments, namely for the climatic data. There are then one or two key fields, namely the station name and the date. In my example the key is called namedate, i.e. the names of the 2 key fields. So the Key control contains the key name, namedate. Then Value 1 might contain Agadez and Value 2 has 14/06/2011.

If this is the first Comment, then perhaps the sheet is made and the first row is completed. The name of the data frame might be .comment.? or _comment_

The fields are fixed. They are perhaps:

  1. _frame_
  2. _column_
  3. _key_
  4. _value1_
  5. _value2_
  6. _value3_ (Though in the climatic example we might just have 2 value fields - see below.)
  7. _comment_

If the data frame already exists, then the next row is added.
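
A rough sketch of this "make the sheet on the first comment, then append" behaviour. It is Python for illustration only and is not the R method that exists in R-Instat. The function name and the dict standing in for the data book are hypothetical, the field names follow the list above without the surrounding underscores, and the comment texts are invented.

```python
FIELDS = ["frame", "column", "key", "value1", "value2", "value3", "comment"]


def add_comment(data_book, frame, key_name, key_values, comment,
                column=None, sheet="_comment_"):
    """Append one comment row, creating the comment sheet on first use."""
    if len(key_values) > 3:
        raise ValueError("this sketch supports at most three key fields")
    rows = data_book.setdefault(sheet, [])  # sheet is made on the first comment
    # Unused value fields are left missing (None).
    padded = list(key_values) + [None] * (3 - len(key_values))
    rows.append(dict(zip(FIELDS, [frame, column, key_name, *padded, comment])))
    return rows


# The climatic example: key "namedate" made of station name and date.
book = {}
add_comment(book, "daily", "namedate", ["Agadez", "14/06/2011"],
            "Check rainfall", column="rain")
add_comment(book, "daily", "namedate", ["Agadez", "15/06/2011"],
            "Row entered twice?")  # row comment: no column
```

Keeping the fields fixed means every comment, from any data frame in the data book, lands in the same sheet with the same columns.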

There is one other limitation that we accept for now. The comment data frame assumes the key fields have the same structure in all data frames that provide comments in a given data book. So in the climatic example the first field is the name, so probably a factor, and the second is a date. If comments are added from more sheets then this structure is the same. (The alternative is to allow any structure, but then the value fields might become character types to accommodate anything.)
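
The alternative in the last sentence could look something like this. It is Python for illustration only; the function name and the second key are hypothetical, and writing dates in ISO form is an assumption made for the example.

```python
import datetime


def as_character(value):
    """Render a key value as text; a missing value stays missing."""
    if value is None:
        return None
    if isinstance(value, datetime.date):
        return value.isoformat()
    return str(value)


climate_key = ("Agadez", datetime.date(2011, 6, 14))  # name, date
survey_key = (1042, "plot B")                         # hypothetical other structure

value_rows = [
    tuple(as_character(v) for v in key)
    for key in (climate_key, survey_key)
]
```

Both keys then fit the same value fields, at the cost of losing the date and factor types.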