data_edit crashes when synchronizing edited filtered data

jon23cooper commented 2 years ago

When I load a data set into data_edit, create a filter and make an edit, if I then click synchronize data_edit crashes emiting 2 warnings: Warning: non-unique values when setting 'row.names' Warning: Error in .rowNamesDF<-: duplicate 'row.names' are not allowed

Making an edit and synchronizing without filtering does not cause a crash.

DillonHammill commented 2 years ago

@jon23cooper, this seems to work fine on my end. Did you add/remove rows before hitting the sync button? I haven't yet added support for this. Things get a bit tricky when rows/columns are removed in filtered subsets - should these be added/removed from the master copy when the sync button is hit? I don't yet know the best answer to this question.

jon23cooper commented 2 years ago

@DillonHammill I have not added or removed rows or columns. I have tried locally and on R studio cloud. d <- data.frame("col" = c("a", "b")) data_edit(d)

If I filter on col equals a and change a to a1 and synchronize no problem If I filter on col equals b and change b to b1 and synchronize I get a crash Warning: non-unique value when setting 'row.names': ‘1’ Warning: Error in .rowNamesDF<-: duplicate 'row.names' are not allowed.

I think the problem is when the filter is done the rownames are changed to reflect the new data frame, so when you filter row "2", it then becomes row "1", so when you try to sync, it attempts to rename rowname["2"] to "1", but "1" already exists and so it crashes. If I remove the part of dataSync that attempts to change the rownames then the problem goes away.

DillonHammill commented 2 years ago

OK I see thanks for the example. I will take a closer look tomorrow.

DillonHammill commented 2 years ago

Thanks for reporting this issue @jon23cooper! It proved to be a bit more complicated than I would have liked because I have to do I few tricks with the rownames to get everything to work properly. I have just put some notes below in case I have to re-visit this issue in the future - feel free to skip to end.

Basically, numeric rownames are handled differently to character rownames within DataEditR:

numeric rownames are passed directly to rhandsontable and displayed as such. At the moment I have actually turned off the ability to edit these values - it is common for users to accidentally enter a duplicate row indices which causes the application to crash without warning. I think it is better if we let DataEditR take care of updating these automatically - users can easily add a new column to serve as more elaborate rownames if they so desire.
character rownames are actually moved inside the table before passing the data to rhandsontable, this means that they are mostly treated like any other columns in the dataset, with the exception that duplicates are not permitted. Once the data has been edited these rownames are then moved back into the rownames() slot of the data. The main reason why this is necessary is because the version of Handsontbale that we are using does not allow automatic resizing of rownames - which leads to many characters being cutoff. There seems to be enough space there for row indices but I haven't pushed this beyond 3 digits - I might need to add a bit more space if users plan to use DataEditR with datasets with thousands of rows.

OK, so what needed changing? Most of the updates I made try to ensure that if you are using numeric rownames these should always be retained. For example, if you have a dataset with 20 rows (rownames = 1:20) and you subset the data to select rows 6:8, then row indices in the displayed subsetted data should be 6:8 as well - indicating the location of those rows in the master copy of the data. The current version of DataEditR does not attempt to retain this information, instead it simply indexes the subsetted data starting at 1 - i.e. the row indices would be set to 1:3 in the above case. This was a bit tricky to implement as we need to know the indices of the master copy which is not available within the dataEdit module. To overcome this I have added a row_index argument to the dataEdit so we can manually specify the starting index for new rows (i.e. starting at the number of rows in the master copy + 1).

The story is different for character rownames as they themselves serve as the row indices and so we are not so worried about ensuring that the row indices point to locations in the master copy. In the future it is likely that I will implement this anyway because it will help keep track of what data you are editing - for now I will stick with the current behaviour.

Since there were considerable changes required to address this issue, please can you pull down the latest version of DataEditR (v0.1.5) from GitHub, take it for a spin and let me know if you encounter any issues.

devtools::install_github("DillonHammill/DataEditR")

jon23cooper commented 2 years ago

Thanks Dillon, I have downloaded and installed the new version from GitHub and that is working for me. I'm guessing there must be a reason you don't just treat numeric rownames the same way as character rownames. I am using it with data containing a few hundred thousand rows, and rownames are visible though you are losing a tiny bit of the last digit on 6-figure numbers.

DillonHammill commented 2 years ago

That's great! Wow that is a pretty large dataset! When I get time I will see if I can add a bit more space for the row indices so that they render nicely. The UI gets a bit confusing if we import the numeric rownames into the table. Feel free to play around a bit more and close this issue when you are happy that it is fixed. Let me know if you encounter any issues. I will hold off on submitting this update to CRAN until I have had more time to test it myself.

DillonHammill / DataEditR

data_edit crashes when synchronizing edited filtered data #41