work report - record every meaningful action; save and restore session

xhdong-umd commented 7 years ago

It would be nice if the app could record every meaningful action made by the user and compile them, together with data and plots, into a work report.
I have an even wilder goal, which is to generate a reproducible R Markdown report that can rerun everything later, but that will be a very challenging task and has to wait until other features are done.

There will be console messages like this

Note that I used a colored console, which is not always supported, depending on the platform.

Each message starts with a timestamp.
These will be saved:
- Basic information about actions
- Short summary tables are printed in the console directly; bigger tables are saved as csv files
- Plots are saved as .png files
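
The logging scheme described above could be sketched roughly as follows. This is only a minimal illustration in base R, not the app's actual code: `log_action`, its arguments, and the 10-row threshold for "small" tables are all hypothetical.

```r
# Hypothetical helper, not taken from the app's source.
# Prints a timestamped message; small tables go inline, bigger ones to csv.
log_action <- function(msg, table = NULL, dir = tempdir(), max_rows = 10) {
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  lines <- sprintf("[%s] %s", stamp, msg)
  if (!is.null(table)) {
    if (nrow(table) <= max_rows) {
      # Small table: print it in the console together with the message.
      lines <- c(lines, capture.output(print(table)))
    } else {
      # Bigger table: save as csv and only record the file name.
      csv_name <- paste0(format(Sys.time(), "%Y%m%d_%H%M%S"), "_table.csv")
      write.csv(table, file.path(dir, csv_name), row.names = FALSE)
      lines <- c(lines, paste("table saved to", csv_name))
    }
  }
  cat(lines, sep = "\n")
  invisible(lines)
}

small <- log_action("Data summary", head(mtcars, 3))  # table printed inline
big <- log_action("Full data", mtcars)                # 32 rows, saved as csv
```

Plots could be handled the same way as the bigger tables: save the file, then record only its name in the log.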

There will be another html report like this

There is no color, only simple markdown formatting.
Small tables are rendered as html tables.
Plot pictures and csv files can have clickable links (not shown in the screenshot yet).

In the end the user can click a button and download a zip that includes the report, data, csv files and plot pictures.

There will be an option in the app to turn off the report, because we have to save every plot picture after every update, which could be a bit of a burden if the report is not needed.

@jmcalabrese @chfleming How do you feel about this design?

xhdong-umd commented 7 years ago

There are lots of tricky issues with file-related operations.

I found a bug in RStudio on Mac that makes the file preparation run twice.
Sometimes zip can fall into an infinite loop when the target file name is the same; this is caused by the bug above.
I spent some time adjusting the theme and style of the report.

Now the zip and download are working well, and the report is looking good. I still need to add entries for every possible meaningful action, and test on Windows (file-related operations often have compatibility problems across platforms).

This is what the report looks like. There is a floating table of contents to help organize the report.

xhdong-umd commented 7 years ago

And here is the control option for the report:

xhdong-umd commented 7 years ago

I have recorded every meaningful action on every page, and tested on the shinyapps.io server.

I only save the report and plot pictures for now. I can save more data as csv or .Rdata if you feel any data should be included in the final zip.
If you click several rows in a table in a short time, the app still responds to each click, so there may be an update for every click. This is expected, since the app cannot predict whether the user intends to click separately or as one action.
Preview report can open a new browser window for the html report, but this will not work in the shinyapps.io hosted version, since the browser variable is not configured on the server.
- I tried to include the report inside the app, but the html can be quite large (10M with lots of plots) and it becomes very slow. The html style is also not rendered properly inside the app.
The shinyapps.io server probably has a different timezone from your local time, so the timestamps inside the report and in the file names could differ from your local time.
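
One way to make the timezone difference visible is to format timestamps with an explicit timezone. This is only a sketch under that assumption; `stamp` is a hypothetical helper and the thread does not say the app does this.

```r
# Hypothetical helper: format a time in an explicit timezone and label it.
stamp <- function(time, tz = "UTC") {
  paste(format(time, "%Y-%m-%d %H:%M:%S", tz = tz), tz)
}

# Fixed example time, so the result does not depend on the clock.
t <- as.POSIXct("2018-01-15 12:00:00", tz = "UTC")
on_server <- stamp(t)                      # "2018-01-15 12:00:00 UTC"
for_user <- stamp(t, "America/New_York")   # "2018-01-15 07:00:00 America/New_York"
```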

xhdong-umd commented 7 years ago

I'm working on preview report in the hosted version. There are some more limitations with this method.

shinyapps.io runs over the https protocol, and you cannot open a local file with the file:// protocol for security reasons
a shiny app does have access to a folder inside the app folder, so I can put the report file there and open it
however, this folder is shared by all users/sessions of the app. That means if we use the same file name, all users' reports will be the same file. If we use a session-specific file name, each session will create a new report, and that file is still accessible to all users.

I can write some code to clean up the report on session end, but I still don't like the idea of keeping all users' files in the app folder. The cleanup code could also be skipped if the app crashes.

@jmcalabrese @chfleming Maybe it's better to just download the html file, then the user can open it. So the button will be download report, and the user will save it and then open it.

It's also possible for me to add some code that tries to detect the hosted mode:

if hosted, download the file
if running locally, open the file directly
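
A rough sketch of that branching follows. The detection relies on an assumption: that a hosted Shiny server sets the `SHINY_PORT` environment variable while a local session leaves it empty. This is a commonly used heuristic and should be verified for the actual deployment. `is_hosted` and `report_action` are hypothetical names.

```r
# Assumption: hosted Shiny servers set SHINY_PORT; local sessions leave it empty.
is_hosted <- function(port = Sys.getenv("SHINY_PORT")) {
  nzchar(port)
}

# Hypothetical dispatcher: which action the report button should take.
report_action <- function(hosted = is_hosted()) {
  if (hosted) "download" else "open"
}

report_action(hosted = TRUE)   # "download"
report_action(hosted = FALSE)  # "open"
```

In the app, the "open" branch could call `utils::browseURL()` on the report file, and the "download" branch could be served through `shiny::downloadHandler()`.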

xhdong-umd commented 7 years ago

I finished the changes on work report and updated the hosted app:

the on/off switch is moved to the first page, in the local data import box. Logically it applies to all cases, not just local data import, but this is the best location we have on the first page. Putting it into a separate box would result in an unbalanced layout.

There are two buttons on the work report page: generate report and download report.
- The user needs to generate the report before downloading it (otherwise there will be an error message). If the user generated the report once and then did some more actions, he/she needs to generate the report again to update it.
- If the app is running locally, the report will be opened automatically in a new browser window. If the app is running hosted, no window will be opened.

chfleming commented 7 years ago

That looks like a good solution.

I don't know what @jmcalabrese thinks, but I think it might be useful to also have a "canonical" report that has the final data, final selected model, and final outputs, without tracking all the choices made.

xhdong-umd commented 7 years ago

Yes, all the intermediate steps may be a little overwhelming. Though it will be difficult to separate the steps needed for the final result from the other steps. Some steps that change the data need to be included, like

time subset
outlier removal
individual selection changes

For the data to be included in the download, what should we include, other than plot png/pdf?

I think it's better to export data as .rds, since you can assign a variable name when importing and avoid name conflicts
We can use a list to organize multiple values, for example
- result$telemetry_list, the list of currently selected animal telemetry objects
- result$models, the model fitting results
- result$selected_models, the selected subset of models, i.e. the rows selected in the model summary table
- result$home_range_list, the list of home ranges for the selected models
- result$occurrence_list, the list of occurrences for the selected models
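
As a minimal sketch of that export, the list can be written with `saveRDS()` and read back under any variable name with `readRDS()`. The element names follow the list above; the values here are empty placeholders, not real ctmm objects.

```r
# Placeholder values only; in the app these would be the real objects.
result <- list(
  telemetry_list  = list(),
  models          = list(),
  selected_models = list(),
  home_range_list = list(),
  occurrence_list = list()
)

path <- file.path(tempdir(), "result.rds")
saveRDS(result, path)

# The reader chooses the variable name, so nothing in the workspace is overwritten.
my_session <- readRDS(path)
names(my_session)
```

This is the main difference from `.Rdata`, where `load()` restores objects under their original names.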

chfleming commented 7 years ago

A canonical structure like that could also serve as a useful format for saving & loading sessions.

xhdong-umd commented 7 years ago

Previously I thought it would be difficult to load a session, because a lot of data is involved and we need to maintain consistency when some data is loaded.

Now I realize it's still possible, but it needs to be done in a limited way: only the data, model fitting results, and home ranges/occurrences can be restored. These should be the primary targets of session restoration because they are the time-consuming steps.

xhdong-umd commented 7 years ago

Restoring a session is still quite tricky. A lot of values are held in reactive expressions, which update automatically with input changes and are not supposed to be overwritten.

For example, I can overwrite the home range result anyway, but I also need to update some user input, like the models selected in the model summary table. There is no guarantee on the order of the changes to the models selected in the table and to the home range result. It's possible that the model table is updated later, which triggers evaluation of the reactive expression and calculates the home range again, which just defeats the purpose of saving the home range result.

Shiny reactives don't provide detailed control of event order; updates are supposed to be handled automatically by Shiny. The best way I can think of now is to use a cache mechanism, so that a calculation always checks for cached results before running a costly process. This way I can just update the cache after the session is restored. This should be the best approach that doesn't disturb the reactive logic.
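
The cache idea can be sketched in base R as below. This is an illustration only, not the app's implementation: `hash_args` and `with_cache` are hypothetical names, and the key is an md5 of the serialized arguments computed with `tools::md5sum` to avoid extra packages (the thread itself uses digest).

```r
cache <- new.env()

# Hash the arguments: serialize them, write the bytes to a temp file, md5 it.
hash_args <- function(args) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(args, NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Wrap a costly function so it checks the cache before computing.
# `name` keeps results of different functions apart in the shared cache.
with_cache <- function(name, fun) {
  function(...) {
    key <- hash_args(list(name, ...))
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fun(...), envir = cache)
    }
    get(key, envir = cache)
  }
}

calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1   # count how often the costly code really runs
  Sys.sleep(0.1)
  x^2
}
fast_square <- with_cache("square", slow_square)

fast_square(4)  # computed
fast_square(4)  # same input: served from the cache, slow_square is not run again
```

Because the reactive code still calls the same function, restoring a session only has to refill `cache`; the reactive logic itself stays untouched.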

xhdong-umd commented 7 years ago

The cache mechanism worked in my test but didn't work in the shiny app. For some reason the hash value of a function changed after parallel model fitting. I have reproduced the bug and reported it to the author of digest.

Parallel code really is difficult when something is wrong...

xhdong-umd commented 7 years ago

It's difficult to fix the problem in digest (Shiny + parallel can be tricky). I found I can work around the issue by creating a wrapper function that moves the function object out of the parameters.
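
The shape of such a workaround might look like the sketch below. It is a guess at the general idea, not the actual fix: all names are hypothetical, and the unstable function object is imitated by two closures that behave the same but serialize differently.

```r
# Hypothetical cache key built from everything passed in.
key_for <- function(...) paste(serialize(list(...), NULL), collapse = "")

# Two function objects with identical behavior but different enclosing
# environments, standing in for a function object that changes between runs.
make_fit <- function(state) {
  force(state)
  function(x) mean(x)
}
fit_a <- make_fit(1)
fit_b <- make_fit(2)

items <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# Keyed on data + function object: the key changes with the object.
unstable <- key_for(items, fit_a) != key_for(items, fit_b)

# Wrapper: the function is fixed inside, so the parameters (and therefore
# the cache key) contain only the data.
fit_all <- function(items) lapply(items, fit_a)
stable <- identical(key_for(items), key_for(items))
result <- fit_all(items)
```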

xhdong-umd commented 7 years ago

With the workaround I have made the cache mechanism work in the app. If fit models/home range/occurrence is run again with exactly the same input in one session, the later run finishes instantly. So we can save session data including the cache data, and restoring the session will also restore the cache, so these processes can finish quickly.

There are some tricky issues with data updates though, since there are now more sources of truth for the data. Previously only user input determined the output; now session restoration can also update the data. Multiple sources of truth like this make the reactives in Shiny more complex, and more testing is needed to make sure everything works in every combination of operations.

xhdong-umd commented 7 years ago

Previously I thought it was almost impossible to restore a session given the interactive nature of the app.
Later I realized I can restore some data without breaking data consistency if I limit the data to be saved, and also restore some data naturally by imitating the user's actions:
- Some dynamic values should not be overridden with restored values, like the currently selected individuals' data, which depends on the row selection in the data summary table. So I tried to select the individual rows in the data summary table programmatically and restore the cache, so that the calculation of models and home ranges still happens through the normal app logic (instead of just assigning the saved values) but finishes instantly because of the cache.
- However, this didn't work. Shiny DT has some methods to select rows in a table programmatically, but those are JavaScript calls that are only executed correctly after the table is rendered. The restoration process happens much faster than the app UI updates, so the row selection call was executed before the table was updated, and had no effect.
- I tried various methods to delay the row selection, to no avail.
- My conclusion is that Shiny doesn't provide enough control of the UI to imitate every user interaction programmatically, especially with full control of execution order.
In the end, save/load session only works to a limited extent:
- User data can be saved and loaded.
- The currently selected individuals cannot be reproduced programmatically. The user has to select the same individuals manually, maybe with the help of the work report, which documents which individuals were selected. The same applies to the currently selected models.
- With the same individuals/models selected, all the time-consuming calculations finish instantly because of the restored cache.
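
Saving and restoring the cache itself can be sketched as below, assuming the cache is kept in an R environment. `save_cache` and `load_cache` are hypothetical names, and the cached value is a placeholder.

```r
# Write every cached entry to one .rds file.
save_cache <- function(cache, path) {
  saveRDS(as.list(cache, all.names = TRUE), path)
}

# Copy the saved entries into an existing cache environment.
load_cache <- function(cache, path) {
  list2env(readRDS(path), envir = cache)
  invisible(cache)
}

# Session 1: a costly result has been cached under some key.
cache <- new.env()
assign("key1", list(model = "fit result"), envir = cache)
path <- file.path(tempdir(), "cache.rds")
save_cache(cache, path)

# Session 2: start with an empty cache and restore it from the file.
restored <- new.env()
load_cache(restored, path)
get("key1", envir = restored)
```

After the restore, the user's manual selections drive the normal app logic, and any calculation whose key is already in the cache returns at once.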

xhdong-umd commented 7 years ago

The current UI for report/session is as follows:

It makes sense to move the Load Session button to the first page, with import. Though that will make the local import box too busy, and the report box unbalanced.

Another option is to move both save session and load session to first page, which is not perfect either.

xhdong-umd commented 7 years ago

There is a bug in the zip package I used (R's utils::zip is not platform independent), which creates an invalid zip when the folder to be compressed is empty. I have raised the issue with the package author. In the meantime there is no real harm from this, just an error message in the console when restoring such a zip.

xhdong-umd commented 7 years ago

I have updated the repo and the web app. The work report and save/load session are finished for now.

I'll continue the work on the maps.

xhdong-umd commented 7 years ago

I may need to test the session restoration further. These file-related operations often have problems on Windows/Linux that need separate tests.

xhdong-umd commented 7 years ago

It turned out that R's internal unzip (the default method on Windows) has a problem unpacking a zip created by zip::zip. The external tool unzip.exe has no problem with it.

xhdong-umd commented 7 years ago

Interestingly, I tested again in a Windows 10 VM, a Linux VM, and hosted mode, and there is no problem so far. The cached data can be loaded and used properly.

One thing to note is that the downloaded file can take some space. It's common for the session data zip to be about 10M.

I'll move on to work on the map now, and test on my home Windows PC later.

xhdong-umd commented 7 years ago

I realized that viewing the report doesn't need two steps.

Previously I used two steps because I wanted to open a link in hosted mode. That needs the report file to be generated before the link is ready. However, a link in the app cannot be session specific, which means we cannot separate different users' report files in a safe way.

So we have to download the report in hosted mode, but that doesn't need an explicit generate report step.

Now there will be just one button, depending on the app mode:

preview report in local mode, which will open the report automatically like the first version.
download report in hosted mode, which will just download the report.

I also moved the Load Session button to the import page, which is more logical, since the buttons in the report box can now be arranged in a balanced way.

xhdong-umd commented 7 years ago

@jmcalabrese @chfleming I think the save/load session name can be changed to save/load cache, since the only data saved and restored are the input data and calculation caches. Restore session may give the user the impression that more user interactions can be restored, but that is not practical.

ctmm-initiative / ctmmweb

work report - record every meaningful action; save and restore session #34