IMCR-Hackathon / datapie

Data Package Interface for Evaluation ("Easy as pie!")
https://imcr-hackathon.github.io/datapie/
MIT License

write quick start guide for shiny app #10

Closed sheilasaia closed 5 years ago

clnsmth commented 5 years ago

The in-depth guide will be a vignette. I've initialized __/vignettes/interactive_report.Rmd__ for this purpose.

CoastalPlainSoils commented 5 years ago

Sure I will do my best based upon what we discussed and have put together so far.

CoastalPlainSoils commented 5 years ago

As mentioned in the call today, I will look at Li's email (which I have received) and run the GUI to draft the quick start guide. As also discussed, I will work with Sheila, Li, and Jason to get feedback on the guide. Below is the vignette text draft I sent to Colin earlier today.

I know that in the text below I still need to clarify whether this will be a web-based application, an R package, or both.

Please feel free to provide any comments on this text.

Overview

Welcome to the Environmental Data Initiative’s (EDI) interactive web application for data description and exploration. This application was developed during the 2019 Hackathon event, which took place June 9 to 13 in Albuquerque, New Mexico. This web application was released on XXXX, 2019.

This package allows users to explore the suitability of a data file or package for further inquiry by utilizing R, an open source statistical software program, and the R Shiny application. Additional R packages are utilized to make this application possible and are identified further on in this overview. Users identify data files or packages of interest by passing a digital object identifier (DOI) or data file to a graphical user interface (GUI). The user can then browse summary reports and generate exploratory plots.

The goal of the 2019 Hackathon was to improve methods to visualize data. In today’s data-driven world, massive amounts of data are generated and made accessible, but only a small fraction is ever interpreted. Here, users can review data from an existing site or supply their own dataset and review the contents efficiently. By creating such a tool, EDI and the Hackathon participants hope to increase the number of existing datasets that are reviewed and interpreted, and so contribute to the overall progress of science. This work adheres to the Findable, Accessible, Interoperable, Reusable (FAIR) principles.

The application has two main functions: a static report and an interactive application. With the static report, the application will run through a given dataset and create a summary table of the provided data. It will create graphs of the identified variables and graphs of the NA distributions (if applicable). With the interactive portion of the application, the user chooses the variables to plot and the type of plot, and so builds a visual analysis of a given dataset or DOI.

This project assists researchers and other data users who wish to reuse existing data packages that are archived on DataOne member nodes. This R Shiny application (built on the framework provided by ggplotgui) 1) downloads data packages identified by DOIs registered on DataOne member nodes, 2) reads data into a lightweight viewer, 3) provides summary statistics and basic graphics describing the data package, and 4) generates a more robust report describing the data. It is not intended to replace a full analysis in R or a comparable statistical package; instead, it lets the user quickly assess whether the data is suitable for their needs.

Note: access to data packages on DataOne member nodes is provided through the package metajam.

This application would not have been possible without the Hackathon 2019 event and the hard work and dedication of the following participants: Colin A. Smith, Alesia Hallmark, Li Kui, Jason J. Mercer, An T. Nguyen, John H. Porter, Sheila Saia, Kathe Todd-Brown, Kristin Vanderbilt, and Jocelyn Wardrup.

In addition, we wish to credit the author of a pre-existing application, Gert Stulp, whose application was edited (with Dr. Stulp’s permission) to meet the purpose and satisfy the goals of this project. That pre-existing application is accessible via this link: https://site.shinyserver.dck.gmw.rug.nl/ggplotgui/

Application information: This application runs on R version XXXX, and utilizes the following packages: XXXXXXX, XXXXXXX, XXXXXXX, XXXXXXX, XXXXXXX, XXXXXXX, XXXXXXX.

Please credit this work as the following when appropriate: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

If you have any questions regarding the application, find any areas for improvement, or wish to provide feedback, please contact Colin A. Smith (csmith.tar@gmail.com) or Kristin Vanderbilt (krvander@fiu.edu).

The package attempts to address some common shortcomings of data formats to aid suitability evaluation. Data is highly diverse and we provide no guarantee this will actually work for a given data package, but it’s worth a shot!

Static Report

The static report is a brief snippet of a given dataset. It provides general information such as the min, max, mean, and 1st and 3rd quartiles of each variable, along with the number of observations and the number of NAs per variable. It also provides graphs for the variables. The static report is limited to XXXXX variables.

Please note some static reports can take some time to be generated. The web interface will indicate the progression of the generation of the report.

The code for this interactive interface identifies several different formats of geographic coordinates and several different formats of dates/times. Please note that not all formats will be recognized.

Interactive Report

The interactive report allows the user to choose variables and plot styles. Through the XXXXXXXX tab, a user can select boxplot, density, dot + error, dotplot, histogram, scatter, or violin plots. X and y variables can be selected, legend added, font changed, and titles created.

Similar to the static report, the code for this interactive interface identifies several different formats of geographic coordinates and several different formats of dates/times. Please note that not all formats will be recognized.

wetlandscapes commented 5 years ago

Great information! Thanks for organizing @CoastalPlainSoils.

I have a couple of organizational comments, but these are just opinions, so do with them what you will:

  1. Some of the attribution information (e.g., about EDI, making the app) should be placed elsewhere. There are two tabs in the GUI: "Help" and "About". The attribution information should probably go in the "About" tab, while a lot of the quick-start info should go in the "Help" tab. We should probably change the name of the "Help" tab to "Quick Start" (or similar), too.
  2. Rather than focus on the main functionalities of the app, I will suggest that we focus on the tabs of the app, and describe the functionalities therein. In this context we would have 7 sections in the Quick Start document. Below is an example of how those tabs could be organized, as well as a couple of ideas related to the kinds of information the sections could contain.

Raw Data

Summary Report

The interactive report allows the user to choose variables and plot styles. Through the XXXXXXXX tab, a user can select boxplot, density, dot + error, dotplot, histogram, scatter, or violin plots. X and y variables can be selected, legend added, font changed, and titles created.

Similar to the static report, the code for this interactive interface identifies several different formats of geographic coordinates and several different formats of dates/times. Please note that not all formats will be recognized.

Plot

Interactive Plot

R-code

Help (or Quick Start)

About

clnsmth commented 5 years ago

Looks great, @CoastalPlainSoils, and I second @wetlandscapes' comments.

Consider creating this documentation in /datapie/vignettes as an .Rmd file, which outputs .html that can be referenced by the GUI.

atn38 commented 5 years ago

Summary Report

  • What is contained on this page?
  • Static report functionality (taken from the original explanation):

The static report is a brief snippet of a given data table within the data package. It provides general table-level information: the number of observations and NAs per variable; the min, max, mean, and 1st and 3rd quartiles for numeric variables; and the number and distribution of levels for categorical variables. The static report provides appropriate plots to assess data availability and summary. The static report is limited to XXXXX variables. Please note some static reports can take some time to be generated. The web interface will indicate the progression of the generation of the report. The processing for this report identifies several different formats of geographic coordinates and several different formats of dates/times. Please note that not all formats will be recognized.

@CoastalPlainSoils I edited the section on static reports (edited in bold). Note that functionalities mentioned in italicized text aren't implemented yet 🤷‍♂.

clnsmth commented 5 years ago

Hi @CoastalPlainSoils. Please add the quick start guide to the quick_start_guide.Rmd file in the package directory vignettes. This will enable .html content rendering that can be easily added to the UI and website.

CoastalPlainSoils commented 5 years ago

Thank you for your comments. I am working on this now. I had some issues trying to figure out how to run the app to complete the documentation. I have the app running now, success! I have finished the "About" tab information and I am working on the Quick Start guide..... I liked the idea of organizing it into the tabs, thank you Jason, and thank you for your edits An. Colin I will try to find that file and put everything in correctly, I might be asking you for help! I'll keep you posted.

CoastalPlainSoils commented 5 years ago

Colin: that exact file you referenced is not in the folder. I looked at one of the vignettes documents and I think I might need some guidance to make sure I insert the information correctly.

CoastalPlainSoils commented 5 years ago

In the meantime, here is what I have completed for the About tab. Please look it over and let me know of any edits, errors, etc.

About Tab

EDI Data Viewer “datapie” (because it is so pleasant!)

Information Page

Background:

Welcome to the Environmental Data Initiative’s (EDI) interactive web application for data description and exploration. This application was developed during the 2019 Hackathon event, which took place June 9 to 13 in Albuquerque, New Mexico. This web application was released on XXXX, 2019.

Purpose:

This package (“datapie”) and web interface application allow users to explore the suitability of a data file or package for further inquiry by utilizing R, an open source statistical software program, and the R Shiny application. Additional R packages are utilized to make this application possible and are identified in this overview. Users identify data files or packages of interest by passing a digital object identifier (DOI) or data file to a graphical user interface (GUI). The user can then browse summary reports and generate exploratory plots.

The goal of the 2019 Hackathon was to improve methods to visualize data. In today’s data-driven world, massive amounts of data are generated and made accessible, but only a small fraction is ever interpreted. Here, users can review data from an existing site or supply their own dataset and review the contents efficiently. By creating such a tool, EDI and the Hackathon participants hope to increase the number of existing datasets that are reviewed and interpreted, and so contribute to the overall progress of science. This work adheres to the Findable, Accessible, Interoperable, Reusable (FAIR) principles.

Use of this Application/Package:

The application has two main functions: a static report (summary report tab) and an interactive application (interactive plot tab). With the static report, the application will run through a given dataset and create a summary table of the provided data. It will create graphs of the identified variables and graphs of the NA distributions (if applicable). With the interactive portion of the application, the user chooses the variables to plot and the type of plot, and so builds a visual analysis of a given dataset or DOI.

This project assists researchers and other data users who wish to reuse existing data. This R Shiny application (built on the framework provided by ggplotgui) 1) downloads data identified by DOIs registered on DataOne, EDI, and the Long Term Ecological Research Network (LTER), and supports other data compatible with the R package “metajam”; 2) reads data into a lightweight viewer; 3) provides summary statistics and basic graphics describing the data package; and 4) generates a more robust report describing the data. It is not intended to replace a full analysis in R or a comparable statistical package; instead, it lets the user quickly assess whether the data is suitable for their needs.

Acknowledgements:

This application would not have been possible without the Hackathon 2019 event and the hard work and dedication of the following participants: Colin A. Smith, Alesia Hallmark, Li Kui, Jason J. Mercer, An T. Nguyen, John H. Porter, Sheila Saia, Kathe Todd-Brown, Kristin Vanderbilt, and Jocelyn Wardrup.

In addition, we wish to credit the author of a pre-existing application, Gert Stulp, whose application was edited (with Dr. Stulp’s permission) to meet the purpose and satisfy the goals of this project. That pre-existing application is accessible via this link: https://site.shinyserver.dck.gmw.rug.nl/ggplotgui/

Further thanks to Wilmer Joling for setting up the website, which is based on the magical but incomprehensible Docker. Thanks to Hadley Wickham for making such good packages (and open access books describing them) that allow even low-skilled and low-talented programmers to contribute to R.

Application information: This application runs on R version 3.3.2 and utilizes the following packages: ggplot2, shiny, stringr, plotly, readr, readxl, haven, and RColorBrewer.

Citation:

Please credit this work as the following when appropriate: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Questions:

If you have any questions regarding the application, find any areas for improvement, or wish to provide feedback, please contact Colin A. Smith (csmith.tar@gmail.com) or Kristin Vanderbilt (krvander@fiu.edu).

Closing Remarks:

The package attempts to address some common shortcomings of data formats to aid suitability evaluation. Data is highly diverse and we provide no guarantee this will actually work for a given data package, but it’s worth a shot!

Thank you for utilizing this product and we hope it assists you in the progression of beneficial science and interpretation of data. Cheers, and enjoy the datapie!

CoastalPlainSoils commented 5 years ago

Here is the Quick Start Guide I have created based upon the tabs and the input received. Please note I will be incorporating this into a .Rmd file, but I am waiting for further instruction before doing so. I tried to play around with the app, but I could not really get the plot or interactive plot to work with the example dataset or the dataset I provided. So when I am able to visualize that I can expand on those sections.

Notes to myself or things to clarify/edit are in all caps and missing things are XXXXXX's.... Again if you have comments, questions, expansions on these, feel free to let me know and I will incorporate them.

I hope both the Information/About page and this Quick Start Guide are what everyone was envisioning....

Quick Start Guide Tab

EDI Data Viewer “datapie” (because it is so pleasant!)

Quick Start Guide

Not sure how this thing works or need clarification about a particular tab/process?

You have come to the right place! Here, we go over what you need to know to process your data and visualize a given dataset. The information herein is organized according to the tabs on this data viewer: “Raw Data”, “Summary Report”, “Plot”, “Interactive Plot”, “R-code”.

If you do not find the answer you are looking for on this page, please visit the “About” tab and refer to the package/viewer contacts.

Raw Data:

What data can I view on here?

  • Any dataset that is supported through the R package “metajam”
  • Any dataset from the data archive services of DataOne, the Environmental Data Initiative (EDI), or the Long Term Ecological Research Network (LTER)

Or…. Upload your own dataset and see what happens! It just might work! We don’t know the format, etc. of your dataset so it is hard for us to say if it will work or not. Generally, the dataset should be formatted with the column names at the top followed by the data in rows for it to work. Accepted file formats are detailed below.

How it works:

On the left side of the application are three options: “Load sample data”, “Fetch data from DOI”, or “Upload text file”.

Load sample data - allows a user to download the example dataset provided.

Fetch Data from DOI - a user enters a given DOI (Digital object identifier) and clicks “Fetch Data”.

Upload text file - click “Browse” to find and upload one of five file types: text (csv), Excel, SPSS, Stata, or SAS. Then select the delimiter (the character that separates values within the file) from four options: Semicolon, Tab, Comma, or Space. Select “Submit datafile” and the data file will appear on the right. If your uploaded data file does not appear the way you wish, you may need to make a copy of the file and edit the headers or delete rows at the top of the document. Those rows may contain important study information, but this application cannot know how many rows to skip in any given dataset. The column names at the top of the page are taken from the top row of the uploaded document.
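To illustrate why the delimiter choice and stray rows at the top of a file matter, here is a minimal sketch. It is written in Python purely for illustration and is not the app's R code; the `read_delimited` helper, its `skip_rows` parameter, the delimiter mapping, and the sample text are all assumptions made up for this example.

```python
import csv
import io

# The four delimiter choices named in the guide. Mapping them to these
# characters is an assumption made for this sketch.
DELIMITERS = {"Semicolon": ";", "Tab": "\t", "Comma": ",", "Space": " "}


def read_delimited(text, delimiter_name, skip_rows=0):
    """Parse delimited text into (column names, list of row dicts).

    skip_rows is a hypothetical parameter: it drops leading lines (for
    example study notes) so the first remaining line is the header row.
    """
    lines = text.splitlines()[skip_rows:]
    reader = csv.DictReader(
        io.StringIO("\n".join(lines)),
        delimiter=DELIMITERS[delimiter_name],
    )
    rows = list(reader)
    return reader.fieldnames, rows


# Made-up sample: one line of study notes, then a header row, then data.
sample = "Study: example site notes\nsite;depth_cm;ph\nA;10;6.5\nB;20;NA\n"
columns, rows = read_delimited(sample, "Semicolon", skip_rows=1)
print(columns)
print(rows[1])
```

Without `skip_rows=1`, the notes line would be read as the header, which is the situation the paragraph above describes: the viewer cannot guess how many leading rows to ignore.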

One can increase or decrease the number of rows shown on a given page, with options of 10, 25, 50, and 100.

Summary Report:

Once data has been loaded, click “Generate report”. A summary report of the provided data will appear on the right side of the page.

The summary report displays a brief snippet of a given data table within the data package. It provides general table-level information: the number of observations and NAs per variable; the min, max, mean, and 1st and 3rd quartiles for numeric variables; and the number and distribution of levels for categorical variables. REVIEW TO MAKE SURE CORRECT… This report is limited to XXXXX variables.
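As a rough picture of the per-variable statistics listed above, the following sketch computes them for a single column. It is an illustration in Python, not the report's actual R implementation; the `summarize` function, the `"NA"` missing-value token, and the quartile method are assumptions, so the real report may produce slightly different quartile values.

```python
import statistics


def summarize(values, na_token="NA"):
    """Summarize one column supplied as a list of strings."""
    present = [v for v in values if v != na_token]
    summary = {"n": len(values), "n_na": len(values) - len(present)}
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # Not every value is numeric: treat the column as categorical
        # and count how often each level occurs.
        levels = {}
        for v in present:
            levels[v] = levels.get(v, 0) + 1
        summary.update(type="categorical", n_levels=len(levels), levels=levels)
        return summary
    summary.update(type="numeric")
    if numbers:
        summary.update(
            min=min(numbers), max=max(numbers), mean=statistics.mean(numbers)
        )
    if len(numbers) >= 2:
        # Quartile definitions differ between tools; "inclusive" is one choice.
        q1, _median, q3 = statistics.quantiles(numbers, n=4, method="inclusive")
        summary.update(q1=q1, q3=q3)
    return summary


depth = summarize(["1", "2", "NA", "3", "4", "5"])
site = summarize(["A", "B", "A", "NA"])
print(depth)
print(site)
```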

Please note some reports can take some time to be generated. The web interface will indicate the progression of the generation of the report. - IS THIS CORRECT?

The processing for this report identifies several different formats of geographic coordinates and several different formats of dates/times. Please note that not all formats will be recognized.
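One way to picture why only some date/time formats are recognized is to try a fixed list of candidate formats against a column, as in the sketch below. This is a Python illustration only; the candidate list and the `detect_datetime_format` name are hypothetical, and the formats the app actually checks are not stated in this thread.

```python
from datetime import datetime

# Hypothetical candidate formats, chosen only for this example.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S"]


def detect_datetime_format(values):
    """Return the first candidate format that parses every value, else None."""
    if not values:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            for v in values:
                datetime.strptime(v, fmt)
        except ValueError:
            # At least one value did not match; try the next format.
            continue
        return fmt
    return None


print(detect_datetime_format(["2019-06-09", "2019-06-13"]))
print(detect_datetime_format(["9 June 2019"]))
```

Any column written in a format outside the candidate list comes back as `None`, which corresponds to the "not all formats will be recognized" caveat above.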

If you are satisfied with the generated report, click “Download report (HTML)”. This HTML can be saved or printed to PDF. Images within the HTML can be saved individually by the user.

Plot:

This portion of the application allows the user to create and edit plots in an easy to use format with the option to save the plot created.

In the central portion of the page, a static plot is generated. On the left side of the page, the user can select the type of plot (boxplot, histogram, scatter) and the x and y variables.

Based upon what plot is selected, the user can choose between a variety of options. EXPLAIN OPTIONS IN MORE DETAIL…..

On the right side of the page the user is given the option to change the aesthetics of the plot. Tabs shown are: “Text”, “Theme”, “Legend”, and “Size”. Here the labels on the graph’s axes can be changed, a title may be added, font sizes can be adjusted, text can be rotated, colors selected, gridlines removed, legend edited, and the size of chart adjusted.

The user can download the figure as a PDF or TIFF file.

Interactive Plot:

The interactive plot gives the user the same options as the plot, however is different in that….. XXXXXXXXXXXXXXX

Please note that datasets with more than 100,000 points take longer to plot and are best avoided. If your dataset is this large, we recommend dividing it up.
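Dividing a large dataset could look like the sketch below, which splits rows into consecutive pieces of at most 100,000. It is a Python illustration rather than part of the app; the `split_rows` helper is hypothetical, and treating one "point" as one row is an assumption.

```python
def split_rows(rows, max_rows=100_000):
    """Yield consecutive slices of rows, each holding at most max_rows items."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]


# Stand-in for 250,000 data rows.
all_rows = list(range(250_000))
chunks = list(split_rows(all_rows))
print([len(chunk) for chunk in chunks])
```

Each piece can then be saved as its own file and uploaded separately.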

R-code:

The R-code is enclosed for the purpose of XXXXXXXXXXXXXXXXXXXX

clnsmth commented 5 years ago

@CoastalPlainSoils, quick_start_guide.Rmd is in the /vignettes directory of the development branch, where contributions to the project are being made (see #53 for guidance).

Send me an email if you have any questions.

sheilasaia commented 5 years ago

@CoastalPlainSoils do you mind if I use the text you wrote above about the hackathon for the "About" tab? Specifically, the post you made on July 11? I just noticed that the "About" tab hasn't been updated and isn't assigned to anyone. I will add it manually to the app for now but we can talk about how to streamline this with a markdown file, later (maybe beta release).

clnsmth commented 5 years ago

The quick start guide (above) has been added to the vignette quick_start_guide.Rmd. Now it needs to be integrated into the UI Help tab.

clnsmth commented 5 years ago

Working on this now in the fix_10 branch.

clnsmth commented 5 years ago

In addition to the primary scope of this issue, the About tab contents will be relocated to the repo-level README.md and the package-level DESCRIPTION file.

clnsmth commented 5 years ago

Merged branch fix_10 into the development branch (see e20f5a99a493ffdcb326369a663d49209f427677).