Contributions to ProjectTemplate documentation

MattForshaw commented 6 years ago

Hi everyone,

Thank you for your ongoing hard work on ProjectTemplate.

At the end of the Getting Started guide is the following sentence:

In a future piece of documentation, we’ll describe some of the more advanced features that ProjectTemplate offers.

I would like to contribute to ProjectTemplate by building additional documentation, but before embarking on this, I wanted to check whether there is a wish-list anywhere of the advanced features you would like documented next.

Matt Forshaw, Lecturer in Data Science, Newcastle University

KentonWhite commented 6 years ago

Thanks @MattForshaw for your offer. We do love having documentation contributions. There isn't a list of features that are prioritized for documentation. Could you make a list of the features you think should be added to the documentation? Then we can discuss priority. Is that fair?

rsangole commented 6 years ago

@MattForshaw I use project template almost every day. To that end, I can do my part and contribute to the documentation as well. I can follow your lead if you have a prioritized list of features that need documentation.

Hugovdberg commented 6 years ago

I just updated some of the changes I had in mind for the documentation on the website. I think the following sections still need work:

- [ ] Expand "Introduction"
  - [ ] What is ProjectTemplate
  - [ ] How does ProjectTemplate work
  - [ ] Explain convention over configuration philosophy
  - [ ] Minimal usage example
  - [ ] Basic installation, plus link to separate page
  - [ ] What is ProjectTemplate NOT
  - [ ] Integrate "Building packages"
  - [ ] Remove separate page for building packages
- [ ] Update "Getting started":
  - [ ] Remove preference of text editor, just keep it neutral (or perhaps move it to a new section with useful tools).
  - [ ] Instruct to download the file directly into the data directory: download.file('http://projecttemplate.net/letters.csv.bz2', 'data/letters.csv.bz2') would be a lot clearer, I think, and avoids operating-system dependencies.
  - [ ] Update ddply example to equivalent dplyr example
- [ ] Update "Mastering ProjectTemplate":
  - [ ] Instruct to download the philapd.db file directly
  - [ ] Combine instructions for SQL databases with those on "Supported File Formats"
  - [ ] Move to separate page "SQL databases"
  - [ ] Make better distinction between
    - [ ] general configuration flags: type, user, password, ...;
    - [ ] database specific flags: class, classpath, dsn, ...;
    - [ ] and data selection flags table and query
- [ ] Expand "Configuring", add documentation on missing configuration flags (not all are listed!)
- [ ] Remove "Updating" from website altogether, as the page is no longer relevant since version 0.3.5.
- [ ] Clarify "Supported File Formats"
  - [ ] Combine .bz2, .zip, .gz variants of extensions into the main extension (more like ".csv: CSV files that use a comma separator (supports compressed variants)", and explain which compressed variants are accepted separately)
  - [ ] Link to new page "SQL databases"

Further improvements:
- [ ] Add vignettes
- [ ] Check documentation of existing functions for typos and unclear/ambiguous sentences.
- [ ] Check documentation for information that should move to a vignette

These are just some possible updates to the current pages that I can think of right now, but perhaps you (as new users?) are missing something altogether. Please feel free to let me know, I can add it to this list. Also, if you think something is utter nonsense to change, then also let me know, I can just as well remove it again.

Hugovdberg commented 6 years ago

I just saw this video about creating good documentation: https://www.youtube.com/watch?v=azf6yzuJt54 We might consider restructuring our documentation that way, because it would help us create structure within the current website. The technical reference is kept pretty clean, so we might not need to add that to the website, although its contents are not easily browsable from a web browser.

rsangole commented 6 years ago

Folks, any progress on this? I recommend we schedule a Skype session to create a quick plan of who does what, what's needed in the documentation, etc.

Hugovdberg commented 6 years ago

I haven't done anything about the documentation recently. Perhaps it's easiest if you just pick an item to update and mark it as done on the list. If you think you're making bigger changes that might conflict with other people's efforts, then just shout out ahead of time ;-)

KentonWhite commented 6 years ago

We're chipping away at this slowly. Every month or so I get someone who wants to help with documentation and can point them to this list. Any help on this is greatly appreciated!

rsangole commented 6 years ago

I have adopted project-template fully for my R projects. I teach it to my team at work too. (We might fork it and make customizations specific to our application). Perhaps I can put together a vignette or blog-post to show how I use it in a real-world project.

maikol-solis commented 6 years ago

It's a great idea. For example, I'm lost with the cache function. I don't know where to invoke it in my projects.

Hugovdberg commented 6 years ago

@rsangole What kind of changes would you like to make that require a separate fork from the main project?

rsangole commented 6 years ago

@Hugovdberg Quite a few customizations actually. I'm using this format to develop projects that might go into a more 'production' environment. So along with /src/ for the source files to call, I need additional folder structure for error & log files, intermediate calculation outputs and final outputs, plots and algorithm performance metrics. I'm standardizing these structures within my team. Furthermore, I'd like to replace all the readme markdowns with customized starter Rmd documents, which will have our logo, color scheme css etc.

Hugovdberg commented 6 years ago

@rsangole That sounds like you don't actually need to fork the project, but just need to create a custom template (using the new create.template function) ;-)

rsangole commented 6 years ago

Ah, alright. I've yet to explore that function. I'll look at it over the weekend.

KentonWhite commented 6 years ago

@maikol-solis Thanks for joining us. Actually, your questions about the cache function are a great place to start. There should be documentation on caching. Since we are so familiar with the project, it is hard for us to see what is confusing.

Could you help us by commenting on what is confusing for you, so we can update the documentation there? It would be great if you could open a caching documentation issue so we can keep it all in one spot.

maikol-solis commented 6 years ago

@KentonWhite Thanks for helping us to understand the software.

In the munge folder I process the data and create some clean data frames, depending on my project. Here is my question: where should I call the cache function so that the scripts in the munge folder do not recreate the data frames when I run load.project? What would an example look like for the 01.A.R file?

#Load data
load(...)

#Preprocessing
MyDataFrame <- Some coding to process Raw data

#Is it correct?
cache(MyDataFrame)

Now, if I run some analysis in the src folder, could I call the cache function to save the results?

If I want to save the analysis results, should they go in the data folder, or somewhere else?

Thanks for the help.

Hugovdberg commented 6 years ago

@maikol-solis it appears your question wasn't answered yet. Data from the data directory is loaded automatically based on the file extension (unless you have a less common file type), so there should be no need to load it manually in the munge scripts. If you have the option cache_loaded_data enabled, the files are cached automatically. If you have expensive munge scripts, you might want to cache the results manually by calling cache. You might then also want to build a guard into the munge script to prevent it from running if the result was loaded from the cache.
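A minimal sketch of such a guard, under a few assumptions: raw_data is a made-up stand-in for a data set that load.project would normally auto-load from the data directory, the value_scaled column is purely illustrative, and the check relies on cached variables already being restored into the global environment before the munge scripts run. Note that cache expects the variable name as a string.

```r
# Hypothetical munge script, e.g. munge/01-A.R.

# Stand-in for data that load.project() would normally auto-load from data/.
raw_data <- data.frame(id = 1:3, value = c(10, 20, 40))

# Guard: skip the expensive step when MyDataFrame already exists,
# for instance because it was restored from the cache directory.
if (!exists("MyDataFrame")) {
  # Illustrative preprocessing step; replace with the real munging code.
  MyDataFrame <- transform(raw_data, value_scaled = value / max(value))

  # cache() is only available when ProjectTemplate is attached, so this
  # call is skipped when the sketch is run outside a project.
  # The variable name is passed as a string.
  if (exists("cache", mode = "function")) {
    cache("MyDataFrame")
  }
}
```

With this in place, the preprocessing only runs when no cached copy of MyDataFrame was found; otherwise the script falls straight through.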

Usually you call load.project from a file in the src directory, after which you can do the analysis based on the preprocessed data. If you want to store results, the graphs directory is created by default in the full template. If you want, you could also create an output directory to store other output (which I personally do in a custom template).

maikol-solis commented 6 years ago

@Hugovdberg Thank you very much for the information. I was confused about how to use the cache function. One more thing: how is this function aware of changes in the data? I mean, if I re-run the munge scripts and re-create the clean data, do I have to call the cache function again, or is it aware of the change?

Hugovdberg commented 6 years ago

The cache function only writes to cache if the data in memory has changed from the data in the cache. At the moment the variable is always read from the cache, even if the original data file was changed. In that case you need to clear the variable from the cache and reload manually.
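As a rough sketch of that manual reset, assuming a ProjectTemplate version that provides clear.cache and that the code is run from the project root: refresh_cached_variable is a hypothetical helper name, not part of the package.

```r
# Hypothetical helper: drop one variable from the cache and from memory,
# then reload the project so the variable is rebuilt from the source data.
refresh_cached_variable <- function(name) {
  # Remove the stale cached copy; the variable name is passed as a string.
  ProjectTemplate::clear.cache(name)

  # Remove the in-memory copy as well, so a guard such as
  # !exists("MyDataFrame") in a munge script triggers again.
  if (exists(name, envir = .GlobalEnv)) {
    rm(list = name, envir = .GlobalEnv)
  }

  # Reload data and re-run the munge scripts.
  ProjectTemplate::load.project()
}

# Usage, from the project root:
# refresh_cached_variable("MyDataFrame")
```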

KentonWhite / ProjectTemplate

Contributions to ProjectTemplate documentation #211