Milestone 1 feedback - Githubissues

eric-f commented 5 years ago

Mechanics

[x] Might want to add version number for the packages in the dependency section.

Quality

[x] Could have touched up the documentation a little more.
[x] E.g. instead of saying 'this script takes 2 arguments, input and output, you might want to state clearly which argument is which.
[ ] E.g. Add another usage example: Rscript src/analyze_result_choc_data.R input_csv_file output_png_file (Not sure if this is the best practice. Just throwing out some ideas based on what I see from other's project.)

Report

[x] Would it be easy to reaplce the dots in the column names in Table 1 by whitespaces?
[x] In Figure 1, some origins (South America, Central and S. America, Carribean, West africa, etc) are not countries. Might want to edit the figure title for a more accurate description?
[x] South America includes Venezuela (and happens to have the highest average rating). Might want to address this issue in the Critique section as well. (I wonder if that means some beans are from multiple counties in S. America.)
[x] Would be nice to replace the 'snake_case' in Table 2 by the usual case.
[ ] Great report.

Extra

[ ] The project overview in README is a good pitch.
[ ] Nice flow chart. What apps did you use?

rachelkriggs commented 5 years ago

@eric-f Thank you for the feedback. I have a question regarding naming: You've asked if it is easy to replace the dots in the column names with whitespaces. Yes, it's easy to do this. However, it makes working with the data in the rest of the scripts more difficult and problematic. I was under the impression that column names should not contain spaces and should either use thisCase or this_case or this.case. Is there a preference between these 3 options? I looked at Hadley Wickham's style guide (http://r-pkgs.had.co.nz/style.html) for an answer but didn't find one. Google's R Style Guide (https://google.github.io/styleguide/Rguide.xml) suggests this.case but this is essentially how our data is already structured and presented.

Similarly, you've asked to replace the 'snake_case' in Table 2 by the usual case -- what is usual case? Is this words with spaces in them? Again, I feel that column names that contain spaces are difficult to work with and should be avoided and would ask which of the 3 options above would be best.

Thanks!

@carrieklc

eric-f commented 5 years ago

@rachelkriggs

I was under the impression that column names should not contain spaces and should either use thisCase or this_case or this.case.

Agree.

Is there a preference between these 3 options?

Not sure about snake_case vs camelCase, I have seen both.

I think this.case is often reserved for object oriented programming. For example, sometime the dot is an operator for accessing elements in object-oriented programming, (http://reeborg.ca/docs/oop_py_en/oop.html).

R does something like that too. Say, summary() is a method for various class of R objects and a different function is invoked depending on the type of the input. You are actually invoking summary.lm() if you apply summary on a output from lm, but summary.data.frame() if you apply it to a data.frame. (See https://adv-r.hadley.nz/oo.html if you are interested)

Similarly, you've asked to replace the 'snake_case' in Table 2 by the usual case -- what is usual case? Is this words with spaces in them?

Yes just words with spaces. Not sure if there is a name...

Again, I feel that column names that contain spaces are difficult to work with and should be avoided and would ask which of the 3 options above would be best.

Back to my suggestion on column headers. Sorry if my initial comment wasn't clear. I was not suggesting to change the R variable names but to print the column headers in 'words with spaces' style when the kables are rendered.

I was thinking two approaches to do this, either 1) via some kind of string operation like stringr::str_replace()on the kable object or 2) renaming the variables temporarily just before parsing to kable(). Not sure if this is possible or easy, hence the question.

rachelkriggs commented 5 years ago

@eric-f Thanks very much for clarifying this.

rachelkriggs commented 5 years ago

@eric-f One more question, regarding the last point under 'Extra': we're only using R. Are you suggesting adding this information to the flowchart or something else?

eric-f commented 5 years ago

@rachelkriggs I was just wondering how you created the flowchart and if you have found a way to create them easily. I am not aware of any package to do that in R, though I haven't really tried finding them. If I had to do draw this, I will probably do it in MS Word... So, just wonder if you have found something nice :)

rachelkriggs commented 5 years ago

@eric-f We used draw.io (which we were introduced to from 513 our Databases course). It's a pretty nice tool, very easy to use. But we haven't yet figure out how to save the working file (only the output).

rachelkriggs commented 5 years ago

@carrieklc Regarding Eric's point about South America in the report: my feeling is that we should exclude 'South America' as it's too broad a region. Let me know what you think.

carrieklc commented 5 years ago

@rachelkriggs I also noticed there is a "Central and S. America" value - I think these should be removed from the analysis too (not just the exploratory plot), if that is what you are suggesting? E.g. Remove rows where broad bean origin has "America" in it?

rachelkriggs commented 5 years ago

@carrieklc That sounds like a good idea to me.

carrieklc commented 5 years ago

@eric-f Regarding the third item under Report, we've made the fixes in commit 697d0d7ea204de13e235d39ed3af8809c0456d4d to exclude very broad regions in our visualization and analysis (e.g. "South America"). Since in this case, we cannot identify which country the beans came from - that is, they could be from Venezuela or not, or even some combination.

We've addressed all actionable items above and so are closing this issue. Thanks!

UBC-MDS / DSCI_522-Chocolate_Ratings_Analysis

Milestone 1 feedback #27