Week 7 - 10 Assignment

Kaur-Navdeep commented 3 years ago

Hi @brymz, I am writing to let you know that I am trying to build a ggplot from the HSD test results. As I mentioned earlier, the seed yield data were not normally distributed, so I was wondering whether I should plot the data with or without log transformation. For reference, the script is in for_loop.R, and I am attaching two graphs here: Seed_yield_graph and Seed_yield_without_log.

Also, I am missing something in the HSD.test() call inside the for loop. Would it be possible to schedule a Zoom meeting at your convenience? It would be a great help. Finally, I tried to open the Week 11 readings website, and it shows a "page not found" error.

brymz commented 3 years ago

should I draw a plot using data with or without log transformation?

I like to see both. It helps familiarize yourself with the data and justify your choice in analysis.
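Looking at both scales can be as simple as drawing the same treatment comparison twice. A minimal sketch, assuming a hypothetical data frame `seed` with columns `trt` and `seed_yield` (simulated and right-skewed here, not the actual trial data):

```r
# Hypothetical data: simulated right-skewed seed yield for 4 treatments
# (a stand-in for the real trial data, which is not reproduced here).
set.seed(42)
seed <- data.frame(
  trt = factor(rep(c("T1", "T2", "T3", "T4"), each = 8)),
  seed_yield = rlnorm(32, meanlog = rep(c(7.0, 7.2, 7.4, 7.5), each = 8), sdlog = 0.4)
)
seed$log_seed_yield <- log(seed$seed_yield)

# Draw the same treatment comparison on both scales, side by side
op <- par(mfrow = c(1, 2))
boxplot(seed_yield ~ trt, data = seed, main = "Raw scale", ylab = "Seed yield")
boxplot(log_seed_yield ~ trt, data = seed, main = "Log scale", ylab = "log(Seed yield)")
par(op)  # restore the previous plotting layout
```

Keeping the log values in their own column means the same object can feed both the plots and the models fitted later.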

The graphs you showed looked good.

I agree it is time for us to meet. How about next Wed 10:30 or 11?

I updated the Week 11 reading link on the wiki.

Kaur-Navdeep commented 3 years ago

@brymz Thank you for your reply and your time, Dr. Brym. I have class from 10:40 to 11:30 a.m. on Wednesday. If possible, could we schedule 11:40 a.m., or any other time you prefer?

brymz commented 3 years ago

Alright. We could do Wed 11:40 or 2:30. ~Z

Kaur-Navdeep commented 3 years ago

Hi @brymz, thank you so much. Should I send a calendar invite for 2:30 p.m. along with a Zoom link?

brymz commented 3 years ago

yes, please

Kaur-Navdeep commented 2 years ago

Hi @brymz, with your guidance and the readings you provided, I was able to complete this course. I have updated the README.md in the repository with the required information. All the files in the GitHub repository are included in the project review, and I have deleted the extra files. I understand that I still need to make a lot of improvements, and I am hoping to get constructive feedback from you. Thank you so much!

brymz commented 2 years ago

Thanks, @navdeepkaurufl! I hope it was a positive experience for you and gave you a jump start in your thesis analysis. I'll have feedback to you ahead of the holiday break. I am very pleased with what I see so far.

Kaur-Navdeep commented 2 years ago

Thank you, @brymz! It was a positive learning experience. Looking forward to the feedback.

brymz commented 2 years ago

Overall comments

Well done, @navdeepkaurufl!

Your code is functional and well organized. It was a pleasure to learn more about your project through this analysis review. One thing I saw throughout was extraneous and redundant lines of code. I think a lot of this reflects being new to programming and should be an easy "fix". Do as much as you can "in the background" instead of printing everything to the Console. Bring only information you want to check or report to the Console; everything else is stored and viewable in the Environment. Then consider organizing your code and analysis in chunks related to these outputs. There is also room for improvement in object names; fit1, for example, is not descriptive.

It was nice to see your deliberate approach to model fitting. The simpler models (i.e., 1 or 2 parameters) seem to perform well! Avoid too many parameters. I have not used anova() to compare models before. Let's chat about that!
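For reference, anova() on two fitted lm() objects runs an F-test between nested models: it asks whether the extra term in the larger model explains significantly more variation than the smaller model. A minimal sketch on simulated data; the `harvest` columns `biomass_yield`, `n_rate`, and `grain_yield` are hypothetical stand-ins, not the trial's actual variables:

```r
# Hypothetical data: simulated grain and biomass yield (not the trial data).
set.seed(1)
harvest <- data.frame(biomass_yield = runif(40, 5000, 12000))
harvest$n_rate <- rep(c(0, 60, 120, 180), each = 10)
harvest$grain_yield <- 0.4 * harvest$biomass_yield + 2 * harvest$n_rate +
  rnorm(40, sd = 300)

# Nested models with descriptive names: the smaller model's terms
# must be a subset of the larger model's terms.
lm_biomass <- lm(grain_yield ~ biomass_yield, data = harvest)
lm_biomass_nrate <- lm(grain_yield ~ biomass_yield + n_rate, data = harvest)

# F-test: does adding n_rate reduce the residual sum of squares
# by more than expected by chance?
model_comparison <- anova(lm_biomass, lm_biomass_nrate)
print(model_comparison)
```

The comparison is only meaningful for nested models fitted to the same rows of data; non-nested candidates are usually compared with something like AIC() instead.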

I very much look forward to seeing what happens next with your analysis. I'd be happy to help/review your work along the way.

Grain_N_rate_trial_harvest_data_2021.R

A full run of the file was successful; however, I had to revise the setwd() call on line 10 to match the path on my local drive. I usually keep this step listed as a reminder comment at the top of the document. In future non-class scripts, you may include a 1-2 sentence description of the code, commented at the top of the document. Running the script also generates five figures.

ggpubr sounds way cool. Which functions did you use in your scripts?

Lines 16-20: read.csv() can handle the data class definitions itself (through its colClasses argument), though I often handle this as a second step, as you did here.
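Both options side by side, as a small sketch. A tiny inline CSV stands in for the real harvest file, and its columns are hypothetical:

```r
# Hypothetical inline CSV standing in for the real harvest file.
csv_text <- "plot_no,variety,trt,grain_yield
101,V1,N0,3500
102,V1,N60,4100
103,V2,N0,3300
104,V2,N60,3900"

# Option 1: declare column classes while reading
harvest <- read.csv(text = csv_text,
                    colClasses = c("integer", "factor", "factor", "numeric"))

# Option 2: read first, then convert as a separate step
harvest2 <- read.csv(text = csv_text)
harvest2$variety <- factor(harvest2$variety)
harvest2$trt <- factor(harvest2$trt)

str(harvest)
```

Option 1 keeps the class decisions next to the file they describe; option 2 is easier to read when only a few columns need converting.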

Line 46: nice for loop. Saves a lot of copy/paste across your data.

Line 55: I don't see the ; used. Also, it's a bit repetitive to call the object after you have created it. No need to print it to the console to know it's there. Same redundancy for lines 74 and 79.

Lines 60 & 66: harvest %>% group_by(variety, trt) combines those two separate lines of code into one. In my opinion, it makes sense to keep the data together as much as possible.
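The same "group by both factors at once" idea, sketched with base R's aggregate() so it runs without extra packages; the dplyr form from the comment above is shown as a comment only. The data are simulated and the column names are hypothetical:

```r
# Hypothetical data: 2 varieties x 2 N treatments x 3 reps (simulated).
set.seed(7)
harvest <- expand.grid(rep = 1:3, trt = c("N0", "N60"), variety = c("V1", "V2"))
harvest$grain_yield <- round(rnorm(nrow(harvest), mean = 3800, sd = 250))

# One grouping step over both factors, kept in a single summary table
yield_by_group <- aggregate(grain_yield ~ variety + trt, data = harvest, FUN = mean)
yield_by_group

# dplyr equivalent (requires dplyr):
# harvest %>% group_by(variety, trt) %>% summarise(mean_yield = mean(grain_yield))
```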

Line 72: Your grain yield appears to be in g, while the harvest rate is in kg.

Line 95: As for the loess procedure you mention, I think of it as primarily a visualization tool; it should be avoided in publication-quality analysis.

Lines 149-159: This is a classic example of redundant naming. I would much prefer that the objects you define in 149-154 appear directly in your lm(). For example, lm(grain_yield ~ biomass_yield, data = harvest). Names should be unique and descriptive.

Line 147 & 192: Really cool graphics!

Line 191: Science Question - Why did you choose the shapiro.test()?

Line 225: Ah! The dreaded manual annotate layers. I haven't found a better way. Make sure the ANOVA summary output of groups prints to the Console.

Line 316: Nice formulation of se. Consider a way that you could revise sqrt(8) so that if your sample size (n) changed you wouldn't have to revise the code.
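One way to do this is to compute n from the data itself. A minimal sketch; the helper name `se` and the yield values are hypothetical, and it assumes sqrt(8) stood for 8 observations per group:

```r
# Hypothetical helper: standard error computed from the data,
# so the sample size n is never hard-coded.
se <- function(x) {
  x <- x[!is.na(x)]          # drop missing values before counting n
  sd(x) / sqrt(length(x))
}

# Hypothetical yields for one treatment (8 plots, matching sqrt(8))
plot_yield <- c(3510, 3620, 3480, 3700, 3555, 3640, 3590, 3605)
se(plot_yield)
```

Because `se` is an ordinary function of one vector, it can also be passed as FUN to aggregate() or called inside a dplyr summarise() to get one standard error per group.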

Grain_N_rate_trial_inseason_data_2021.R

The first thing I notice is that each part of this file name is the same as in the last file, except for inseason. That could make sense across your whole project, but it should be simplified or explained.

I've not seen with() used this way before, but it looks like an interesting solution for HSD.test(). I have passed an aov() model to HSD.test() instead of the factors.

I see the leaf tissue elemental analysis reported here, but not much evidence of 'in season' data. No harm to the course output, but curious to me.

Grain_N_trial_NDVI_NDRE_data_2021.R

Line 16: Error: mydata$plot_no. does not exist, because names(mydata)[1] == "ï..plot_no.". Those strange symbols probably come from the data file; delete and replace them. The period symbol (.) is a special character and should be avoided in names, except for demarcating the file extension (e.g., .csv). It also has a special meaning in a programming syntax called Regular Expressions.
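A small sketch of the quick fix and a possible root fix. It assumes (not confirmed from the file) that the "ï.." prefix is a UTF-8 byte order mark read without declaring it; the data frame and file name below are hypothetical:

```r
# Simulate the problem: a first column name mangled on import.
mydata <- data.frame(a = c(101, 102), ndvi = c(0.71, 0.68))
names(mydata)[1] <- "\u00ef..plot_no."

# Quick fix: rename the column to a clean name without periods
names(mydata)[1] <- "plot_no"
mydata$plot_no

# Possible root fix when reading (hypothetical file name), assuming the
# file starts with a byte order mark:
# mydata <- read.csv("ndvi_ndre_2021.csv", fileEncoding = "UTF-8-BOM")
```

Re-saving the CSV without the mark, or declaring the encoding on import, fixes the name at the source, so the rename line is not needed every time the script runs.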

Line 64: Nice use of tidyr!

Kaur-Navdeep commented 2 years ago

Hi @brymz, thank you so much for providing constructive feedback. My apologies for the late reply.

I have updated the scripts and files, making a few changes based on your suggestions. I really appreciate your help.

Grain_N_rate_trial_harvest_data_2021.R

For using ANOVA to compare models, I looked at some online sources. Here is the link: https://bookdown.org/ndphillips/YaRrr/comparing-regression-models-with-anova.html

I used the ggpubr library to add the regression equation to the regression graphs with stat_regline_equation(). I agree that using ";" is repetitive; I have tried to remove it from the code. Thank you for the suggestion.

Regarding line 72, the grain yield used throughout the data analysis is in kg/ha; only grain yield per plant is in grams.

I agree that loess should be used as a data visualization tool, and I used it only for that purpose, in the section "# data visualization using ggplot week 6 and 7 ####". I was probably not clear enough in the comments describing my code.

Thank you for the thoughtful suggestion to avoid non-descriptive variable names like x1, x2, etc. I changed my code as you suggested: "lm(grain_yield ~ biomass_yield, data = harvest)."

I used the Shapiro test to check whether the model's residuals are normally distributed. I understand that graphical plots are helpful for checking normality, but I preferred the Shapiro test because it gives a p-value, which I found easier to interpret than graphs. I am not sure whether I should continue using it.
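One way to keep both checks together, sketched on simulated data (the `harvest` columns and N rates here are hypothetical): run shapiro.test() on the residuals of the fitted model rather than on the raw yields, and draw a Q-Q plot of the same residuals. The p-value is convenient, but with few plots the test has little power, and with many plots it can flag departures too small to matter, so the plot is a useful companion.

```r
# Hypothetical data: 4 N rates x 6 reps (simulated, not the trial data).
set.seed(3)
n_rate <- rep(c(0, 60, 120, 180), each = 6)
harvest <- data.frame(trt = factor(n_rate),
                      grain_yield = 3000 + 4 * n_rate + rnorm(24, sd = 200))

# Normality is assessed on the model residuals, not the raw response
yield_aov <- aov(grain_yield ~ trt, data = harvest)
yield_resid <- residuals(yield_aov)

# Numeric check: H0 = residuals come from a normal distribution
shapiro_resid <- shapiro.test(yield_resid)
shapiro_resid$p.value

# Graphical check of the same residuals
qqnorm(yield_resid)
qqline(yield_resid)
```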

Using annotate layers was not easy, but I didn't find a better way; I will keep looking for one. I believe all the ANOVA outputs print to the Console, and I will make sure of this in the future.

It's a really nice suggestion to write the standard error formula in terms of n. I will try to figure out how to make that change.

Grain_N_rate_trial_inseason_data_2021.R

I agree that the code in the harvest and in-season files starts out the same, except that I used inseason in the data name. As you suggested, I added a line at the beginning of the code explaining the data. I believe it could be made more precise.

I was trying to figure out how to use the HSD test when there is an interaction among factors, and I found this approach using the with() command. I am sorry, I didn't understand what you meant by passing an aov() model to HSD.test() instead of the factors.
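"Passing the model" means fitting the ANOVA once and handing the fitted object to the post-hoc test, which then reuses its error term. The agricolae documentation describes the first argument of HSD.test() as either the response or a fitted aov/lm model. The runnable sketch below swaps in base R's TukeyHSD(), which also takes a fitted aov model, so it needs no extra packages. The data and column names are simulated and hypothetical:

```r
# Hypothetical 2-factor data: variety x N treatment, 4 reps (simulated).
set.seed(11)
harvest <- expand.grid(rep = 1:4, trt = c("N0", "N60", "N120"),
                       variety = c("V1", "V2"))
harvest$grain_yield <- 3200 + 300 * (harvest$trt != "N0") +
  150 * (harvest$variety == "V2") + rnorm(nrow(harvest), sd = 150)

# Fit the model once, including the interaction
yield_aov <- aov(grain_yield ~ variety * trt, data = harvest)

# Pass the fitted model (not the raw factors) to the post-hoc test,
# asking only for comparisons among variety:trt combinations
tukey_int <- TukeyHSD(yield_aov, which = "variety:trt")
tukey_int[["variety:trt"]][1:3, ]

# agricolae analog (not run here; check ?HSD.test for the exact usage):
# agricolae::HSD.test(yield_aov, c("variety", "trt"), console = TRUE)
```

Fitting once also keeps the ANOVA table and the post-hoc comparisons consistent, since both come from the same model object.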

The leaf tissue samples for nutrient analysis were collected 35 days after sowing, so I treated them as in-season data. I agree I should also have analyzed my plant height and germination data. I am trying to figure out the best way to analyze them. I will try using repeated measures for plant height, but I still need to figure out how that's done. Any suggestions on how to proceed with this analysis would be helpful.

Grain_N_trial_NDVI_NDRE_data_2021.R

I should not have used plot_no.; thank you for bringing that up. I made the required changes, and I hope it works now.

I will keep working on data analysis and will reach out to you for further help and suggestions. Thank you!

Kaur-Navdeep / Agronomic-problems