Dr-Eberle-Zentrum / Data-projects-with-R-and-GitHub

https://dr-eberle-zentrum.github.io/Data-projects-with-R-and-GitHub/

Daniela's Project #120

Closed: neopolyglot closed this issue 11 months ago

neopolyglot commented 1 year ago

The data is really challenging. I have done the following:

  1. I attempted to tidy up the .txt files by converting them to .csv. I used a for loop to iterate over all five .txt files and append them to the same .csv file.
  2. The column names still did not come out the way I wanted, so I manually edited the CSV file to remove the extra top row.
  3. I could not yet subset the data to remove "Nestle" and palm-oil-based products. Maybe I will be able to do that in a future iteration.
  4. Because I had edited output.csv by hand, I could not continue in the same script. So I wrote a second script that takes output.csv as its input. It counts the products in each Nutriscore grade from A to E and plots a bar chart with colors ranging from green for A to red for E.
  5. I then added this script to the .Rmd file and generated the .md document.
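Step 4 (counting the grades and plotting them from green to red) could look roughly like the R sketch below. It is not the script from this issue: the data frame `food` and its column `nutriscore_grade` are hypothetical stand-ins for the combined table, and the grades are assumed to be stored as letters.

```r
# Hypothetical example data; in the project this would be the combined table.
# The column name `nutriscore_grade` is an assumption.
food <- data.frame(
  product = c("p1", "p2", "p3", "p4", "p5", "p6", "p7"),
  nutriscore_grade = c("a", "b", "b", "", "e", "c", NA),
  stringsAsFactors = FALSE
)

# Normalise to upper case; a factor with fixed levels keeps all five grades
# in order (even unused ones), and anything else (blanks, NA) becomes NA
grade <- factor(toupper(trimws(food$nutriscore_grade)),
                levels = c("A", "B", "C", "D", "E"))

# Count products per grade; NA values are left out by table()
grade_counts <- table(grade)

# One colour per grade, from green (A) to red (E)
grade_colours <- c("darkgreen", "green3", "yellow", "orange", "red")

barplot(grade_counts,
        col  = grade_colours,
        xlab = "Nutriscore grade",
        ylab = "Number of products",
        main = "Products per Nutriscore grade")
```

Using a factor with explicit levels means a grade with zero products still gets an (empty) bar, so the colours always line up with A to E.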

So, basically, the data is too complicated and untidy to handle easily, which is a problem. I also couldn't find the -15 to 40 value range for the Nutriscore that Daniela mentioned in the original project description. Perhaps this point needs a bit more explanation.

DKemp98 commented 1 year ago

@neopolyglot, sorry for the hassle! That wasn't intended. I myself had no problems reading in the data, since there are functions that read .txt files directly into tables. Maybe Alexander can use your work as a starting point? I found enough information on Wikipedia to understand the value range of the Nutriscore, and I linked that page. I also included the ranges in the 'helpful stuff' section.
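To connect the -15 to 40 points range with the A to E letters, one option is R's `cut()`. This is a minimal sketch, assuming the thresholds commonly cited for solid foods in the original Nutriscore algorithm (A: -15 to -1, B: 0 to 2, C: 3 to 10, D: 11 to 18, E: 19 to 40). Beverages use different cut-offs, and if the ranges in the 'helpful stuff' section differ, those should take precedence. The `points` values are made up.

```r
# Hypothetical Nutriscore points for a few products
points <- c(-15, -1, 0, 2, 3, 10, 11, 18, 19, 40)

# Sanity check: points outside -15..40 would point to a data problem
stopifnot(all(points >= -15 & points <= 40))

# Assumed thresholds for solid foods. With right = TRUE each interval is
# (lower, upper], e.g. -Inf < x <= -1 is "A" and -1 < x <= 2 is "B"
grade <- cut(points,
             breaks = c(-Inf, -1, 2, 10, 18, Inf),
             labels = c("A", "B", "C", "D", "E"),
             right  = TRUE)

data.frame(points, grade)
```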

neopolyglot commented 1 year ago

Yeah, but I still found it very difficult. Anyway, my bigger problem is that the files I committed are not visible in my branch. I put them in your folder and then committed as instructed.

muellertabea commented 1 year ago

Hey @DKemp98, @martin-raden, I was sick last week, which is why I'm pretty late with the task. Sorry for that! To be honest, I'm struggling a lot with the data. I could import the first dataset, but unfortunately the other datasets don't have column titles, and combining them always gives me errors or warnings somewhere. A lot of the data is blank and the Nutriscore is mostly empty, so I'm planning to work with the Ecoscore instead. Or do you have another solution?

Thanks for your reply
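Before switching to the Ecoscore, it may help to measure how empty each column actually is. This R sketch counts NA and whitespace-only entries as blank; the example table, its column names and the helper `is_blank()` are hypothetical.

```r
# Hypothetical example table with gaps; column names are assumptions
food <- data.frame(
  product          = c("Muesli", "Cookies", "Yoghurt", "Crisps"),
  nutriscore_grade = c("a", "", NA, ""),
  ecoscore_grade   = c("b", "c", "", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a value is blank if it is NA or only whitespace
is_blank <- function(x) is.na(x) | trimws(as.character(x)) == ""

# Share of blank values per column (0 = always filled, 1 = always empty)
blank_share <- sapply(food, function(col) mean(is_blank(col)))
round(blank_share, 2)

# Number of rows that still have a Nutriscore
sum(!is_blank(food$nutriscore_grade))
```

Even a high blank share can leave many usable rows when the dataset is large, so the absolute count in the last line matters as much as the percentage.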

DKemp98 commented 1 year ago

@muellertabea, you can probably get around the warnings by adjusting the column names of the other datasets before merging, e.g. `colnames(food_data_1) <- colnames(food_data_0)`. The Ecoscore is an entirely different system, but since there is so much data, using only the rows where the Nutriscore is not blank will be enough. Also, if you haven't already, please take a look at the updated project description .md file, as it contains some numbers that could be useful.
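A minimal R sketch of this suggestion: read the headerless files with `header = FALSE`, copy the column names over, stack the rows with `rbind()`, and keep only rows with a non-blank Nutriscore. It assumes tab-separated files and writes two tiny example files to a temporary directory so it runs on its own; the file names and the `nutriscore_grade` column are hypothetical. "Merging" is read here as appending rows; a join on a shared key would use `merge()` instead.

```r
# Two small example files; in the project these would be the provided .txt files
dir    <- tempdir()
file_0 <- file.path(dir, "food_data_0.txt")
file_1 <- file.path(dir, "food_data_1.txt")
writeLines(c("product\tbrand\tnutriscore_grade",
             "Muesli\tBrandA\ta",
             "Cookies\tBrandB\t"), file_0)
writeLines(c("Yoghurt\tBrandC\tb",
             "Crisps\tBrandD\te"), file_1)

# The first file has a header row, the second one does not
food_data_0 <- read.delim(file_0, header = TRUE,  stringsAsFactors = FALSE)
food_data_1 <- read.delim(file_1, header = FALSE, stringsAsFactors = FALSE)

# Give the headerless table the same column names, then stack the rows
colnames(food_data_1) <- colnames(food_data_0)
food_data <- rbind(food_data_0, food_data_1)

# Keep only rows where the Nutriscore is filled in (not NA, not empty)
has_score   <- !is.na(food_data$nutriscore_grade) &
  trimws(food_data$nutriscore_grade) != ""
food_scored <- food_data[has_score, ]
food_scored
```

With more files, the same read-and-rename step could be applied to every path with `lapply()` and the results combined with `do.call(rbind, ...)`.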