ecn310 / course-project-development

How has the link between manufacturing employment and economic growth evolved over time and across countries?

Reproducibility plan #21

Open kbuzard opened 3 months ago

kbuzard commented 3 months ago

@ecn310/development I'm creating this issue to have a place to give you feedback on your replication package.

I'm looking at the file Replication Package Plan.md.

I have two types of comments: first on this file, and second on the structure of your reproducibility package.

On the file:

  1. In Step 1, you say "click on the "Resources" section."
    • First, I think this would be more clearly described by "Click on Resources in the top menu" (I was looking all over for Resources below in the main body of the page).
    • Second, when I click on "Resources", a menu drops down with four choices. I don't know which of these to choose, so I get stuck and can't get any further.
  2. "Steps Taken to Produce Development Results Accessing" is not clear. You need a clear header here to indicate that these are the steps for downloading the data.
  3. There should also be some context at the beginning of the file: just a short introduction that gives the report name (both the full title and the name of the file) and says that these are the steps to take to reproduce the analysis in that file.
  4. Under the data analysis section, you mention two do files, but I can't find them. You should make the file names into hyperlinks to the file location. BUT FIRST, read below on the structure of the reproducibility package.
  5. You need to add information at the end of the first section about where to save the data files on your repo.
  6. You need to add information at the end of the second section about where the outputs are located.

Structure of your reproducibility package

  1. You need a folder called "reproducibility package"
  2. Then, move Replication Package Plan.md into that folder and rename it to README.md.
    • This will make your instructions the README at the bottom of the folder's main page.
  3. Then, if you only have two do files, put those do files in this folder.
  4. Move your Data folder and Variable Graphs folders into the reproducibility package folder. Now, everything you should need for someone to reproduce the project is all in one place.
  5. Finally, you'll need to organize all the files that are in the main folder. There probably shouldn't be any files other than README and your report.
    • Delete any files that you don't need for reproducibility or for some other reason (maybe you have some code that you might want to come back to, or some graphs that didn't make it into the final report but that you'd like to keep).
    • Organize the files you want to keep into folders--either into the reproducibility package folder or other folders like "drafts" or "working_files".

Remember that when you move the folders, you'll need to adjust the file paths in your do-files so that they match the new location.

Meiska12 commented 2 months ago

12/17/24 - 12/20/24

@kbuzard, please look at the replication package again; thank you. I have worked on your feedback and Ryan's, so the errors you mentioned should no longer be present. Done: worked on all the feedback and completed the changes today. Pending: organization of the rest of the files.

rpseely commented 2 months ago

@ecn310/development I did not see this issue! Please refer to my feedback on the reproducibility package here

kbuzard commented 2 months ago

I think it would be most effective for @rpseely and me to do this in sequence (so you get feedback in between that you can implement before the other person looks at it).

  • Since you had some questions for @rpseely, it's probably better if he goes first (assuming he has time at all). @rpseely Let me know what your plan is with regard to this and I can fit my feedback in around that (again, assuming you have time to do this at all).

rpseely commented 2 months ago

@kbuzard @ecn310/310-students At first glance, it looks like the development team implemented my feedback from this post I made Wednesday. I will take another look at it tonight and see if I have any more feedback.

rpseely commented 2 months ago

@ecn310/development I took a second look at your reproducibility package. Here are my thoughts:

  1. You still have these lines of code in the gdp_developed_analysis.do file that produce graphs that are, as far as I can tell, of no value. I would either replace them with code for other useful graphics or delete the lines altogether.
       twoway (line gdp_Developed year), ///
       graph bar gdp_Developed, over(year, sort(ascending)) ///
  2. Again, please change "drop if year == 2020" in gdp_developed_analysis.do to match what you did in Merge_GDp_Mftc.do. You don't need the line because you save the data without observations from 2020 onward in Merge_GDp_Mftc.do to do the analysis, but if you are going to keep it in it should match.
  3. On that note, in your Overleaf for your final report I see other graphics that are not made in any code from your reproducibility package. If you intend to keep these graphics in your final report, you must provide the code used to make them in the reproducibility package.
  4. Something new that I noticed: when downloading the data from Our World in Data, I believe it is the default that the downloaded data will have short column names. I would still add a sentence that says that when downloading the data, one should use short column names.

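
A minimal Stata sketch of the point about the year filter, run on made-up toy data rather than the project's dataset. It assumes the merge step keeps observations only through 2019, as described above; the four rows and their values are purely illustrative.

```stata
* Toy data standing in for the merged dataset (values are made up)
clear
input int year double gdp_Developed
2018 1.0
2019 1.1
2020 0.9
2021 1.2
end

* "drop if year == 2020" would remove only 2020 and leave 2021 in place.
* Dropping 2020 onward matches a merge step that saves data through 2019.
drop if year >= 2020

* Stop with an error if any observation from 2020 onward survives
assert year < 2020
```

If the merged file really does end in 2019, the drop removes nothing and the assert simply documents the expectation, so the line is harmless to keep.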
Meiska12 commented 2 months ago

@rpseely, I worked on all the comments except for 3 - I am working with my group to figure out where the other code is and whether or not we need everything.

@kbuzard, will you be able to look at the reproducibility package?

kbuzard commented 2 months ago

@Meiska12 I'll be happy to look it over and give you my feedback. Just let me know when you've resolved this outstanding issue.

Meiska12 commented 2 months ago

Hello @kbuzard, that has been resolved. The reproducibility package now contains code only for the figures that appear in the Overleaf, as you will see when you have time to look at both. Thank you.

kbuzard commented 2 months ago

@ecn310/development Here is my feedback on the reproducibility package:

  1. All of the files needed in the reproducibility package should be in the reproducibility package folder (e.g., Data)
    • All the files and folders that are not needed for the reproducibility package should be hidden away in a folder called something like WorkingFiles.
    • The repo structure when you're done should just have a few folders (reproducibility package, working, .github).
      • Then, there should only be a handful of files in the main repo: your .tex file for the report, your PDF of the report (it should have the same file name as the .tex file; that is, the names should match except for the .tex and .pdf ending), and README.md.
    • IMPORTANT: When you move files around, any links in your package or paper and paths in your programs have to be adjusted so that they work.
      • I'm sure you know that the links on your main README aren't there yet.
  2. Your README in the reproducibility package had an @ sign at the end of the name, so it didn't show up as the readme for the folder. I fixed that because it would have been much harder for me to evaluate the package as it was.
  3. I can't follow the first paragraph of the Readme (Step-by-step on how to produce and attain the data before the analysis. To conduct the study, you can look at the Merge_GDP_Mftc.do file to).
  4. In Part 1, "Look up the first variable, "Manufacturing jobs as a share of total employment," Bank World Development Indicators Data Bank page." is unclear. Look up how? Am I supposed to search that term in the search bar? Or am I supposed to find the bank world (world bank?) page by looking around?
    • You say "Once the variable has been selected, choose the chart view," but what does this mean? Click the button toward the top left of the page that says "Chart"?
    • "deselect all the pre-opted categories"--where? I just see the list of countries and continents
  5. "It would be helpful to keep all the CVS links in one place on GitHub (where we are working) so that they are easy to access when analyzing the data. We have saved ours in the Issue called https://github.com/ecn310/course-project-development/issues/15#issuecomment-2486305999"
    • Is CVS (a pharmacy chain) supposed to be csv (that is, comma-separated files)?
    • "and copy the "Data URL (CSV format)."" What is the purpose of copying this? Am I supposed to use it somewhere?
    • You should give explicit instructions to download the datasets to the folder you use on GitHub, and give the exact file names to use.
    • When I followed the rest of the instructions through to "Once you have chosen these options, go to "Quick Download" and select "Download displayed data."", it lets me download a zipped file, not a csv. You either need to change the instructions so it downloads a csv, or tell the user how to unzip the file, and then delete everything from the repo that's not needed (the zipped version and anything else that comes along with it).
    • You also need to find a clear way to tell the user that they have to go back through this process three more times, and exactly what they need to do.
    • I'm now going to skip to your code since I don't know how to proceed here.
  6. Under Data Analysis
    • "Once all the links have been attained, in this case, only two are required for each variable. There should be four links in total."--I think you mean four csv files, not links.
    • HOLD THE PHONE! Now that I'm into your do-file, I see that you have copied in links instead of downloading csv's. This is fine, but that doesn't come through clearly above. Someone could just open up the do-file and they're getting the data directly.
    • SO: What I strongly suggest is one of two things:
      1. to put a statement about how your do-file accesses the data through weblinks, and then put all the instructions for how to reproduce those links at the end and just refer to them/link to them in what will be pretty brief instructions for running the data manipulation and analysis files.
      2. keep them in the same order, but make a very clear, strong statement at the beginning about this issue with the links.
  7. Since your repo needs to be rearranged, it doesn't make sense for me to test your code until the rearrangement has occurred. What I have suggested will make the file path issue much cleaner: you can cd to the replication package at the top of the file, then just cd into the data directory (or whatever)--I think I gave Sergio instructions for how to do this yesterday. If not, and you need help, just let us know.
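
A rough sketch of how the top of a do-file could look under the suggestions above: set the package folder once, read the data through a weblink, and move into the data directory with relative paths. The macro name `pkg`, the local path, the placeholder URL, and the file name `gdp_raw.dta` are all hypothetical and are not taken from the actual package; only the folder names "reproducibility package" and "Data" come from this thread.

```stata
* ASSUMPTION: location of the local clone; the only line a user should need to edit
global pkg "C:/Users/yourname/course-project-development/reproducibility package"

* Start from the package folder
cd "$pkg"

* ASSUMPTION: placeholder link; substitute the "Data URL (CSV format)" copied from the data source
local gdp_url "https://example.com/your-data-url.csv"

* Read the csv straight from the weblink, treating the first row as variable names
import delimited using "`gdp_url'", varnames(1) clear

* Step into the data folder and save a local copy under a hypothetical name
cd "Data"
save "gdp_raw.dta", replace

* Return to the package folder before running the analysis
cd "$pkg"
```

With this layout, moving the package only requires editing the `global pkg` line, since every other path is relative to it.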

I will try to look at your report later today (I didn't block out time because you only asked me for time to look at the reproducibility package), but I have commitments for the next several hours and don't know when that will be or how much time I'll have once I get free. I'll be in contact once I know.

Meiska12 commented 2 months ago

Hello Prof @kbuzard, I am unsure why this is occurring, but I seem to have reached the quota for Overleaf. I cannot write more than a few sentences. As the screenshot shows, everything I write past a certain point changes from black ink to green ink. Please let me know what I have to do.

kbuzard commented 2 months ago

@Meiska12 This is not a quota. You put in a percent sign, which Overleaf sees as a signal to create a comment. If you put \%, this will show up as a percent sign and it will not turn the rest of your text into a comment (that's what the green means).

Meiska12 commented 2 months ago

I got it and fixed it, but the issue now is the paragraph I am trying to write below the scatterplot. As the screenshot shows, it is positioned after Fig 2, and only the last two sentences show up. I am not sure how to get around this.

kbuzard commented 2 months ago

@Meiska12 I can't understand what the problem is. I can't see any text under figure 2 in either the code or in the compiled PDF in the screenshot. When I look at the compiled version directly on Overleaf, the text you have in between the two figures in the code shows up part before the figures and part after the figures, but everything you have typed in there shows up. If you don't like the placement of the figures relative to the text, there are options you can add to the \begin{figure} statement to get LaTeX to put them in different locations (this is a place where ChatGPT is very helpful).

kbuzard commented 2 months ago

@ecn310/development I didn't have time to read through everything, but I looked at the abstract and data, started the results (figures are missing, so I got stuck), and looked at the bibliography and data appendix. Let me know if it would be helpful for me to look at something more in the morning.