Sokiwi / Tone-WordLength

Script for preparing data and replicating analyses for the paper Tones, word length, and population size across languages
Creative Commons Zero v1.0 Universal

Problems in Executing R-Script #1

Open LinguList opened 1 year ago

LinguList commented 1 year ago

I cannot replicate the Rscript.

I run the script with Rscript on a Linux machine. First, the file names used in your script do not match the ones you describe in the comments; the first lines should be corrected as follows:


### DATA PREPARATION
# download Phoible from zenodo, doi: 10.5281/zenodo.2677911
# unzip and put the three files contributions.csv, languages.csv
# and values.csv in the current folder
# change their names to phoible_contributions.csv, phoible_languages.csv
# and phoible_values.csv

# the following operations unites the essential data from Phoible in a 
# data frame called pho
contributions <- read.csv(file="phoible_contributions.csv")  # basic data on tones
languages <- read.csv(file="phoible_languages.csv")  # metadata
values <- read.csv(file="phoible_values.csv")  # metadata

Then, I run into an error here:


for (i in 1:nrow(pho2)) {
    if ( is.na(pho2$count_phonemes[i]) ) {
        w_iso <- which(wld$iso==pho2$iso_code[i])
        if ( length(w_iso) > 0 ) {
            pho2$lg_name[i] <- wld$names[w_iso]
            pho2$glot_fam[i] <- wld$glot_fam[w_iso]
            pho2$pop[i] <- wld$pop[w_iso]
            pho2$forty_mean[i] <- wld$forty_mean[w_iso]
            pho2$continent[i] <- wld$continent[w_iso]
        }
    }
}

The error says:


Error in `$<-.data.frame`(`*tmp*`, "lg_name", value = c(NA, NA, NA, NA,  : 
  replacement has 11 rows, data has 1651
Calls: $<- -> $<-.data.frame

I do not understand the error, but I suspect something is wrong in the way you suggest downloading and renaming the files. Why not use git to download the repositories and then refer to the data files by relative paths? Renaming files is not good practice here.

Also, you do not provide the Zenodo link for WALS, and you give a DOI for Phoible but only a resource link for ASJP. Please use a DOI in both cases.

All in all, you could even write a Makefile that uses git to download the repositories. All of this would prevent such errors.

Can you please tell me what to do about the error and correct the script accordingly? I am reviewing this study for Frontiers and would then proceed from there.
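One plausible reading of the error message, offered as an assumption rather than a diagnosis of tones.R: in R, assigning to a single element of a column that does not yet exist in a data frame first builds a vector of length i padded with NA, and the assignment then fails because that vector is shorter than the data frame. The reported 11 rows would then mean that the first assignment happened at i = 11 and that pho2 had no lg_name column at that point. The data frame `toy` below is a made-up stand-in, not the paper's data.

```r
# Hypothetical stand-in for pho2: 20 rows and no lg_name column yet.
toy <- data.frame(iso_code = sprintf("l%02d", 1:20))

# Element-wise assignment into the missing column at row 11 fails,
# because NULL[11] <- "x" yields a length-11 vector for a 20-row frame.
msg <- tryCatch(
  {
    toy$lg_name[11] <- "some name"
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Creating the column first makes the same assignment valid.
toy$lg_name <- NA_character_
toy$lg_name[11] <- "some name"
```

If this reading is right, the fix would be to make sure the column exists before the loop runs, which in turn depends on the input files being read under the expected names.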

bambooforest commented 1 year ago

I can confirm that the missing URLs for the Zenodo-archived files are not user-friendly, but the script runs without issue in RStudio for me once the libraries are installed. Here's my session info:

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin21.6.0 (64-bit)
Running under: macOS Ventura 13.0.1

Matrix products: default
LAPACK: /opt/homebrew/Cellar/r/4.2.2/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] colorspace_2.0-3 rworldmap_1.3-6 sp_1.5-0 lme4_1.1-30 Matrix_1.5-1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9 compiler_4.2.2 pillar_1.8.1 nloptr_2.0.3 viridis_0.6.2 tools_4.2.2 dotCall64_1.0-2
 [8] boot_1.3-28 viridisLite_0.4.1 lifecycle_1.0.3 tibble_3.1.8 gtable_0.3.1 nlme_3.1-160 lattice_0.20-45
[15] pkgconfig_2.0.3 rlang_1.0.6 DBI_1.1.3 cli_3.4.1 rstudioapi_0.14 spam_2.9-1 gridExtra_2.3
[22] dplyr_1.0.10 generics_0.1.3 vctrs_0.4.2 fields_14.1 maps_3.4.0 grid_4.2.2 tidyselect_1.1.2
[29] glue_1.6.2 R6_2.5.1 fansi_1.0.3 foreign_0.8-83 minqa_1.2.4 ggplot2_3.3.6 purrr_0.3.5
[36] magrittr_2.0.3 maptools_1.1-4 scales_1.2.1 MASS_7.3-58.1 splines_4.2.2 assertthat_0.2.1 utf8_1.2.2
[43] munsell_0.5.0

LinguList commented 1 year ago

Then maybe it is because the packages are not installed. In that case, I suggest using something more consistent, such as groundhog in R, to load packages as of a particular date, and providing a list of the packages I need to install (I was doing the dummy-installation). Note also that in RStudio this probably cannot work either if you follow the swapped file names, language_phoible.csv vs. phoible_language.csv.

bambooforest commented 1 year ago

One can install packages by version:

https://search.r-project.org/CRAN/refmans/remotes/html/install_version.html

Personally, I've never had a problem with R package versions, whereas such problems are common in Python.

And yes the file renaming is a bit annoying and unnecessary.
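A minimal sketch of reading the three Phoible tables under their original names from a directory, which would make the renaming step unnecessary. The helper `read_phoible` and its `dir` argument are assumptions made for illustration; the actual location depends on where the archive is unpacked.

```r
# Hypothetical helper: read the Phoible CLDF tables from `dir`
# without renaming them. The directory layout is an assumption.
read_phoible <- function(dir) {
  tables <- c("contributions", "languages", "values")
  paths <- file.path(dir, paste0(tables, ".csv"))

  # Fail early with a readable message if a file is not where expected.
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing file(s): ", paste(missing, collapse = ", "))
  }

  out <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  names(out) <- tables
  out
}
```

A call such as `read_phoible("path/to/unpacked/cldf")`, with a placeholder path, would return a list whose elements `contributions`, `languages` and `values` can be used in place of the three separately named data frames.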

LinguList commented 1 year ago

I did not know that, but anyway: one needs to provide (1) the list of packages that need to be installed and (2) their versions.
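As a sketch of what such a list could look like in practice, the snippet below records package versions and compares them with what is installed. The versions are copied from the "other attached packages" line of the session info posted in this thread; whether tones.R needs exactly these versions is an assumption, and `version_report` is a hypothetical helper.

```r
# Versions taken from the session info in this thread (an assumption
# about what the script needs, not a statement by the author).
pinned <- c(colorspace = "2.0-3", rworldmap = "1.3-6", sp = "1.5-0",
            lme4 = "1.1-30", Matrix = "1.5-1")

# Compare installed versions against a named vector of pinned versions.
version_report <- function(pinned) {
  rows <- lapply(names(pinned), function(pkg) {
    is_installed <- nzchar(system.file(package = pkg))
    have <- if (is_installed) utils::packageVersion(pkg) else NA
    data.frame(
      package = pkg,
      pinned = pinned[[pkg]],
      # packageVersion() prints with dots, e.g. "1.1.30" for "1.1-30"
      installed = if (is_installed) as.character(have) else NA_character_,
      matches = is_installed && have == package_version(pinned[[pkg]])
    )
  })
  do.call(rbind, rows)
}

print(version_report(pinned))
# A mismatch could then be resolved with, for example:
# remotes::install_version("lme4", version = "1.1-30")
```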

bambooforest commented 1 year ago

In the R world it's implicit that the user installs the packages that are loaded with library(). But yes, I agree it would be better for the script to check for them explicitly and handle this for the user.
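Such an explicit check could look roughly like this. The package names are taken from the "other attached packages" in the session info above; treating them as the complete list for tones.R is an assumption, and `missing_packages` is a hypothetical helper.

```r
# Assumed list of packages the script loads with library().
needed <- c("lme4", "rworldmap", "colorspace", "sp", "Matrix")

# Return the subset of `pkgs` that is not installed in the current library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Please install: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # commented out so nothing installs silently
}
```

Reporting the missing packages instead of installing them automatically keeps the user in control of their library, at the cost of one extra manual step.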

LinguList commented 1 year ago

Or just add a little readme, telling me, as a new user, to install these packages, download those datasets, and then run the code. Following the zen of Python here is crucial for replicable science: explicit is better than implicit.

Sokiwi commented 1 year ago

Hi Mattis,

now I've attended to all the things we talked about. I made the changes directly to https://github.com/Sokiwi/Tone-WordLength/blob/main/tones.R. You should clean away everything you have and start afresh, downloading the revised script. When you go through that you will see changes made such that

I can't (or rather don't want to) control where you download and place files, but I have given instructions about that. The downloaded files should be put in your working directory. This is really the only place where the user can screw up. But it is also the kind of action that a user wants to have control over. I gave some tips about how to manipulate the working directory in the comments.

I hope everything works smoothly now.

Soeren.

On Wed, Jan 11, 2023 at 9:52 AM Johann-Mattis List @.***> wrote:

Or just add a little readme, telling me, as a new user, to install these packages, download those datasets, and then run the code. Following the zen of Python here is crucial for replicable science: explicit is better than implicit.


LinguList commented 1 year ago

Okay, I'd consider this solved then. @bambooforest, I do not know how to respond to the review in Frontiers, but I guess you can do that. I'd probably have to step back from reviewing, now that the script is provided in a somewhat friendlier way.

A last thing: why not place a Makefile in your repository that does the download for you? Just create a file, call it Makefile, and insert the following:

download:
	curl --output wals-v2020.3.zip "https://zenodo.org/record/7385533/files/cldf-datasets/wals-v2020.3.zip?download=1"
	curl --output "Data-01 ASJP data raw.txt" "https://zenodo.org/record/6344024/files/Data-01%20ASJP%20data%20raw.txt?download=1"
	curl --output phoible-v2.0.1.zip "https://zenodo.org/record/2677911/files/cldf-datasets/phoible-v2.0.1.zip?download=1"

bambooforest commented 1 year ago

@Sokiwi -- everything works fine for me now. I think you can close this issue.

@LinguList -- as far as I understand, I need to wait until the other reviewer submits their report and then I can open the interactive review forum. Then @Sokiwi can respond to your review. R2 has until the end of the month to submit.

LinguList commented 1 year ago

Fine with me. After this, I am out of reviewing, though, as I cannot judge whether the methods applied here are useful. Given that I could help at least a bit by showing that more thorough checking of the code would be useful, I think I did all I can do.

LinguList commented 1 year ago

But you could add one thing for the download: a Makefile:

download:
	curl --output wals-v2020.3.zip "https://zenodo.org/record/7385533/files/cldf-datasets/wals-v2020.3.zip?download=1"
	curl --output "Data-01 ASJP data raw.txt" "https://zenodo.org/record/6344024/files/Data-01%20ASJP%20data%20raw.txt?download=1"
	curl --output phoible-v2.0.1.zip "https://zenodo.org/record/2677911/files/cldf-datasets/phoible-v2.0.1.zip?download=1"

This gives you all data with one command:

make

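For users without make, the same download could be sketched in R itself. The URLs and file names are copied from the Makefile in this thread; `download_sources` is a hypothetical helper, and whether the Zenodo links remain valid is not something this sketch can guarantee.

```r
# File names and URLs copied from the Makefile in this thread.
sources <- data.frame(
  destfile = c("wals-v2020.3.zip",
               "Data-01 ASJP data raw.txt",
               "phoible-v2.0.1.zip"),
  url = c(
    "https://zenodo.org/record/7385533/files/cldf-datasets/wals-v2020.3.zip?download=1",
    "https://zenodo.org/record/6344024/files/Data-01%20ASJP%20data%20raw.txt?download=1",
    "https://zenodo.org/record/2677911/files/cldf-datasets/phoible-v2.0.1.zip?download=1"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fetch every file that is not already present in `dir`.
download_sources <- function(sources, dir = ".") {
  for (i in seq_len(nrow(sources))) {
    target <- file.path(dir, sources$destfile[i])
    if (!file.exists(target)) {
      # mode = "wb" keeps the zip archives intact on Windows
      download.file(sources$url[i], destfile = target, mode = "wb")
    }
  }
  invisible(file.path(dir, sources$destfile))
}

# download_sources(sources)  # not run here, since it needs network access
```

Skipping files that already exist means the call can be repeated safely at the top of the analysis script.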
bambooforest commented 1 year ago

@LinguList -- I don't understand your comment above. Are you removing yourself as a reviewer of the paper? The code was always well documented about what analyses are being done:

https://github.com/Sokiwi/Tone-WordLength/blob/507594c0a654c3f92babf02481a17cd8f0df6a48/tones.R#L234

LinguList commented 1 year ago

I wrote my review, the author answered it, and I myself cannot provide any more input, as I am not an expert in mixed models and the like. So yes, my job is done. I made the world of documented code that uses CLDF data a bit more consistent, albeit not as consistent as I would have hoped, but I cannot do more at this stage.

bambooforest commented 1 year ago

OK then, thanks for your review of the CLDF use.