EmilHvitfeldt / textdata

Download, parse, store, and load text datasets instead of storing it in packages
https://emilhvitfeldt.github.io/textdata/

Allow non-interactive use of load_dataset() #19

Closed richierocks closed 5 years ago

richierocks commented 5 years ago

Currently load_dataset() calls printer(), which in turn calls menu(), which throws an error if R is run non-interactively.

See

An example of wanting to use this function non-interactively is including the datasets in a Docker image.

I think that in a non-interactive session you can always assume that the user wants to download the dataset, so a possible reworking of printer() might be something like the following.

printer <- function(name) {
  info_name <- print_info[[name]]
  if(interactive()) {
    cat("Do you want to download:\n",
      "Name:", info_name[["name"]], "\n",
      "URL:", info_name[["url"]], "\n",
      "License:", info_name[["license"]], "\n",
      "Size:", info_name[["size"]], "\n",
      "Download mechanism:", info_name[["download_mech"]], "\n"
    )
    menu(choices = c("Yes", "No"))
  } else {
    cat("Downloading:\n",
      "Name:", info_name[["name"]], "\n",
      "URL:", info_name[["url"]], "\n",
      "License:", info_name[["license"]], "\n",
      "Size:", info_name[["size"]], "\n",
      "Download mechanism:", info_name[["download_mech"]], "\n"
    )
    1
  }
}

That is, the message is changed from "Do you want to download" to "Downloading" and menu() is replaced by always returning 1.
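To make the shape of this proposal easier to see, here is a minimal, self-contained sketch of the same branching logic. It is not textdata's code: `confirm_download()`, the `is_interactive` argument, and `example_info` are hypothetical names introduced only for illustration, and `info` stands in for a single entry of `print_info`.

```r
# Hypothetical helper mirroring the proposed printer() logic.
# `info` is a list with name, url, and license fields (illustrative only).
confirm_download <- function(info, is_interactive = interactive()) {
  header <- if (is_interactive) "Do you want to download:" else "Downloading:"
  cat(header, "\n",
      "Name:", info[["name"]], "\n",
      "URL:", info[["url"]], "\n",
      "License:", info[["license"]], "\n")
  if (is_interactive) {
    # menu() returns the index of the chosen item (0 if the user cancels).
    utils::menu(choices = c("Yes", "No"))
  } else {
    # Same value menu() would return for "Yes".
    1L
  }
}

# Made-up dataset description, not a real textdata entry.
example_info <- list(
  name = "Example dataset",
  url = "https://example.org/data",
  license = "Example license"
)

# Force the non-interactive branch so the sketch runs under Rscript.
answer <- confirm_download(example_info, is_interactive = FALSE)
```

Passing `is_interactive` as an argument, rather than calling `interactive()` inside the body, is only a convenience so both branches can be exercised from a script.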

juliasilge commented 5 years ago

This will require some careful thinking through, because the lexicon creators who agreed to have their work included in this way did so because we set things up so that a user must agree to the license (e.g. no commercial use) when they download. That alternative does not sound like it is within the parameters we set up with the lexicon creators.

EmilHvitfeldt commented 5 years ago

Hello @richierocks,

textdata was built to ensure that users are required to agree to the terms of each dataset, so allowing the functions to download data non-interactively wouldn't work.

However, it is worth noting that interactivity is only needed the first time the data is accessed. So you could call the functions once to agree to the conditions of the data, and then run subsequent analyses non-interactively.

Please note that some of the datasets (especially the lexicons) come with a "do not redistribute" condition, meaning that you wouldn't be able to put the data in a Docker image for someone else to use, as they haven't agreed to the conditions of use.
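One way to act on the "agree once interactively, then run non-interactively" suggestion is to have an automated script check that the data is already on disk and stop with a clear message if it is not. The sketch below uses only base R; `dataset_is_cached()` and `require_cached_dataset()` are hypothetical helpers, not part of textdata, and the directories are temporary stand-ins for a real cache folder.

```r
# Hypothetical helper: TRUE if `cache_dir` exists and holds at least one file.
dataset_is_cached <- function(cache_dir) {
  dir.exists(cache_dir) &&
    length(list.files(cache_dir, recursive = TRUE)) > 0
}

# Hypothetical guard for the top of a non-interactive script.
require_cached_dataset <- function(cache_dir) {
  if (!dataset_is_cached(cache_dir)) {
    stop("No dataset found in '", cache_dir, "'. ",
         "Run the loader once in an interactive session to review ",
         "and accept the license.", call. = FALSE)
  }
  invisible(cache_dir)
}

# Demonstration with temporary directories standing in for a cache folder.
empty_dir <- file.path(tempdir(), "textdata-demo-empty")
full_dir <- file.path(tempdir(), "textdata-demo-full")
dir.create(empty_dir, showWarnings = FALSE)
dir.create(full_dir, showWarnings = FALSE)
writeLines("placeholder", file.path(full_dir, "placeholder.rds"))

dataset_is_cached(empty_dir)
dataset_is_cached(full_dir)
```

In a real script, `cache_dir` would be wherever textdata stores the dataset in question; how that folder is laid out is an assumption here, so check it on your own machine before relying on the guard.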

richierocks commented 5 years ago

In that case, I have 2 questions:

Would you be happy with adding an accept_license argument to load_dataset(), like

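load_dataset <- function(data_name, name, dir, delete, return_path, accept_license = printer(data_name) == 1) {
  # printer() returns the menu() index (1 = "Yes", 2 = "No"), so compare with 1 to get a logical
  if(!isTRUE(accept_license)) {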
    stop("You need to accept the license before you can use this dataset.")
  }
  # rest of function as before
}

That way the user has to explicitly say they are accepting the license by passing accept_license = TRUE.


Failing that, is it possible to make lexicon_nrc() work with the commercial version of the lexicon? That is, if I buy a copy, how do I make lexicon_nrc() (and therefore tidytext::get_sentiments()) make use of it?

EmilHvitfeldt commented 5 years ago

I'm still not comfortable with adding an option to allow downloads without the prompt.

If you buy a copy of the data, you can place it in textdata's search path, and the preprocessing and delivery will then happen without the prompt. You can do this in one of two ways.

  1. Place the data in the default path. This depends on your operating system, but can be found by running textdata::lexicon_afinn(return_path = TRUE).

  2. Place the data in a folder of your choosing and direct textdata to use that directory by specifying the dir argument, e.g. textdata::lexicon_afinn(dir = "my-data-folder").
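The two options above can be sketched as follows. This only uses the `return_path` and `dir` arguments mentioned in this comment; combining them in one call, and the assumption that `return_path = TRUE` returns before any download prompt, are not confirmed here. The folder name is a hypothetical placeholder, and the textdata calls are skipped if the package is not installed.

```r
# Hypothetical folder used only for illustration.
data_dir <- file.path(tempdir(), "my-data-folder")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

if (requireNamespace("textdata", quietly = TRUE)) {
  # Option 1: ask textdata where it looks by default on this machine.
  default_path <- textdata::lexicon_afinn(return_path = TRUE)
  print(default_path)

  # Option 2: point textdata at a folder of your choosing.
  # Assumption: `dir` and `return_path` can be combined in one call.
  custom_path <- textdata::lexicon_afinn(dir = data_dir, return_path = TRUE)
  print(custom_path)
}
```

Printing the paths first shows exactly where textdata will look, which is a reasonable check to make before copying any files into place.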

umasenthil commented 5 years ago

I am completely new to R. I downloaded the afinn dataset and tried to change the directory of the afinn data. It still prompts the user to download afinn. Is this expected? Thanks!

textdata::lexicon_afinn(dir = "/Users//Downloads/AFINN")
Do you want to download:
 Name: AFINN-111
 URL: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
 License: Open Database License (ODbL) v1.0
 Size: 78 KB (cleaned 59 KB)
 Download mechanism: https

1: Yes
2: No

EmilHvitfeldt commented 5 years ago

Hello @umasenthil. Yes, this is the correct behavior. Changing the dir argument doesn't move the dataset; it tells the function where to look.

umasenthil commented 5 years ago

@EmilHvitfeldt Thank you for the clarification!

umasenthil commented 5 years ago

@richierocks I tried your solution of overriding the printer() method. The afinn installation is still in an interactive mode:

Rscript --vanilla install_afinn2.R
Do you want to download:
 Name: AFINN-111
 URL: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
 License: Open Database License (ODbL) v1.0
 Size: 78 KB (cleaned 59 KB)
 Download mechanism: https
Error in menu(choices = c("Yes", "No"), title = title) :
  menu() cannot be used non-interactively
Calls: get_sentiments -> <Anonymous> -> load_dataset -> printer -> menu
Execution halted

I am trying to automate R scripts. Is there a way to make the 'afinn' dataset installation non-interactive? Or is there a way to pass the user input as a parameter to Rscript? Thank you