Yiwen-Zhang-259 / ETC5543-cran-analysis

ETC5543 CRAN Analysis
https://yiwen-zhang-259.github.io/ETC5543-cran-analysis/

How to get the pkg update + initial release date #12

Open emitanaka opened 3 years ago

emitanaka commented 3 years ago
pkg_url <- "https://cran.r-project.org/web/packages/{pkg}/index.html"
pkg_archive <- "https://cran.r-project.org/src/contrib/Archive/{pkg}/"
pkg_updates <- map(your_pkgs_vector, function(pkg) {
    last_update <- read_html(glue(pkg_url)) %>% 
      html_table() %>% 
      .[[1]] %>% 
      filter(X1=="Published:") %>% 
      pull(X2) %>% 
      ymd()

    archive_dates <- tryCatch({ 
        read_html(glue(pkg_archive)) %>% 
          html_table() %>%
          .[[1]] %>% 
          pull(`Last modified`) %>% 
          ymd_hm() %>% 
          na.omit() %>% 
          as.Date()
      }, error = function(e) {
        NULL
      })
    c(archive_dates, last_update)
  })
names(pkg_updates) <- your_pkgs_vector

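updates <- unlist(pkg_updates) %>% 
  enframe("package", "update") %>% 
  # unlist drops the Date class, so convert the day counts back to dates
  mutate(update = as.Date(update, origin = "1970-01-01"),
         # unlist also appends an index to repeated names; rebuild the package
         # column from the list itself, since a regex over all names can match
         # a shorter name that is a prefix of another (e.g. "sf" in "sfheaders")
         package = rep(names(pkg_updates), lengths(pkg_updates)))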
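
Since the question is about both the initial release and the update dates, the `updates` tibble can be collapsed to one row per package. This is only a minimal sketch, assuming `updates` has the `package` and `update` columns built above. The toy tibble uses made-up package names and dates so the block runs on its own, and `release_summary` is a hypothetical name.

```r
library(dplyr)

# Toy stand-in for the scraped `updates` tibble; names and dates are made up.
updates <- tibble(
  package = c("pkgA", "pkgA", "pkgA", "pkgB"),
  update  = as.Date(c("2015-03-01", "2016-07-15", "2019-01-20", "2018-05-05"))
)

# One row per package: earliest date = initial release, latest date = last update.
release_summary <- updates %>%
  group_by(package) %>%
  summarise(
    initial_release = min(update),
    latest_update   = max(update),
    n_releases      = n(),
    .groups = "drop"
  )

release_summary
```

A package with a single release gets the same date in both columns.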
emitanaka commented 3 years ago

You need to replace your_pkgs_vector with a character vector of the packages of interest.

emitanaka commented 3 years ago
library(lubridate)
library(dplyr)
library(rvest)
library(glue)
library(purrr)
library(tibble)
library(stringr)
Yiwen-Zhang-259 commented 3 years ago

Hi Emi, some packages in a certain topic have since been removed from CRAN. Do I need to exclude them from my analysis? The latest release date can't be extracted for these packages. :)
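
If removed packages are kept in the analysis, one option is to let the `Published:` lookup fail softly, so that any dates found in the archive are still collected. This is only a minimal sketch, assuming the page of a removed package has no summary table, so `html_table()` returns an empty list and `.[[1]]` throws an error. `published_date()` is a hypothetical helper, and the two HTML snippets are made-up stand-ins rather than real CRAN pages.

```r
library(rvest)
library(dplyr)
library(lubridate)

# Hypothetical helper: extract the "Published:" date from a parsed package page.
# Returns NA (as a Date) when the lookup errors, e.g. when the page has no table.
published_date <- function(page) {
  tryCatch({
    page %>%
      html_table() %>%
      .[[1]] %>%
      filter(X1 == "Published:") %>%
      pull(X2) %>%
      ymd()
  }, error = function(e) {
    as.Date(NA)
  })
}

# Made-up stand-ins for a live package page and a removed package page.
live_page <- read_html(
  "<html><body><table>
     <tr><td>Version:</td><td>1.0.0</td></tr>
     <tr><td>Published:</td><td>2021-05-01</td></tr>
   </table></body></html>"
)
removed_page <- read_html(
  "<html><body><p>Package was removed from the CRAN repository.</p></body></html>"
)

published_date(live_page)
published_date(removed_page)
```

Inside the `map()` call, the `NA` can then be dropped with `na.omit()` before combining it with `archive_dates`, or kept as a flag for packages that are no longer on CRAN.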