cranhaven / cranhaven.r-universe.dev

WARNING: This is a proof-of-concept idea - it might be removed again
https://cranhaven.r-universe.dev
MIT License

Official way to get archived packages #8

Open jeroen opened 2 months ago

jeroen commented 2 months ago

CRAN records archived packages, together with the archival date, in the X-CRAN-Comment field of PACKAGES.in:

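cran_archived_db <- function(){
  # PACKAGES.in is a DCF file; archival info lives in the X-CRAN-Comment field
  con <- url("https://cran.r-project.org/src/contrib/PACKAGES.in")
  on.exit(close(con))
  db <- as.data.frame(read.dcf(con))
  comments <- db[['X-CRAN-Comment']]
  # Capture the date that follows "Removed on" / "Archived on"
  pattern <- "(Removed|Archived) on ([0-9-]+)"
  m <- regexec(pattern, comments)
  # x[3] is the second capture group; an explicit format turns malformed dates into NA
  db$Date <- as.Date(vapply(regmatches(comments, m), function(x){
    x[3]
  }, character(1)), format = "%Y-%m-%d")
  # Keep entries with a parsable date from 2022 onwards
  db <- db[!is.na(db$Package) & !is.na(db$Date) & db$Date >= as.Date('2022-01-01'),]
  # Drop the date phrase and a leading "as"/"for", then collapse whitespace
  reason <- sub(pattern, '', db[['X-CRAN-Comment']])
  db$Reason <- gsub("\\s", " ", trimws(sub("^\\s*(as|for)\\b", "", reason)))
  db[order(db$Package),c("Package", "Date", "Reason")]
}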

HenrikBengtsson commented 2 months ago

Thanks. Yes, we should probably track the source directly, especially since we already parse that PACKAGES.in file for the dashboard. I started out using CRANberries because it already tracked these events. We'd probably have to track the file ourselves and diff successive snapshots to find out at what hour/minute an event takes place. We've also observed at least one case where they had a typo in the date; something like 21 instead of 12. That typo was fixed some day later.
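
To make the "track it ourselves and diff it" idea concrete, here is a minimal sketch in R. It assumes two snapshots shaped like the output of cran_archived_db() above (columns Package, Date, Reason). The helper name diff_archived, its seen_at argument, and the SeenAt / DateChanged columns are all hypothetical, and the package names and dates in the example are made up. The observation time is only an upper bound on when CRAN actually made the change, so the resolution depends on how often the file is polled.

```r
# Hypothetical helper (not part of cranhaven): compare two snapshots shaped
# like the output of cran_archived_db() and return the records that are new.
diff_archived <- function(old, new, seen_at = Sys.time()) {
  # A record is identified by package name plus archival date
  key <- function(x) paste(x$Package, format(x$Date), sep = "@")
  added <- new[!key(new) %in% key(old), , drop = FALSE]
  # Time at which we first observed the record (upper bound on the event time)
  added$SeenAt <- rep(seen_at, nrow(added))
  # TRUE when the package was already listed with a different date,
  # e.g. because a typo in the date was corrected upstream
  added$DateChanged <- added$Package %in% old$Package
  added
}

# Made-up snapshots for illustration only
old <- data.frame(
  Package = c("pkgA", "pkgB"),
  Date = as.Date(c("2022-03-01", "2022-12-21")),
  Reason = c("check problems were not corrected", "policy violation")
)
new <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Date = as.Date(c("2022-03-01", "2022-12-12", "2023-01-10")),
  Reason = c("check problems were not corrected", "policy violation",
             "requires archived package")
)

events <- diff_archived(old, new)
print(events[, c("Package", "Date", "DateChanged")])
```

Here pkgB is reported again because its date changed between snapshots, and pkgC is reported as a genuinely new entry. A real tracker would presumably store each snapshot and run this on a schedule.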