acrossthetidyverse / tidycast

A survey of the Tidyverse packages :star:s & CRAN downloads with forecasts

separate out cols? #3

Open njtierney opened 7 years ago

njtierney commented 7 years ago

Perhaps it might be good to separate out the package column into package and author?

Also, visualising this is harder than I thought.

library(readr)
tidyverse_gh_stars <- read_csv("~/Google Drive/ALL THE THINGS/PhD/code/R/acrossthetidyverse-projs/tidycast/data/tidyverse_gh_stars.csv")
#> Parsed with column specification:
#> cols(
#>   date = col_date(format = ""),
#>   n_stars = col_integer(),
#>   package = col_character()
#> )

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats

tidy_star <- 
  tidyverse_gh_stars %>%
  separate(col = package,
           into = c("author", "package"),
           sep = "/")

# prelim visualisation

p1 <- ggplot(tidy_star,
             aes(x = date,
                 y = n_stars,
                 colour = package)) +
  geom_point(alpha = 0.5) + 
  geom_line(alpha = 0.5) +
  facet_wrap(~author,
             scales = "free_y",
             ncol = 1)

p1


p1 + scale_y_log10()


plotly::ggplotly(p1)


# OK, so visualising this is harder than I thought.

MilesMcBain commented 7 years ago

Haven't looked at the data yet, but I'm keen, thanks @maelle! I think what would help is shifting them all onto the same relative timescale so they all start at t0. For that we're going to need the date each package was published. As a proxy, we could use the date of the first star.
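A minimal sketch of the t0 idea, using the first star as the proxy. The toy values are made up for illustration, and `toy_star` and `days_since_first_star` are hypothetical names, not something in the repo; only the `date`, `n_stars` and `package` columns come from the reprex above.

```r
library(dplyr)
library(tibble)

# Toy data standing in for tidy_star (values are invented for illustration)
toy_star <- tribble(
  ~date,        ~n_stars, ~package,
  "2014-01-10", 1L,       "dplyr",
  "2014-01-20", 5L,       "dplyr",
  "2015-11-01", 1L,       "purrr",
  "2015-11-04", 3L,       "purrr"
) %>%
  mutate(date = as.Date(date))

# For each package, count days elapsed since its earliest recorded star,
# so every series starts at 0 (hypothetical column name)
aligned <- toy_star %>%
  group_by(package) %>%
  mutate(
    days_since_first_star =
      as.numeric(difftime(date, min(date), units = "days"))
  ) %>%
  ungroup()

aligned
```

Plotting `days_since_first_star` on the x axis instead of `date` should then put all packages on a common timescale.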

maelle commented 7 years ago

Yeah it's hard to compare them!

I added the CRAN downloads, took 5 minutes :grin: I used the earliest date in the GitHub stars data as the start date for all CRAN downloads.

I'm not sure how to get the date published. Maybe from the GitHub API, or maybe from the first time point at which the package has more than 0 CRAN downloads from RStudio... Not from the CRAN page for the package anyway, since that gives the date of the latest version. But I'll let you think about this :wink:
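A rough sketch of the "first day with more than 0 downloads" proxy. The data here are made up; the `date`, `count` and `package` columns are an assumption about the shape of the downloads table (it is the shape `cranlogs::cran_downloads()` returns), and `first_download_date` is a hypothetical name.

```r
library(dplyr)
library(tibble)

# Toy daily download counts (invented values, assumed column layout)
toy_downloads <- tribble(
  ~date,        ~count, ~package,
  "2016-01-01", 0L,     "pkgA",
  "2016-01-02", 0L,     "pkgA",
  "2016-01-03", 12L,    "pkgA",
  "2016-01-01", 4L,     "pkgB",
  "2016-01-02", 9L,     "pkgB"
) %>%
  mutate(date = as.Date(date))

# Earliest date with at least one download, per package
first_download <- toy_downloads %>%
  filter(count > 0) %>%
  group_by(package) %>%
  summarise(first_download_date = min(date), .groups = "drop")

first_download
```

One caveat: if a package already has downloads on the first day of the window (like `pkgB` here), this only says it was published on or before that day.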

maelle commented 7 years ago

Also, I know we want to look into forecasting, but my passion for aberration detection makes me wonder if we could link peaks in the number of stars / downloads to something (I guess releases, or new versions of R). :angel: But this might be out of scope for a post about forecasting popularity.
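To make the peak idea concrete, here is a very crude sketch: a robust z-score (median and MAD) with an arbitrary cut-off of 3, which is far simpler than proper aberration detection methods. All values, the release date, and the names `toy_daily`, `toy_releases`, `robust_z`, `is_peak` and `near_release` are made up for illustration.

```r
library(dplyr)
library(tibble)

# Toy daily counts with one obvious spike (invented values)
toy_daily <- tibble(
  date  = as.Date("2016-06-01") + 0:6,
  count = c(10, 12, 11, 9, 60, 10, 11)
)

# Flag days that sit far above the median, scaled by the MAD.
# Note: mad() is 0 for a perfectly flat series, which would need handling.
flagged <- toy_daily %>%
  mutate(
    robust_z = (count - median(count)) / mad(count),
    is_peak  = robust_z > 3
  )

# Hypothetical release dates to compare the flagged days against
toy_releases <- tibble(release_date = as.Date("2016-06-05"))

peaks_near_release <- flagged %>%
  filter(is_peak) %>%
  mutate(near_release = date %in% toy_releases$release_date)

peaks_near_release
```

In practice the match would probably need a window of a few days around each release rather than an exact date.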