chainsawriot / rstyle

The evolution of R programming styles.

Update tables and figures about community in the paper #17

Closed yenchiayi closed 4 years ago

yenchiayi commented 4 years ago

Table 1: image

chainsawriot commented 4 years ago

The last chart doesn't look correct to me. Need to check.

yenchiayi commented 4 years ago

I now use page_rank instead of evcent to rank the importance of packages within each identified community. Using evcent on a weighted directed graph can cause problems, as discussed here: https://lists.libreplanet.org/archive/html/igraph-help/2015-11/msg00020.html
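For readers unfamiliar with the ranking step, here is a minimal, language-agnostic sketch of PageRank on a weighted directed graph, written in plain Python. It is not the igraph implementation and not the code used in this repository. The package names, edge weights, the edge direction (from a dependent package to the package it depends on), and the damping factor of 0.85 are all assumptions made for illustration.

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    edges: list of (source, target, weight) tuples for a directed graph.
    Returns a dict mapping each node to its rank; ranks sum to 1.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    # Ignore non-positive weights so every used edge has a valid share.
    edges = [(s, t, w) for s, t, w in edges if w > 0]
    out_weight = {n: 0.0 for n in nodes}
    for s, _, w in edges:
        out_weight[s] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        # Each node passes its rank along its edges in proportion to weight.
        for s, t, w in edges:
            new_rank[t] += damping * rank[s] * w / out_weight[s]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def top_k(rank, k=3):
    """Return the k highest-ranked nodes, breaking ties by name."""
    return sorted(rank, key=lambda n: (-rank[n], n))[:k]


# Hypothetical toy community: three packages depend on "core".
toy_edges = [
    ("pkg_a", "core", 3.0),
    ("pkg_b", "core", 1.0),
    ("pkg_c", "core", 2.0),
    ("pkg_c", "pkg_a", 1.0),
]
ranks = weighted_pagerank(toy_edges)
print(top_k(ranks, k=3))
```

The damping term gives every node a small baseline rank, so the iteration stays well defined even when the graph is not strongly connected.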

After this change, the top 3 packages of some communities change slightly. I think the community names we came up with yesterday are still valid. Some are now clearer, such as (comm_id: 25, comm_name: "Graph"), while others remain confusing, such as ids 42 and 44, which look very similar. My proposed revisions are in the "suggested" column below.

| comm_id | n_mem | top | comm_name | suggested |
|---|---|---|---|---|
| 6 | 5157 | methods, stats, MASS | base | |
| 4 | 4758 | testthat, knitr, rmarkdown | Rstudio | |
| 28 | 826 | Rcpp, tinytest, pinp | Rcpp | |
| 3 | 463 | survival, Formula, sandwich | Statistical Analysis | |
| 9 | 447 | nnet, rpart, randomForest | Machine Learning | |
| 16 | 367 | sp, rgdal, maptools | Geography 1 | Map |
| 15 | 131 | gsl, expint, mnormt | Geography 2 | Geography |
| 25 | 103 | graph, Rgraphviz, bnlearn | Bioconductor: Graph | Graph |
| 49 | 79 | tm, SnowballC, NLP | Text Analysis | |
| 42 | 55 | tcltk, tkrplot, tcltk2 | GUI | GUI 1 |
| 13 | 54 | rsp, listenv, globals | Infrastructure 1 | |
| 17 | 51 | polynom, magic, numbers | Numerical Optimization | |
| 40 | 43 | Biostrings, IRanges, S4Vectors | Bioconductor: Genomics | Genomics |
| 77 | 38 | RUnit, ADGofTest, fAsianOptions | RUnit | |
| 24 | 33 | kinship2, CompQuadForm, coxme | Survival Analysis | |
| 2 | 32 | slam, ROI, registry | Sparse Matrix | |
| 44 | 31 | RGtk2, gWidgetstcltk, gWidgetsRGtk2 | Infrastructure 2 | GUI 2 |
| 75 | 29 | limma, affy, marray | Bioinformatics | |
| 37 | 28 | RJSONIO, Rook, base64 | IO | |
| 45 | 27 | rJava, xlsxjars, openNLP | rJava | |

Please kindly let me know your thoughts on whether to update the names. @chainsawriot @pymia

yenchiayi commented 4 years ago

The current revision is as follows. Please refer to the original files in the folder visualization_community/.

Figures: comm05_naming_among_community, comm05_syntax_features_among_community, comm05_feature_distance, comm05_subgraph_selected_comm_9_45_49, comm05_naming_in_comm_9, comm05_naming_in_comm_45, comm05_naming_in_comm_49

chainsawriot commented 4 years ago

@exilespacer I am okay with PageRank.

It is now clearer what community 15 is: it seems to consist of packages that use the GNU gsl library. These are more like statistical functions than geography.

chainsawriot commented 4 years ago

@exilespacer @pymia I suddenly have an alternative suggestion about the figures and tables in the paper. (Maybe in the presentation/poster, we can still use the assigned labels.)

How about we don't name the communities ourselves and instead simply use the top 3 packages, such as "methods, stats, MASS", as the label? That label already implies that the community is about "base".

This would preempt possible objections from reviewers about our labeling (e.g. the geography stuff).

pymia commented 4 years ago

Would it be possible to make the label like "Rstudio: testthat, knitr, rmarkdown"? Some communities, like Base and Rstudio, are easy to figure out. But for other, more domain-specific labels, readers would still need to work out which community limma, affy, and marray belong to.

yenchiayi commented 4 years ago

I agree with Mia's idea. My preference is as follows: LABEL: TOP 3 PKGS > LABEL > TOP 3 PKGS.

I actually used the top 3 packages as community names before, and I don't think it is an effective way to display the information. While it's true that our labels may be imprecise, I found it hard to get a rough overview of the communities we identified when only the top 3 packages are listed in the name. For example, if we label a community "Rstudio," it immediately shows that the popularity of snake_case naming is likely due to the endorsement of the Rstudio community; if we don't, we need further paragraphs in the main text to explain that. I think it is better to make the figures and tables self-contained.

Although we find the communities hard to label, the same difficulty applies to our readers. I don't think they will google the description of each package as we did; they may simply skip the section rather than work it out. And even if they did, it is more efficient for us to do it once so that readers do not have to repeat the same work.

I suggest that we invite some people who work with those packages to confirm whether our labels are correct. For example, I can post them on the FB pages of TW RUG and R-Ladies Taipei and collect some feedback. I know there are some people there from the fields of geographical data analysis and genomics.

yenchiayi commented 4 years ago

I opened a new issue for the labels of identified communities in #26