Closed yenchiayi closed 4 years ago
Table 1:
The last chart doesn't look correct to me. Need to check.
I now use page_rank to rank the importance of packages in an identified community, instead of using evcent. Using evcent on a weighted directed graph may cause many problems, such as: https://lists.libreplanet.org/archive/html/igraph-help/2015-11/msg00020.html
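For reference, here is a minimal sketch of what a weighted PageRank ranking inside one community amounts to. It is plain Python power iteration, not igraph's `page_rank` implementation, and the function names, the damping factor of 0.85, and the toy edge list are all assumptions made for illustration. Edges are assumed to point from a dependent package to the package it depends on, with strictly positive weights.

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a weighted directed graph.

    edges: dict mapping (src, dst) -> positive weight (hypothetical input format).
    Returns a dict mapping node -> score; the scores sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    n_nodes = len(nodes)

    # Total outgoing weight per node, used to normalise each edge.
    out_weight = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w

    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(max_iter):
        # Score held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def top_packages(edges, members, k=3):
    """Rank the packages of one community using only its internal edges."""
    internal = {
        (src, dst): w
        for (src, dst), w in edges.items()
        if src in members and dst in members
    }
    rank = weighted_pagerank(internal)
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(rank, key=lambda n: (-rank[n], n))[:k]


# Toy data: pkgA, pkgB and pkgC are made-up dependents, not real packages.
toy_edges = {
    ("pkgA", "Rcpp"): 1.0,
    ("pkgB", "Rcpp"): 2.0,
    ("pkgB", "tinytest"): 1.0,
    ("pkgC", "Rcpp"): 1.0,
    ("pkgC", "tinytest"): 1.0,
}
toy_members = {"pkgA", "pkgB", "pkgC", "Rcpp", "tinytest"}
print(top_packages(toy_edges, toy_members, k=2))
```

Note that a member with no edges inside the community does not appear in the ranking at all in this sketch, which may or may not match how the real pipeline treats isolated packages.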
After the modification, the top 3 packages of each community change slightly. I think the community names we came up with yesterday are still valid. Some are now clearer, such as (comm_id: 25, comm_name: "Graph"), while others are a bit confusing, such as ids 42 and 44, which look very similar. I put my proposed revisions in the column "suggested" below.
comm_id | n_mem | top | comm_name | suggested |
---|---|---|---|---|
6 | 5157 | methods, stats, MASS | base | |
4 | 4758 | testthat, knitr, rmarkdown | Rstudio | |
28 | 826 | Rcpp, tinytest, pinp | Rcpp | |
3 | 463 | survival, Formula, sandwich | Statistical Analysis | |
9 | 447 | nnet, rpart, randomForest | Machine Learning | |
16 | 367 | sp, rgdal, maptools | Geography 1 | Map |
15 | 131 | gsl, expint, mnormt | Geography 2 | Geography |
25 | 103 | graph, Rgraphviz, bnlearn | Bioconductor: Graph | Graph |
49 | 79 | tm, SnowballC, NLP | Text Analysis | |
42 | 55 | tcltk, tkrplot, tcltk2 | GUI | GUI 1 |
13 | 54 | rsp, listenv, globals | Infrastructure 1 | |
17 | 51 | polynom, magic, numbers | Numerical Optimization | |
40 | 43 | Biostrings, IRanges, S4Vectors | Bioconductor: Genomics | Genomics |
77 | 38 | RUnit, ADGofTest, fAsianOptions | RUnit | |
24 | 33 | kinship2, CompQuadForm, coxme | Survival Analysis | |
2 | 32 | slam, ROI, registry | Sparse Matrix | |
44 | 31 | RGtk2, gWidgetstcltk, gWidgetsRGtk2 | Infrastructure 2 | GUI 2 |
75 | 29 | limma, affy, marray | Bioinformatics | |
37 | 28 | RJSONIO, Rook, base64 | IO | |
45 | 27 | rJava, xlsxjars, openNLP | rJava | |
Please kindly let me know your thoughts on whether to update the names. @chainsawriot @pymia
The current revision is as follows. Please refer to the original files in the folder visualization_community/.
@exilespacer I am okay with PageRank.
It is now clearer what community 15 is: it seems to consist of packages using the GNU gsl library. These are more like statistical functions than geography.
@exilespacer @pymia I suddenly have an alternative suggestion about the figures and tables in the paper. (Maybe in the presentation/poster, we can still use the assigned labels.)
How about we don't name the communities ourselves and instead simply use the top 3 packages, such as "methods, stats, MASS", as the label? That already implies that this community is about "base".
It would also prevent possible objections from reviewers about our labeling (e.g. the geography stuff).
Would it be possible to make the label like "Rstudio: testthat, knitr, rmarkdown"?
Some communities, like Base and Rstudio, are easy to figure out.
But for other, more domain-specific communities, readers would still need to work out which community limma, affy and marray belong to.
I agree with Mia's idea. My preference is as follows: LABEL: TOP 3 PKGS > LABEL > TOP 3 PKGS.
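A small sketch of how that preference order could be turned into a single labeling rule for the figures and tables. The helper name `community_label` and its arguments are hypothetical and not taken from the project code.

```python
def community_label(top_packages, name=None):
    """Build a display label in the order: 'LABEL: TOP 3 PKGS' > 'LABEL' > 'TOP 3 PKGS'.

    top_packages: package names, most important first (may be empty).
    name: the hand-assigned community name, or None if we have not named it.
    """
    pkgs = ", ".join(top_packages[:3])
    if name and pkgs:
        return f"{name}: {pkgs}"
    # Fall back to whichever part is available (empty string if neither is).
    return name or pkgs


print(community_label(["testthat", "knitr", "rmarkdown"], "Rstudio"))
print(community_label(["limma", "affy", "marray"]))
```

With this rule, a community we are unsure about can simply be left unnamed and it falls back to its top 3 packages.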
I actually used the top 3 packages as the community names before, and I don't think it is effective for displaying information. While it's true that our labels may be imprecise, I found it hard to get a rough overview of the communities we identified when only the top 3 packages are listed in the name. For example, if we label a community "Rstudio," it immediately shows that the popularity of snack_name is likely due to the endorsement of the Rstudio community; if we don't, we need further paragraphs in the main text to explain that. I would say it is better to make the figures and tables self-contained.
Although we find labeling hard, the same difficulty would apply to our readers. I don't think they will google the description of each package as we did; they may simply skip the section rather than think it through. And even if they did, it is more efficient for us to do it once so that readers do not have to repeat the same work again and again.
I suggest that we invite some people who work with those packages to confirm whether our labels are correct. For example, I can post the table on the FB pages of TW RUG and R-Ladies Taipei and collect some feedback. I know there are some people there from the fields of geographical data analysis and genomics.
I opened a new issue for the labels of identified communities in #26