Closed DrMattG closed 4 years ago
As usual, there is no such thing as a new great idea - see the EMLassemblyline project for similar functionality as suggested here...
Great, we can make use of that - it seems well maintained too, which is useful. We can lean heavily on this and the other EML packages. rOpenSci have a couple of packages that do aspects of what we need too, but we need to weave them all together into a single(ish) workflow. Their emldown package is particularly cool (https://ropensci.org/blog/2017/08/01/emldown/ )
Suggest that as a first step, the `build_folder_structure` function creates a basic folder structure and empty metadata files (which can be edited manually or via a call to a Shiny app for graphical help). We may streamline later, but this would get the project up and running more quickly?
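A minimal sketch of what such a function might do - the folder names and the metadata filename here are assumptions for illustration, not a settled layout:

```r
# Sketch only: creates a basic project skeleton plus an empty metadata
# file for manual (or Shiny-assisted) editing. Directory names are
# placeholders, not a settled convention.
build_folder_structure <- function(path = ".") {
  dirs <- file.path(path, c("data/raw_data", "data/clean_data", "metadata", "R"))
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # empty metadata file to be filled in later
  file.create(file.path(path, "metadata", "metadata.md"))
  invisible(dirs)
}

build_folder_structure("my_project")
```

Keeping the function side-effect-only (returning the created paths invisibly) would let later steps, such as a metadata-editing Shiny app, pick up the structure without re-deriving it.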
It would be useful to draw out a typical workflow as well - perhaps in an Rmd document? When starting a new (data) project: first set up the folder structure, then the metadata and DMP, etc.
Yes, currently this is indicated loosely in the metadata of the repo.
Would be nice to start drawing out the workflow explicitly.
I can add that as a template in the rmarkdown templates folder (it only has a minimal metadata template at the moment) - perhaps so that it adds a README/Instructions file to the topmost level of the project folder.
I was thinking of a working doc for us when we start building the functions - to see better how the functions / workflow we build fits together. But could be included as a template as well for the users to fill out - that could be a good idea actually.
Would a graph of the workflow be useful? I've been teaching myself graphing techniques to describe workflows in a few packages I'm developing. The graphs are rough and could be prettier, but they can be easily updated, as they are created from a couple of tables that 1) describe the functions and 2) describe the from-and-to connections. Update the tables and the graph automatically updates - no fiddling around with placing things in flowcharts. For example:
This looks indeed very useful!
@softloud what do you use to make these networks - is it igraph?
Ooh good question. I experimented with a few things.
This is a combination of the `tidygraph` and `ggraph` packages, which extend `ggplot2`. I liked the setup for this: create two dataframes, nodes and edges. The nodes hold any additional descriptors (in this case I wanted to differentiate between object types), and the edges describe the from-and-to connections between nodes. The edges need to match the node names or it'll bork out.

Nice thing is the graph layout is automated - more nodes and edges can be added and the graph will update - and `ggplot2` syntax allows for different colouring, shapes, etc.

I recall I tried `igraph` but didn't find it as intuitive to set up.
```r
library(tidygraph)
#>
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#>
#>     filter
library(ggraph)
#> Loading required package: ggplot2
library(tidyverse)

nodes <-
  tribble(
    ~object_name,            ~object_type,
    "raw_claim_data",        "dataframe",
    "preprocess_judgements", "function",
    "aggregate_cs",          "function",
    "output",                "dataframe",
    "preprocess_QuizWAgg",   "function",
    "quizscores",            "dataframe",
    "preprocess_ReasonWAgg", "function",
    "reasoning",             "dataframe",
    "raw_reasoning",         "dataframe",
    "priors",                "dataframe",
    "qualtrics_path",        "filepath",
    "get_quiz_scores",       "function",
    "quiz_scores",           "dataframe",
    "quiz_rubric",           "dataframe"
  ) %>%
  mutate(id = row_number())

# look up a node's numeric id by its name
node_key <- function(object_name) {
  nodes %>%
    dplyr::filter(object_name == !!object_name) %>%
    pluck("id")
}

edges <-
  tribble(
    ~from,                   ~to,
    "raw_claim_data",        "preprocess_judgements",
    "preprocess_judgements", "aggregate_cs",
    "aggregate_cs",          "output",
    "preprocess_QuizWAgg",   "quizscores",
    "reasoning",             "aggregate_cs",
    "quizscores",            "aggregate_cs",
    "raw_reasoning",         "preprocess_ReasonWAgg",
    "preprocess_ReasonWAgg", "reasoning",
    "reasoning",             "aggregate_cs",
    "priors",                "aggregate_cs",
    "qualtrics_path",        "get_quiz_scores",
    "get_quiz_scores",       "quiz_scores",
    "quiz_scores",           "preprocess_QuizWAgg",
    "quiz_rubric",           "preprocess_QuizWAgg"
  ) %>%
  mutate(from = map_int(from, node_key),
         to   = map_int(to, node_key))

tbl_graph(nodes %>% select(-id), edges) %>%
  ggraph() +
  geom_edge_link(arrow = arrow(), colour = "lightgrey") +
  geom_node_text(
    size = 2.5,
    aes(label = object_name, colour = object_type)
  ) +
  theme_graph() +
  hrbrthemes::scale_color_ipsum() +
  theme(legend.position = "bottom")
#> Using `sugiyama` as default layout
```
Created on 2020-05-04 by the reprex package (v0.3.0)
Build metadata file(s) - maybe we should think about the possibility of generating one plain-text metadata file (e.g. .md, .txt, or .rtf - I would personally prefer .md) and one meta.xml file in EML? Rationale: the .xml file is machine-readable and is what goes into a DwC-A, but it is impossible for human consumption without translation. It would be cool to be able to build the metadata directly from information in files (possibly from templates coming from DataEntryForms) - that would save some time indeed, but could come at a later stage.
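A rough sketch of the dual-output idea: write the same metadata list once as a human-readable .md file and once as a machine-readable meta.xml. The field names are placeholders, and the hand-rolled XML is purely illustrative - a real implementation should build valid EML with the EML package (e.g. `EML::write_eml()`) rather than paste strings:

```r
# Sketch only: one metadata list, two outputs. Element names below are
# simplified stand-ins, not a valid EML schema.
meta <- list(
  title    = "Example dataset",
  creator  = "A. Person",
  abstract = "What the data are and how they were collected."
)

# human-readable markdown metadata
md <- c(
  paste("#", meta$title),
  paste("**Creator:**", meta$creator),
  "",
  meta$abstract
)
writeLines(md, "metadata.md")

# machine-readable XML metadata (minimal, illustrative only)
xml <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  "<eml><dataset>",
  sprintf("  <title>%s</title>", meta$title),
  sprintf("  <creator>%s</creator>", meta$creator),
  sprintf("  <abstract>%s</abstract>", meta$abstract),
  "</dataset></eml>"
)
writeLines(xml, "meta.xml")
```

Keeping one list as the single source of truth means the .md and .xml views can never drift apart, which is the main risk of maintaining the two files by hand.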