Open dgarijo opened 11 months ago
I have to confess it's not really that good yet. However, my attention during the ELIXIR BioHackathon has already been requested for other tasks... hope to have an update soon!
no rush!
Back in business! Below, I'll add things I noticed while creating the demo repo. As I won't be done today, more might follow...
- acknowledgment is missing from the SOMEF README and docs
- image (singular) in the SOMEF output, but the docs say images (plural)
Thanks, let me open this in a new issue. Many people have been editing the docs, and keeping everything consistent can be challenging
Having too many contributors sounds like a lovely problem to have :). Below, I got two more potential issues... I'll post them in separate comments
I think there might be an issue with extracting a logo when there is no slash (/) in the path to the logo. For illustration, below is a snippet of the README.md of the somef-demo-repo, followed by a snippet of the JSON output of SOMEF. Note that logo1.png is not recognized as a logo, but logo_directory/logo2.png is. Same result if I use logo.png and if I don't have the logo_directory/logo2.png in the README.md
# Image
Images used to illustrate the software component.
![logo1.png](logo1.png)
# Logo
Main logo used to represent the target software component.
![logo2.png](logo_directory/logo2.png)
"logo": [
{
"result": {
"type": "Url",
"value": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/logo_directory/logo2.png"
},
"confidence": 1,
"technique": "regular_expression",
"source": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/README.md"
}
],
"image": [
{
"result": {
"type": "Url",
"value": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/logo1.png"
},
"confidence": 1,
"technique": "regular_expression",
"source": "https://raw.githubusercontent.com/tpronk/somef-demo-repo/main/README.md"
}
]
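For reference, here is a minimal sketch of slash-independent behaviour, written independently of SOMEF. It is not SOMEF's actual regular expression or logic. It assumes, purely for illustration, that an image counts as a logo when its file name contains "logo"; the function name split_logos_and_images is hypothetical. Taking the base name of the path first means a target with no directory part is handled the same way as one with a directory:

```python
import posixpath
import re

# Matches Markdown images of the form ![alt](target) and captures the target.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def split_logos_and_images(markdown):
    """Split Markdown image targets into (logos, images).

    Assumption for illustration only: a target is a logo when its
    file name (not its directory) contains "logo".
    """
    logos, images = [], []
    for target in IMAGE_PATTERN.findall(markdown):
        # basename() returns "logo1.png" for "logo1.png" and
        # "logo2.png" for "logo_directory/logo2.png".
        name = posixpath.basename(target).lower()
        (logos if "logo" in name else images).append(target)
    return logos, images


# README snippet taken from the comment above.
readme = """# Image
Images used to illustrate the software component.
![logo1.png](logo1.png)
# Logo
Main logo used to represent the target software component.
![logo2.png](logo_directory/logo2.png)
"""

logos, images = split_logos_and_images(readme)
print("logos:", logos)
print("images:", images)
```

Under that assumption, both logo1.png and logo_directory/logo2.png land in the logo list, which is the result the report expects.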
At the Hackathon, we've been extracting metadata from around 65 repos, but in none of the SOMEF output can I find the field has_executable_notebook. Also, in the SOMEF source code, I couldn't easily identify any snippets that extract it. Does this field still work? If so, might you have an example for me of a repo where it can be extracted from?
I found a case where values extracted for the invocation field were attributed to README.md, but on visual inspection, I found them in README.Rmd instead. It concerns this repo. Below is a snippet of the SOMEF output. Credits to Esteban for providing this dataset :)
"invocation": [
{
"result": {
"type": "Text_excerpt",
"value": "\n```{r, echo=FALSE, results='asis', message = FALSE}\nmy_apc %>% select(institution, euro) %>% \n group_by(institution) %>% \n ezsummary::ezsummary(n = TRUE, digits= 0, median = TRUE,\n extra = c(\n sum = \"sum(., na.rm = TRUE)\",\n min = \"min(., na.rm = TRUE)\",\n max = \"max(., na.rm = TRUE)\"\n )) %>%\n mutate_all(format, big.mark=',') %>%\n ezsummary::ezmarkup('...[. (.)]..[. - .]') %>%\n#> get rid of blanks\n mutate(`mean (sd)` = gsub(\"\\\\( \", \"(\", .$`mean (sd)`)) %>% \n select(institution, n, sum, `mean (sd)`, median, `min - max`) %>%\n arrange(desc(n)) %>%\n knitr::kable(col.names = c(\"Institution\", \"Articles\", \"Spending total (in \u20ac)\", \"Mean (SD)\", \"Median\", \"Minimum - Maximum\"), align = c(\"l\",\"r\", \"r\", \"r\", \"r\", \"r\"))\n``` \n",
"original_header": "Fully Open Access Journals"
},
"confidence": 0.906763643352601,
"technique": "supervised_classification",
"source": "https://raw.githubusercontent.com/MPDL/unibiAPC/master/README.md"
},
{
"result": {
"type": "Text_excerpt",
"value": "```{r, echo = FALSE, warning = TRUE}\n\nknitr::opts_knit$set(base.url = \"/\")\nknitr::opts_chunk$set(\n comment = \"#>\",\n collapse = TRUE,\n warning = FALSE,\n message = FALSE,\n echo = FALSE,\n fig.width = 9,\n fig.height = 6\n)\noptions(scipen = 999, digits = 0, tibble.width = Inf, tibble.print_max = Inf)\n\nknitr::knit_hooks$set(inline = function(x) {\n prettyNum(x, big.mark = \",\")\n})\n```\n```{r}\nrequire(dplyr)\nrequire(ggplot2)\nrequire(ezsummary)\nrequire(pander)\n```\n```{r, echo=FALSE, cache = FALSE}\nmy_apc <- readr::read_csv(\"data/apc_de.csv\")\n```\n \n"
},
"confidence": 0.9211067534061969,
"technique": "supervised_classification",
"source": "https://raw.githubusercontent.com/MPDL/unibiAPC/master/README.md"
}
]
Thanks for these issues. executable_notebook should return the mybinder links. I see that these are now added in executable_example. This may need a review (the schema has undergone a few changes).
All other issues are legit. Thanks a lot! We'll need to address them
If you find any more, please open them! I usually open them as I test on diverse repos, but sometimes it is tricky to get to these edge cases
Bueno & gracias. I'll keep 'em coming then :)
Wrapping things up, I compared fields mentioned in the README.md of SOMEF to the fields in constants.py. These are the discrepancies I found in terms of entries I couldn't find in one or the other, ignoring cases where they probably just have a different name:
- changelog. Yes in README, not in constants
- code_repository. Not in README, yes in constants
- contributing_guidelines. Not in README, yes in constants
- date_created. Not in README, yes in constants
- date_updated. Not in README, yes in constants
All right then. SOMEF 0.9.4 can extract a total of 48 fields from this version of somef-demo-repo, which can make it a nice integration test I guess
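The README-versus-constants comparison above amounts to a set difference in both directions. The sketch below shows the idea with small illustrative subsets. Only the five discrepant names come from this thread; treating logo, image and invocation as present in both lists is an assumption, and the real field lists are much longer. The helper name compare_fields is hypothetical.

```python
def compare_fields(readme, constants):
    """Return (only_in_readme, only_in_constants) as sorted lists."""
    return sorted(readme - constants), sorted(constants - readme)


# Illustrative subsets only, not the full SOMEF field lists.
readme_fields = {"changelog", "logo", "image", "invocation"}
constants_fields = {
    "code_repository",
    "contributing_guidelines",
    "date_created",
    "date_updated",
    "logo",
    "image",
    "invocation",
}

only_in_readme, only_in_constants = compare_fields(readme_fields, constants_fields)
print("Yes in README, not in constants:", only_in_readme)
print("Not in README, yes in constants:", only_in_constants)
```

A check like this would not catch fields that exist in both places under different names, so those would still need a manual pass.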
Definitely. Thanks!!
This repository: https://github.com/tpronk/somef-demo-repo should be added in the documentation