Closed antony-wilson closed 1 year ago
SonarCloud Quality Gate failed.
0 Bugs
0 Vulnerabilities
2 Security Hotspots
5 Code Smells
No Coverage information
0.0% Duplication
The submission script and config file are now included in the zip file:
├── inputs
│   ├── data
│   │   └── 72e13dceb2a924f0babad5e1920b3191af0ebe50.csv
│   ├── model_config
│   │   └── config.yaml
│   └── submission_script
│       └── c2351d9bb49857728421e9344d88a45f9e88e835.toml
├── outputs
│   ├── a5ffd3479af8e37f9ea128a36b5aeb75240d1160.pdf
│   └── c2351d9bb49857728421e9344d88a45f9e88e835.toml
└── ro-crate-metadata.json
@RyanJField was there anything else to include?
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells
No Coverage information
0.0% Duplication
The inclusion of author/1 may be a bug. I think the reference should point to "@id": "https://orcid.org/000-0000-0000-0000",
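One way to catch this kind of mismatch is to check that every `@id` reference in the metadata resolves to an entity defined in `@graph`. The following is only a sketch using the standard library; the helper names (`iter_refs`, `dangling_refs`) and the small example crate are hypothetical and are not part of the pipeline code.

```python
def iter_refs(value):
    """Yield every {"@id": ...} reference nested inside a property value."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
        else:
            for nested in value.values():
                yield from iter_refs(nested)
    elif isinstance(value, list):
        for item in value:
            yield from iter_refs(item)


def dangling_refs(metadata):
    """Return the referenced @ids that have no matching entity in @graph."""
    graph = metadata["@graph"]
    defined = {entity["@id"] for entity in graph}
    referenced = set()
    for entity in graph:
        for key, value in entity.items():
            if key != "@id":
                referenced.update(iter_refs(value))
    return referenced - defined


# Hypothetical crate: the file's author points at a local author URL,
# but the only Person entity in the graph is keyed by the ORCID URL.
example = {
    "@graph": [
        {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "outputs/plot.png"}]},
        {
            "@id": "outputs/plot.png",
            "@type": "File",
            "author": [{"@id": "http://127.0.0.1:8000/api/author/1"}],
        },
        {"@id": "https://orcid.org/000-0000-0000-0000", "@type": "Person"},
    ]
}
missing = dangling_refs(example)
```

Here `missing` contains only the `author/1` URL. A real crate also references external URLs that legitimately have no local entity, so the result is a list of things to review, not a list of errors.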
This is now the JSON I get:
{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    {
      "sha1": "http://xmlns.com/foaf/0.1/#term_sha1"
    }
  ],
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "datePublished": "2022-12-13T11:19:29.164067",
      "hasPart": [
        {
          "@id": "outputs/14da2266b09360aa5cd36a9501a079aac9538634.png"
        },
        {
          "@id": "inputs/model_config/config.yaml"
        },
        {
          "@id": "inputs/submission_script/script.sh"
        },
        {
          "@id": "inputs/data/1.0.0.csv"
        },
        {
          "@id": "https://doi.org/10.1038/s41592-020-0856-2"
        }
      ],
      "license": {
        "@id": "https://creativecommons.org/licenses/by/4.0/"
      },
      "name": "RO Crate for SEIRS_model/results/figure/python",
      "publisher": "FAIR Data Pipeline"
    },
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": {
        "@id": "./"
      },
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      },
      "license": {
        "@id": "https://creativecommons.org/publicdomain/zero/1.0/"
      }
    },
    {
      "@id": "https://creativecommons.org/licenses/by/4.0/",
      "@type": "CreativeWork",
      "description": "Attribution 4.0 International",
      "identifier": "https://creativecommons.org/licenses/by/4.0/",
      "name": "CC BY 4.0"
    },
    {
      "@id": "https://creativecommons.org/publicdomain/zero/1.0/",
      "@type": "CreativeWork",
      "description": "CC0 1.0 Universal (CC0 1.0) Public Domain Dedication",
      "identifier": "https://creativecommons.org/publicdomain/zero/1.0/",
      "name": "CC0 Public Domain Dedication"
    },
    {
      "@id": "outputs/14da2266b09360aa5cd36a9501a079aac9538634.png",
      "@type": "File",
      "author": [
        {
          "@id": "https://orcid.org/000-0000-0000-0000"
        }
      ],
      "description": "SEIRS output plot",
      "encodingFormat": "image/png",
      "name": "SEIRS_model/results/figure/python",
      "sha1": "14da2266b09360aa5cd36a9501a079aac9538634"
    },
    {
      "@id": "https://orcid.org/000-0000-0000-0000",
      "@type": "Person",
      "name": "Interface Test"
    },
    {
      "@id": "http://127.0.0.1:8000/api/code_run/1",
      "@type": "CreateAction",
      "agent": {
        "@id": "https://orcid.org/000-0000-0000-0000"
      },
      "description": "SEIRS Model python",
      "instrument": {
        "@id": "https://github.com/https://github.com/FAIRDataPipeline/pySimpleModel"
      },
      "name": "code run 1",
      "object": [
        {
          "@id": "inputs/model_config/config.yaml"
        },
        {
          "@id": "inputs/submission_script/script.sh"
        },
        {
          "@id": "inputs/data/1.0.0.csv"
        }
      ],
      "result": {
        "@id": "outputs/14da2266b09360aa5cd36a9501a079aac9538634.png"
      },
      "startTime": "2022-12-13T11:17:39.498545+00:00"
    },
    {
      "@id": "https://github.com/https://github.com/FAIRDataPipeline/pySimpleModel",
      "@type": "SoftwareApplication",
      "author": [
        {
          "@id": "https://orcid.org/000-0000-0000-0000"
        }
      ],
      "url": "https://github.com/https://github.com/FAIRDataPipeline/pySimpleModel"
    },
    {
      "@id": "inputs/model_config/config.yaml",
      "@type": [
        "File",
        "SoftwareSourceCode"
      ],
      "author": [
        {
          "@id": "https://orcid.org/000-0000-0000-0000"
        }
      ],
      "description": "Working config.yaml location in datastore",
      "encodingFormat": "yaml",
      "name": "config.yaml",
      "sha1": "a010936d503444515e625cc4c7c5c842d031d9aa"
    },
    {
      "@id": "inputs/submission_script/script.sh",
      "@type": [
        "File",
        "SoftwareSourceCode"
      ],
      "author": [
        {
          "@id": "https://orcid.org/000-0000-0000-0000"
        }
      ],
      "description": "Working script location in datastore",
      "encodingFormat": "text/x-sh",
      "name": "script.sh",
      "sha1": "f35c1cd83fbe1a458d71da1aae90ed2e8db2b031"
    },
    {
      "@id": "inputs/data/1.0.0.csv",
      "@type": "File",
      "author": [
        {
          "@id": "https://orcid.org/000-0000-0000-0000"
        }
      ],
      "description": "Static parameters of the model",
      "encodingFormat": "text/csv",
      "name": "SEIRS_model/parameters",
      "sha1": "6294a5951677e6b8438cabf55234b7974adeaee3"
    },
    {
      "@id": "https://doi.org/10.1038/s41592-020-0856-2",
      "@type": "File",
      "datePublished": "2021-09-20 12:00:00+00:00",
      "name": "Static parameters of the model"
    },
    {
      "@id": "http://127.0.0.1:8000/api/data_extraction/1",
      "@type": "CreateAction",
      "description": "import/extract data from an external source",
      "name": "data extraction 1",
      "object": {
        "@id": "https://doi.org/10.1038/s41592-020-0856-2"
      },
      "result": {
        "@id": "inputs/data/1.0.0.csv"
      },
      "startTime": "2022-12-13T11:17:32.616245+00:00"
    }
  ]
}
The ORCID iD is okay if there is one, but it's an optional field. Might it need to fall back to the URL if no ORCID iD is set?
The person's @id does not necessarily have to be an ORCID URL. You can use an internal, possibly randomly generated identifier, e.g.:
{
    "@id": "#3b6dd3e2-12f8-428f-833c-fa4314c9ae50",
    "@type": "Person",
    "name": "Interface Test"
},
ro-crate-py automatically generates a random identifier if you don't specify one:
p = crate.add(Person(crate, properties={"name": "Interface Test"}))
In cases like this where there is no actual agent, of course, one can simply not add the agent property to the CreateAction altogether (it's not required).
Regarding properties for file checksums, they're not in the standard RO-Crate context. We're going to add them to the workflow-run ro-terms namespace soon though, see https://github.com/ResearchObject/ro-terms/issues/14. When that is done, crate authors will be able to use those properties by adding them as an extension to the context. E.g.:
EXTRA_TERMS = {
    "sha1": "https://w3id.org/ro/terms/workflow-run#sha1"
}
crate.metadata.extra_terms.update(EXTRA_TERMS)
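For illustration, here is a sketch of how a checksum and the extended context could fit together without depending on ro-crate-py. It assumes the `workflow-run#sha1` term quoted above; the helpers `sha1_of` and `file_entity` are hypothetical names, not part of any library.

```python
import hashlib
import io

# Term URL as quoted in the comment above.
SHA1_TERM = "https://w3id.org/ro/terms/workflow-run#sha1"


def sha1_of(stream, chunk_size=65536):
    """Return the hex SHA-1 digest of a binary stream, read in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_entity(path, stream):
    """Build a minimal File entity carrying a sha1 property."""
    return {"@id": path, "@type": "File", "sha1": sha1_of(stream)}


# The standard RO-Crate 1.1 context, extended with the extra term.
context = ["https://w3id.org/ro/crate/1.1/context", {"sha1": SHA1_TERM}]

# In-memory bytes stand in for a real file opened with open(path, "rb").
entity = file_entity("inputs/data/1.0.0.csv", io.BytesIO(b"abc"))
```

Reading in chunks keeps memory use flat for large data products; the same function would work for `md5`, `sha256` or `sha512` by swapping the `hashlib` constructor.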
Finally, note that the data extraction action is missing the instrument. It needs to point to the relevant software tool, like the code run action is doing.
@RyanJField do we have anything that we could use as instrument to say how the DataProduct was extracted from the ExternalObject?
No, the pull command does not push a code run. I can add the functionality to do so. @richardreeve should the pull command push a code run to the registry, and should it only do so in the case of an external object?
Apologies for the delay here. There are two cases where we “convert” an external object into a data product.
- In one case, the external object is the data product, so I’m not sure there is an instrument? The two are just the same entity - we can see this internally, because the external object is tagged as primary.
- In the second case, the external object is tagged as supplementary internally, and then there is some unknown additional process involved in converting the external object into a data product. Often this is something like just retyping a table from a paper by hand, but in any case we do not record what is done. In that case, @simleo I’m not sure what a suitable generic instrument would be?
I don’t think in either case the code run would be appropriate, because there is never anything “run” to do the work, but I’m not sure what else we could say?
Edit: the only instrument I can think of that is generic enough is maybe something really unhelpful like a computer?
On the id front, it seems like we could just do as suggested for empty ORCIDs and replace any individual anonymous author with a randomly generated local id, so it’s the same id throughout the RO-Crate - would that be possible @antony-wilson? That way we can distinguish different anonymous authors from one another.
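A rough sketch of that idea, assuming each author has some registry key we can look up (the `PersonIds` class and the `author/N` keys are made up for illustration): use the ORCID URL when there is one, otherwise hand out a random local id once per author and reuse it everywhere in the crate.

```python
import uuid


class PersonIds:
    """Hand out one stable @id per author for the lifetime of a crate."""

    def __init__(self):
        self._local = {}  # author key -> generated local id

    def id_for(self, author_key, orcid=None):
        # Prefer the ORCID URL whenever the author has one.
        if orcid:
            return orcid
        # Otherwise generate a random local id once and reuse it afterwards.
        if author_key not in self._local:
            self._local[author_key] = f"#{uuid.uuid4()}"
        return self._local[author_key]


ids = PersonIds()
anon_a = ids.id_for("author/1")
anon_a_again = ids.id_for("author/1")
anon_b = ids.id_for("author/2")
# Illustrative ORCID URL only, not taken from the registry.
known = ids.id_for("author/3", orcid="https://orcid.org/0000-0002-1825-0097")
```

Here `anon_a` and `anon_a_again` are the same id while `anon_b` differs, so two anonymous authors stay distinguishable within one crate. The ids are not stable across separate exports unless the mapping is stored somewhere.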
I would suggest that the instrument is the CLI, as the CLI downloads the file and renames it. When pull is called, the CLI generates a job ID (based on the time and date) and a config.yaml file for the pull.
The process of going from an external object to an internal object is an activity, and I've named it data_extraction.
Re @richardreeve's two cases for external objects:
- Case one: agreed, there is no need for a data_extraction activity.
- Case two: we have a data_extraction activity that makes use of an instrument. From @RyanJField's comments it sounds like the instrument is the CLI.
I agree it is the CLI that is physically moving the file from the remote store to the local one, but this is really about moving the data from the external source to the internal data product isn’t it? That isn’t being done by the CLI - it’s being done by a human somehow through an unknown instrument. I think the most we can say is that it’s a computer or something equally generic, surely?
In any event, are adding a computer instrument and a local id to replace the zeroed orcids both things we could potentially do this week so we could merge this PR and then raise an issue if we think it’s wrong later?
The RO-Crate is now using the CLI as the instrument for data_extraction activities.
Currently, if there is no ORCID iD, the code falls back to the local id, e.g. http://127.0.0.1:8000/api/author/5. Hopefully this is sufficient for now and I'll add the random id stuff in the New Year.
Okay, if that’s the only thing, shall we merge this then and raise an issue about the non-uniqueness of the id? And do we need to do the same for the file checksums, or can we fix that now?
I think merge now and update the checksum when https://github.com/ResearchObject/ro-terms/issues/14 is released
I agree this can now be merged.
Okay, do one of you want to merge it then? I'm happy.
In principle, an instrument can be anything (the expected value type is Thing), including a computer. However, in the context of software execution, it should point to a specific application, even if it's just a basic one for data transfer (e.g., cp, curl, ...). So I think that pointing to the CLI like you did was the right decision.
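To make that concrete, here is a sketch of a data extraction CreateAction whose instrument points at a SoftwareApplication entity for the CLI. The `#fair-cli` id, the entity name and the `data_extraction_action` helper are placeholders chosen for this example; the ids actually emitted by the pipeline may differ.

```python
# Placeholder id for the CLI entity (hypothetical).
CLI_ID = "#fair-cli"

cli_entity = {
    "@id": CLI_ID,
    "@type": "SoftwareApplication",
    "name": "FAIR Data Pipeline CLI",
}


def data_extraction_action(action_id, source_id, result_id, start_time,
                           instrument_id=None):
    """Build a CreateAction linking an external object to a data product."""
    action = {
        "@id": action_id,
        "@type": "CreateAction",
        "description": "import/extract data from an external source",
        "object": {"@id": source_id},
        "result": {"@id": result_id},
        "startTime": start_time,
    }
    # instrument is only added when we know which tool did the work.
    if instrument_id is not None:
        action["instrument"] = {"@id": instrument_id}
    return action


action = data_extraction_action(
    "http://127.0.0.1:8000/api/data_extraction/1",
    "https://doi.org/10.1038/s41592-020-0856-2",
    "inputs/data/1.0.0.csv",
    "2022-12-13T11:17:32.616245+00:00",
    instrument_id=CLI_ID,
)
graph_fragment = [cli_entity, action]
```

The CLI entity has to be present in `@graph` alongside the action, otherwise the `instrument` reference would point at nothing.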
I've addressed https://github.com/ResearchObject/ro-terms/issues/14 in https://github.com/ResearchObject/ro-terms/pull/15. To avoid bloating the namespace, for now I've added only md5, sha1, sha256 and sha512. If you need some other variant, just open another issue.
By the way, the current draft of the Workflow Run RO-Crate profiles is now nicely formatted at https://www.researchobject.org/workflow-run-crate/profiles/ :slightly_smiling_face:
Codecov Report
87.94% <93.79%> (+0.97%)