Closed ypriverol closed 3 years ago
You could also link to other initiatives such as OMICtools (https://omictools.com):
LABEL OMICSTOOLS="https://omictools.com/comet-3-tool"
@mr-c Recommendations: http://label-schema.org/rc1/
To be in line with other labels, shouldn't it be lowercase? (LABEL biotools=...)
Should the identifiers be encoded in the Dockerfile/Conda recipes? I will list the pros and cons so that we can make a decision and move this forward.
Pros:
Cons:
1. It injects an external source of information into the technical recipe, with the corresponding disadvantages:
1.1. Updates to the identifiers will trigger updates to our recipes, which is not good practice because it is not a technical change.
1.2. Identifiers and external sources tend to change more often than other things, for two main reasons: a) resources change their identifier schema; b) resources disappear or new resources are added, so new identifiers need to be added.
2. Most of the recipes will be post-annotated. That is, the recipe will exist first, and only after the PubMed or bio.tools entry has been created will we annotate the recipe.
I have been looking into other resources, such as Bioconductor, that have a similar problem: they have the description of the package and then an annotation where they add the corresponding information, such as publications and external URLs. My recommendation, for now, is to have a central place in BioContainers where we can store this information persistently on GitHub, for example:
biocontainers_bioconda_id external_id
Here is a full example (https://github.com/BioContainers/biotools-bioconda-ids/blob/master/mapping.csv). When a package is created, we ask the contributor to add the package to this file and open a PR. This is similar to what we created at the very beginning with mulled. With this approach, we can even update this table without updating the recipe and the corresponding image. We can also call for contributors to update this metadata through PRs.
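A minimal sketch of how a consumer might read such a central mapping file. It assumes a comma-separated layout with the two columns named above; the sample rows (including `sometool`) are hypothetical and the real mapping.csv may be laid out differently.

```python
import csv
import io

# Hypothetical sample; the real mapping.csv may use other columns or delimiters.
SAMPLE_MAPPING = """biocontainers_bioconda_id,external_id
abyss,biotools:abyss
comet,biotools:comet
sometool,
"""


def load_mapping(text):
    """Return {package_id: external_id or None} from CSV text."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        external = (row["external_id"] or "").strip()
        # An empty cell means no external identifier is known yet.
        mapping[row["biocontainers_bioconda_id"].strip()] = external or None
    return mapping


mapping = load_mapping(SAMPLE_MAPPING)
```

Because the table lives outside the recipe, a lookup like `mapping["abyss"]` can change without rebuilding the image, which is the trade-off this proposal aims for.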
Call for comments: @osallou @bgruening @prvst @BioContainers/contributors @johanneskoester @jmchilton
Inject an external source of information into the technical recipe with the corresponding disadvantages on that: 1.1. Updates in the identifiers will trigger updates in our recipes, which actually is not a good practice because this is not a technical change.
I don't think we need to rebuild the package in such cases iff the consumers of such metadata rely on the yaml files inside the github repo.
1.2. Identifiers and external sources tend to change more often than other things, for two main reasons: a) resources change their identifier schema; b) resources disappear or new resources are added, so new identifiers need to be added.
I hope that this does not happen that often. The idea of an ID is that it stays stable, like a DOI.
Most of the recipes will be post-annotated. That is, the recipe will exist first, and only after the PubMed or bio.tools entry has been created will we annotate the recipe.
Why is this a con with regard to the question?
As I said Bioconda will most likely start to annotate tools with DOI very soon in the main meta.yaml file, so we can jump on this and add bio.tools IDs as well.
bio.tools IDs and others should indeed be fixed over time, with only a few exceptions. That is the goal of IDs. Managing another file through separate PRs may lead to the file being updated only in a few cases, with people forgetting to do so.
On IDs, I agree with @bgruening and @osallou: bio.tools identifiers are now persistent IDs. If they were not persistent, it would be better not to provide any ;)
As for the syntax through labels, maybe a distinction between links and ids would be nice, like
LABEL tool_id="bio.tools:comet" (or "https://bio.tools/comet" if URL is preferred)
LABEL training_link="https://www.ebi.ac.uk/training/online/course/phenomenal-accessing-metabolomics-workflows-galaxy"
The point is that I am not sure to what extent TESS, for instance, commits to persistent URLs, whereas bio.tools surely does.
On IDs too: when an author puts a new Dockerfile in BioContainers:
If the tool is not referenced in bio.tools, do they need to create an entry with an ID in bio.tools before or after?
What happens if this is not done? Do we accept submissions without an ID? Do we need to add it for the user?
Cheers,
Francois
We have to define two things here:
1. How we want to persist the bio.tools ID. Different approaches can be implemented:
a) hard-code the IDs, with some structure, in the Bioconda/BioContainers recipes;
b) add an extra file alongside the recipe, called identifiers.yml (or similar), in which we encode the identifiers;
c) keep a separate file in which we perform the match between both lists.
2. What we do if the information is not available. We shouldn't force both open source communities to wait for bio.tools to release the tool. However, bio.tools could think about a way for us to request the creation of a tool entry on demand afterwards. @hmenager probably has an idea of how to do this.
Regards ...
We should now focus the discussion on how to solve problem one: where to put the identifiers, and how, in both initiatives (BioContainers Dockerfiles and Bioconda recipes).
1a for me, and I don't care much about 2 for the moment. Time will tell whether people will update their own recipes.
I think 1.a is better (simplicity).
chaps, I just chip in on a couple of points:
All for now - happy hacking! :)
Hi all:
As we all agree we go for option 1a : Each recipe will be self-described. This means that anyone can take those recipes and find the external reference information.
That means that we will put the bio.tools ID inside each recipe in Bioconda / BioContainers. Please let me know with a +1 on this comment if you agree. @osallou @bgruening @prvst @BioContainers/contributors @johanneskoester @jmchilton @joncison @fjrmoreews
+1
@ypriverol +1
Thanks, everyone, for voting on the previous comment. We have decided to encode the bio.tools identifiers inside the Conda recipe and the BioContainers Dockerfile. Here are the different options for encoding them:
LABEL extra.identifier = biotools:abyss
LABEL extra.identifier = doi:10.1021/ac303239g
LABEL extra.identifier = http://bio.tools/abyss
LABEL extra.identifier = https://pubs.acs.org/doi/10.1021/ac303239g
extra:
  identifiers:
    - biotools:abyss
    - doi:10.1021/ac303239g
    - pmid:23448308

extra:
  identifiers:
    biotools:
      - http://bio.tools/abyss
      - https://pubs.acs.org/doi/10.1021/ac303239g
It is important to define how this will be represented on both sides. I have linked this comment to an issue in the Bioconda community: https://github.com/bioconda/bioconda-recipes/issues/7699 .
Would it hurt to give both?
extra:
  identifiers:
    biotools:
      - biotools:abyss
      - https://bio.tools/abyss
    ...
It will not hurt, but we should standardize the format so that readers can read and interpret the files. One option could be:
extra:
  identifiers:
    - biotools:abyss
    - http://bio.tools/abyss
With this approach, people will know that you can have both compact identifiers and complete URLs.
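A reader of such a mixed list needs to tell compact identifiers from full URLs. The sketch below is one possible way to do that; the function name `parse_identifier` and the rule that only `bio.tools` URLs are mapped back to a namespace are assumptions made for illustration, not part of any agreed format.

```python
from urllib.parse import urlparse


def parse_identifier(value):
    """Split an identifier into (namespace, local_id).

    Accepts compact identifiers such as 'biotools:abyss' and, as an
    assumption, bio.tools URLs such as 'https://bio.tools/abyss'.
    Any other URL is returned unchanged as ('url', value).
    """
    value = value.strip()
    if value.startswith(("http://", "https://")):
        parsed = urlparse(value)
        if parsed.netloc == "bio.tools":
            return ("biotools", parsed.path.strip("/"))
        return ("url", value)
    # Split on the first colon only, so DOIs keep their slashes and dots.
    namespace, sep, local_id = value.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a compact identifier: {value!r}")
    return (namespace, local_id)
```

With this, `biotools:abyss` and `http://bio.tools/abyss` both resolve to the same pair, so giving both forms in a recipe is redundant but harmless.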
id is indeed better than URL, which may change in time...
LABEL keys must be unique, however; you cannot define the same key multiple times, as in:
LABEL extra.identifier = biotools:abyss
LABEL extra.identifier = doi:10.1021/ac303239g
so should be like:
LABEL extra.identifier.biotools=abyss
LABEL extra.identifier.doi=10.1021/ac303239g
in which case you could (if you want both flavours) I guess have:
LABEL extra.identifier.biotools=abyss
LABEL extra.identifier.biotoolsurl=https://bio.tools/abyss
@osallou As far as I know, in a Dockerfile you can define a label multiple times, so this is allowed:
LABEL extra.identifier = biotools:abyss
LABEL extra.identifier = doi:10.1021/ac303239g
This only says that we have two identifiers for the tool. We can explore other options, such as comma-separated values:
LABEL extra.identifier = biotools:abyss, doi:10.1021/ac303239g
@joncison
I don't like encoding the identifier's domain in the label name of the property, because this opens the door to many errors.
A label is a key-value pair, stored as a string. You can specify multiple labels for an object, but each key-value pair must be unique within an object. If the same key is given multiple values, the most-recently-written value overwrites all previous values. ( https://docs.docker.com/config/labels-custom-metadata/)
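The overwrite rule quoted above can be illustrated with a small model. This is not Docker itself: `collect_labels` is a hypothetical, deliberately simplified parser that handles only one `key=value` pair per LABEL line, and it mimics the documented last-write-wins behaviour with a plain dictionary.

```python
def collect_labels(dockerfile_text):
    """Collect labels from single-pair 'LABEL key=value' lines.

    Simplified model: one pair per line, optional double quotes
    around the value. A repeated key replaces the earlier value.
    """
    labels = {}
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if not line.upper().startswith("LABEL "):
            continue
        key, sep, value = line[len("LABEL "):].strip().partition("=")
        if sep:
            labels[key.strip()] = value.strip().strip('"')
    return labels


# Same key twice: only the last value survives.
repeated = collect_labels(
    "LABEL extra.identifier=biotools:abyss\n"
    "LABEL extra.identifier=doi:10.1021/ac303239g\n"
)

# One key per namespace: both values survive.
namespaced = collect_labels(
    "LABEL extra.identifier.biotools=abyss\n"
    "LABEL extra.identifier.doi=10.1021/ac303239g\n"
)
```

In this model `repeated` keeps only the DOI, while `namespaced` keeps both identifiers, which is the argument for putting the namespace in the key.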
Thanks @osallou for sharing this. I have been reading the documentation and it looks like the option is the following for Dockerfiles:
LABEL extra.identifier.biotools=abyss
LABEL extra.identifier.doi=10.1021/ac303239g
I like the:
extra:
  identifiers:
    - biotools:abyss
    - doi:10.1021/ac303239g
    - http://bio.tools/abyss
Hi all, the final decision for the external identifiers is the following:
The Bioconda recipe would look like:
extra:
  identifiers:
    - biotools:abyss
    - doi:10.1021/ac303239g
    - http://bio.tools/abyss
and the dockerfile would be like the following:
LABEL extra.identifier.biotools=abyss
LABEL extra.identifier.doi=10.1021/ac303239g
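To show how the two agreed forms relate, here is a hedged sketch that derives the Dockerfile labels from a recipe's identifier list. The helper name `identifiers_to_labels` is hypothetical, and skipping full URLs is an assumption: the thread does not define a label key for them.

```python
def identifiers_to_labels(identifiers):
    """Turn compact identifiers into 'LABEL extra.identifier.<ns>=<id>' lines.

    Full URLs are skipped (assumption: no label key was agreed for them).
    A repeated namespace raises an error, because a later label would
    silently overwrite the earlier one.
    """
    lines = []
    seen = set()
    for item in identifiers:
        if item.startswith(("http://", "https://")):
            continue
        namespace, sep, local_id = item.partition(":")
        if not sep:
            continue  # not a compact identifier
        key = f"extra.identifier.{namespace}"
        if key in seen:
            raise ValueError(f"duplicate label key: {key}")
        seen.add(key)
        lines.append(f"LABEL {key}={local_id}")
    return lines


labels = identifiers_to_labels(
    ["biotools:abyss", "doi:10.1021/ac303239g", "http://bio.tools/abyss"]
)
```

For the abyss example, `labels` holds exactly the two LABEL lines shown in the final decision.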
Thanks to everyone for their contribution to this discussion.
Thank you for the talk last Monday! I don't know where this discussion moved to, so let me know if you want me to repost this somewhere else.
The ELIXIR Aarhus team has merged our own mapping with the mapping linked to in the beginning of this issue. It is shared here:
https://docs.google.com/spreadsheets/d/1kSBnt6CKG53mqsltTzA-gCFdEuBSqf-UQwQoukD3lwM/edit?usp=sharing
As @joncison requested, the mapping also contains entries for tools which have a Bioconda package, but not an entry in bio.tools.
You guys should be able to edit the mapping, so we can collaborate on it. Or do you want to do it in some other way? @ypriverol, @hmenager?
@dansondergaard :
As agreed, I opened a PR with the Bioconda group to agree on the structure: https://github.com/bioconda/bioconda-recipes/pull/7940 . The PR is now under consideration, and I hope we can reach an agreement and have it accepted by the end of the week.
About the mapping list: first, thanks a lot for this great work. @hmenager and I have created a Git repository (https://github.com/BioContainers/biotools-bioconda-ids) with the list of containers and the corresponding tool in bio.tools, if available.
These are the files:
https://github.com/BioContainers/biotools-bioconda-ids/blob/master/mapping_matchonly.csv This is the actual mapping between tools that we have found so far. I really think we should update this file using your file for the tools that match.
https://github.com/BioContainers/biotools-bioconda-ids/blob/master/mapping.csv These are all the tools, both the ones that map and the ones that do not. Please add here the tools that you have on both sides and that do not map.
My idea is to build a simple script, once the Bioconda team accepts the structure I have proposed, to annotate all the tools in the matching list. I already have the code in place for that.
What do you think, @dansondergaard?
@ypriverol Sounds good. I know about the repo (since we used it to merge your mapping with ours to obtain a more complete mapping).
However, we'd like to keep working on the mapping, possibly in collaboration with you guys, which I think may be easier to do in Google Docs (otherwise there'll probably be a lot of merge conflicts). But if you prefer to collaborate via Git that's fine too :-)
Hi @dansondergaard, it would be great if everything were on GitHub. For example, yesterday Dimitri opened a PR with some updates after manually curating entries against the bio.tools website. We can merge both lists here on GitHub and add an extra column, verification_status. If you feel comfortable with this idea, please make a PR and I will accept it.
Working on GitHub is fine. We'll attempt a PR soon.
Just another question. In your mapping, what does "null" mean? Does it mean:
Our worry is that we're going to do a lot of extra manual work checking "null" entries, if you guys already did it. If we could distinguish "verified not in bio.tools" and "did not map automatically", @joncison would also be able to create the missing bio.tools entries.
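A sketch of how a verification_status column could separate the two meanings of "null". The column names, the sample rows, and the status values (`verified`, `verified_absent`, `unverified`) are all hypothetical; the thread only proposes the column itself.

```python
import csv
import io

# Hypothetical rows and status values, for illustration only.
SAMPLE = """bioconda_id,biotools_id,verification_status
abyss,abyss,verified
toolx,null,verified_absent
tooly,null,unverified
"""


def split_null_entries(text):
    """Separate 'null' rows into confirmed-absent and not-yet-checked."""
    confirmed_absent, unchecked = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row["biotools_id"] != "null":
            continue  # already mapped to a bio.tools entry
        if row["verification_status"] == "verified_absent":
            confirmed_absent.append(row["bioconda_id"])
        else:
            unchecked.append(row["bioconda_id"])
    return confirmed_absent, unchecked


confirmed_absent, unchecked = split_null_entries(SAMPLE)
```

Entries in `confirmed_absent` are candidates for new bio.tools records, while those in `unchecked` still need manual review.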
yup ... above would make life much easier. @hansioan and I can help out with 1. and 2. above: heads-up Hans, there will be a curation task in due course, although we'll be expecting to get at least some boilerplate metadata from the BioContainers side.
For each BioContainers / Bioconda package, we should annotate the bio.tools identifier as follows:
LABEL BIOTOOLS="https://bio.tools/comet"
This will be available in the metadata, and the BioContainers API would be able to retrieve this information from bio.tools.
If we have material that is in TESS, we should have a system such as the one discussed in https://github.com/BioContainers/specs/issues/78:
LABEL TESS="https://www.ebi.ac.uk/training/online/course/phenomenal-accessing-metabolomics-workflows-galaxy"