calipho-sib / cellosaurus

A knowledge resource on cell lines - From SIB CALIPHO group
https://www.cellosaurus.org
Creative Commons Attribution 4.0 International

Provide a mechanism for users to submit or link to relevant annotations? #6

Closed by khughitt 4 years ago

khughitt commented 4 years ago

This is a bigger question, and it depends on what you have in mind for the scope of Cellosaurus, but I was thinking it could be useful to provide some mechanism for users to add or suggest additional annotations.

For example, in the Multiple Myeloma field, a researcher has compiled a really useful table of annotations for ~70 different MM cell lines.

A simple approach would just be to provide a mechanism for users to suggest a link such as this be included for each relevant cell line (e.g. "external information", "additional resources", etc.)

A somewhat more involved approach would be to directly integrate annotations like these and include them in a section of the cell line records designed for arbitrary additional information.

Just a thought.

AmosBairoch commented 4 years ago

Hi again. Many thanks for all your thoughts and comments as they already prompted me to make updates and write a new FAQ!

This is a very complex issue: the Cellosaurus is not a repository but a curated knowledgebase; it seeks data submissions and processes them manually. Creating an automatic system that allows users to input data is neither desirable nor easy to develop. Fifteen years of monitoring experiments in using wikis for community annotation (see https://pubmed.ncbi.nlm.nih.gov/18507872) have left me with a very pessimistic view of such processes. Having said that, community efforts in the scientific realm do work for projects that require very precisely defined "atoms" of knowledge, such as those provided in Wikidata. As soon as you move toward a much more complex data structure, you need to write a curation platform to support these efforts. We did this with the Bioeditor (https://www.researchgate.net/profile/Pascale_Gaudet/publication/266157134_The_Bioeditor_strategies_for_efficient_and_accurate_protein_annotation/links/54b647c90cf28ebe92e7c412/The-Bioeditor-strategies-for-efficient-and-accurate-protein-annotation.pdf) for protein annotations. But even such a platform was really only useful for professional biocurators with a mindset targeted toward quality and conciseness.

To give you an example of how one needs to double-check everything: the table of MM cell lines that you mention is indeed well done and contains quite a number of useful data points (especially mutations) that I therefore wanted to add to the Cellosaurus. But each one had to be checked manually, and unfortunately many are wrong (for example, a KRAS mutation is listed as "S13C - Het", but position 13 is a Gly, not a Ser. This error comes from one of the publications used to build the table, whose supplementary material is full of typos). So overall I have already spent 4 hours checking some of the information and, in the process, adding some useful data and references. But it is a manual process, and opening the gate to automatic updates would swamp the resource with errors.
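To make the "S13C" example concrete, here is a minimal sketch of one such sanity check: comparing the reference residue named in a substitution against the protein sequence. It is an illustration only, not the Cellosaurus curation workflow. The function name `check_substitution` and the constant `KRAS_N_TERM` are hypothetical, and the sequence is just the first 20 residues of human KRAS; a real check would use the full reference sequence.

```python
import re

# First 20 residues of human KRAS. A fragment for illustration only (assumption:
# a real check would load the full reference sequence instead).
KRAS_N_TERM = "MTEYKLVVVGAGGVGKSALT"


def check_substitution(notation, sequence):
    """Check a simple one-letter substitution such as 'G13C' against a sequence.

    Returns a (is_consistent, message) tuple.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", notation)
    if match is None:
        return False, f"{notation}: not a simple substitution"

    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)

    # Protein positions are 1-based.
    if pos < 1 or pos > len(sequence):
        return False, f"{notation}: position {pos} is outside the sequence"

    actual = sequence[pos - 1]
    if actual != ref:
        return False, f"{notation}: expected {ref} at position {pos}, found {actual}"

    return True, f"{notation}: reference residue matches"


if __name__ == "__main__":
    for mutation in ("S13C", "G13C"):
        ok, message = check_substitution(mutation, KRAS_N_TERM)
        print("OK " if ok else "BAD", message)
```

A check like this only flags a reference residue that does not match; it cannot tell whether the intended mutation was G13C, or a substitution at another position, so the source publication still has to be consulted by hand.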

PS: in terms of providing links to external resources: this is already in place, and there are already more than 20'000 "web links" in the Cellosaurus. The link to the table you mention will be added to the relevant entries.