ModellingWebLab / project_issues

An issues-only repository for issues that cut across multiple repositories

Epic 7: metadata annotation tools #63

Open jonc125 opened 5 years ago

jonc125 commented 5 years ago

Email train on annotating datasets:

From: Matthias König konigmatt@googlemail.com Sent: 01 August 2019 14:52 To: Michael Clerx michael.clerx@cs.ox.ac.uk Cc: Keating, Sarah s.keating@ucl.ac.uk Subject: Re: Metadata for CSV

Hi Michael,

Yes, my plan was to try frictionlessdata via the respective Python libraries and see what works and what doesn't. I think the most important step is to create some example use cases to see where the problems are.

I will definitely use CSV/TSV + JSON meta-information (this solves all my possible use cases and is probably also the easiest thing to prototype). As soon as I have some examples/experiences, I will share them with you.

Best Matthias

On Thu, Aug 1, 2019 at 2:11 PM Michael Clerx michael.clerx@cs.ox.ac.uk wrote:

Hi again,

This list is very convincing: https://frictionlessdata.io/software/

Do you plan to experiment with putting your data into this format? Wondering if the best way forward would be to just try it out over the next few months and see if we find anything lacking?

Best wishes,
Michael

On 24/07/2019 10:41, Matthias König wrote:

Hi Sarah and Michael, this sounds great.

Basically, what I want is a way to annotate my columns in a CSV/TSV file. The CSV will have a single header row which defines the ids of the columns, e.g. study_id | sex | age | height | time | caffeine | ...

I want to have a simple way to add meta-information to these columns which consists of

My preferred solution is a combination of

I need something which I can track in git, which is somehow human-readable/editable, and which is supported by a wide range of tools/libraries. JSON seems to be a good solution here.

I found the following very interesting: https://frictionlessdata.io/data-packages/ and https://frictionlessdata.io/specs/table-schema/. They look very close to what I want, with libraries available for Python, R and JavaScript (which would cover most of my use cases).
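To make the CSV/TSV + JSON idea concrete, here is a minimal sketch using only the Python standard library. The descriptor borrows the `fields`, `name`, `type` and `description` keys from the Table Schema spec linked above. The column types, descriptions and sample row are made up for illustration, and `check_header` and `read_typed_rows` are hypothetical helpers, not part of the frictionless libraries. The sketch deliberately does not use those libraries.

```python
import csv
import io
import json

# Hypothetical Table Schema-style descriptor. The keys "fields", "name",
# "type" and "description" follow the frictionlessdata.io Table Schema spec;
# the column names come from the example header, while the types and
# descriptions are illustrative assumptions.
SCHEMA_JSON = """
{
  "fields": [
    {"name": "study_id", "type": "string", "description": "Study identifier"},
    {"name": "sex", "type": "string", "description": "Sex of the subject"},
    {"name": "age", "type": "number", "description": "Age of the subject"},
    {"name": "height", "type": "number", "description": "Height of the subject"},
    {"name": "time", "type": "number", "description": "Sampling time"},
    {"name": "caffeine", "type": "number", "description": "Caffeine concentration"}
  ]
}
"""

# Made-up example data in the tab-separated (TSV) variant.
DATA_TSV = (
    "study_id\tsex\tage\theight\ttime\tcaffeine\n"
    "S1\tM\t34\t1.80\t0.5\t2.1\n"
)

# Converters for the Table Schema types used in this sketch only.
CASTS = {"string": str, "number": float, "integer": int}


def check_header(schema, table_text, delimiter="\t"):
    """Compare the table's header row with the annotated fields.

    Returns a list of problem messages; an empty list means every column
    is annotated and every annotated field is present.
    """
    header = next(csv.reader(io.StringIO(table_text), delimiter=delimiter))
    expected = [field["name"] for field in schema["fields"]]
    problems = [f"column {name!r} has no annotation"
                for name in header if name not in expected]
    problems += [f"annotated field {name!r} missing from table"
                 for name in expected if name not in header]
    return problems


def read_typed_rows(schema, table_text, delimiter="\t"):
    """Read data rows, converting each value using its field's declared type.

    Assumes check_header() found no problems (unannotated columns raise KeyError).
    """
    types = {field["name"]: field["type"] for field in schema["fields"]}
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    return [{name: CASTS[types[name]](value) for name, value in row.items()}
            for row in reader]


schema = json.loads(SCHEMA_JSON)
print(check_header(schema, DATA_TSV))  # prints []
print(read_typed_rows(schema, DATA_TSV)[0]["age"])  # prints 34.0
```

Keeping the descriptor in its own JSON file next to the data file keeps both human-editable and diffable in git, which matches the requirements above. In practice, the frictionless libraries would presumably take over the hand-written checks.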

Similar approaches are things like JSON-LD, breaking things down to CSV + JSON (for rich description). It would be great if we could find a common solution here.

Best Matthias

On Wed, Jul 24, 2019 at 10:30 AM Keating, Sarah s.keating@ucl.ac.uk wrote:

Hi Matthias

At COMBINE you seemed keen to add metadata to CSV files. Since WebLab/Michael Clerx (cc) also want to do this, it would be great to coordinate any efforts so that we are at least consistent.

It doesn't look like COMBINE is going to land on a standard form for data any time soon, so it would be good to establish some consistency in annotation, at least for CSV, which is commonly used 😊. That way we can then propose it to the COMBINE annotation list as a 'standard' way of annotating CSV.

Michael can add more technical stuff that he has looked into; I'm really the messenger at this point but will happily get involved.

Sarah

-- Matthias König, PhD. Junior Group Leader LiSyM - Systems Medicine of the Liver Humboldt Universität zu Berlin, Institute of Biology, Institute for Theoretical Biology https://livermetabolism.com konigmatt@googlemail.com https://twitter.com/konigmatt https://github.com/matthiaskoenig Tel: +49 30 2093 98435


MichaelClerx commented 5 years ago

Eventually, all annotations should be based on community-agreed-upon ontologies, but the best strategy to achieve this might be:

  1. Make it up as we go along
  2. See if the system gets any uptake, leading to people with some stake in having a good ontology
  3. Discuss
  4. Map our ontology terms to existing ones
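Step 4 of the strategy above can be sketched as a simple lookup from provisional local terms to agreed community terms. This is a hypothetical illustration. Every term (`local:...`, `community:...`) is a placeholder rather than a real ontology identifier, and `remap` and `unmapped_terms` are made-up helper names.

```python
# Hypothetical sketch: column annotations that start out using locally
# invented terms and are later mapped onto community ontology terms.
# Every term below ("local:...", "community:...") is a placeholder,
# not a real ontology identifier.

# Steps 1-2: annotate with made-up terms so the system can be used now.
LOCAL_ANNOTATIONS = {
    "age": "local:subject_age",
    "height": "local:subject_height",
    "caffeine": "local:caffeine_concentration",
}

# Step 4: after discussion, record which local terms correspond to
# agreed community terms. Terms without an agreed match are simply absent.
TERM_MAP = {
    "local:subject_age": "community:age",
    "local:caffeine_concentration": "community:caffeine_concentration",
}


def remap(annotations, term_map):
    """Return new annotations with mapped terms substituted.

    Unmapped terms are kept unchanged, so existing annotated data
    stays valid while the mapping is completed incrementally.
    """
    return {column: term_map.get(term, term) for column, term in annotations.items()}


def unmapped_terms(annotations, term_map):
    """List the local terms that still need a community equivalent."""
    return sorted({term for term in annotations.values() if term not in term_map})


print(remap(LOCAL_ANNOTATIONS, TERM_MAP))
print(unmapped_terms(LOCAL_ANNOTATIONS, TERM_MAP))  # prints ['local:subject_height']
```

Because the mapping lives outside the annotated files, the data annotated during steps 1 to 3 does not need to be edited by hand when terms are agreed later.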

See also #21