globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

nomer data #4

Open cboettig opened 6 years ago

cboettig commented 6 years ago

@jhpoelen In the spirit of minimal / mobile tooling, I was wondering if it would be possible to expose all of the nomer data as a single RDF dump (or, more developer-friendly, as a JSON-LD-formatted JSON stream). Or maybe you already do something like this?

jhpoelen commented 6 years ago

Interesting! Could you please provide a more specific example?

cboettig commented 6 years ago

Good question, I'm still wrapping my own head around just what that would be. One possible configuration might be something like:


{"@context": {"@vocab": "http://example.com/", "parent":  {"@type": "@id"}},
    "@graph": [
    {"name": "Animalia",    "rank": "kingdom",  "@id": "GBIF:1"},
    {"name": "Chordata",    "rank": "phylum",   "@id": "GBIF:44",      "parent": "GBIF:1"},
    {"name": "Mammalia",    "rank": "class",    "@id": "GBIF:359",     "parent": "GBIF:44"},
    {"name": "Carnivora",   "rank": "order",    "@id": "GBIF:732",     "parent": "GBIF:359"},
    {"name": "Canidae",     "rank": "family",   "@id": "GBIF:9701",    "parent": "GBIF:732"},
    {"name": "Canis",       "rank": "genus",    "@id": "GBIF:5219142", "parent": "GBIF:9701" },
    {"name": "Canis lupus", "rank": "species",  "@id": "GBIF:5219173", "parent": "GBIF:5219142"}
    ]
  }

This can equivalently be represented as a set of triples, but it is quite nice in JSON. JSON is easier to parse than pipe strings, and the JSON-LD algorithms can be handy for manipulating it, e.g. to convert it into a nested structure by parent or by child (e.g. by defining the @reverse property in the @context; see the above example in the JSON-LD Playground).
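To make the nesting idea concrete without a JSON-LD processor, here is a minimal plain-Python sketch (not the JSON-LD framing algorithm) that takes the flat @graph records above and nests each taxon under its parent. The `children` key and the function name are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Flat records copied from the @graph example above.
GRAPH = [
    {"name": "Animalia", "rank": "kingdom", "@id": "GBIF:1"},
    {"name": "Chordata", "rank": "phylum", "@id": "GBIF:44", "parent": "GBIF:1"},
    {"name": "Mammalia", "rank": "class", "@id": "GBIF:359", "parent": "GBIF:44"},
    {"name": "Carnivora", "rank": "order", "@id": "GBIF:732", "parent": "GBIF:359"},
    {"name": "Canidae", "rank": "family", "@id": "GBIF:9701", "parent": "GBIF:732"},
    {"name": "Canis", "rank": "genus", "@id": "GBIF:5219142", "parent": "GBIF:9701"},
    {"name": "Canis lupus", "rank": "species", "@id": "GBIF:5219173", "parent": "GBIF:5219142"},
]


def nest_by_parent(graph):
    """Return the root nodes, each carrying a hypothetical "children" list."""
    children = defaultdict(list)
    for node in graph:
        # Nodes without a "parent" key are grouped under None: the roots.
        children[node.get("parent")].append(node)

    def build(node):
        # Copy the node without its parent link, then attach its children.
        out = {k: v for k, v in node.items() if k != "parent"}
        out["children"] = [build(c) for c in children[node["@id"]]]
        return out

    return [build(root) for root in children[None]]


tree = nest_by_parent(GRAPH)
```

Nesting by child instead (each taxon holding its parent object) is the same idea with the lookup inverted, which is roughly what @reverse gives you declaratively.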

This can also obviously be rendered to the equivalent RDF. Clearly one would want a more intelligent choice for @context than "@vocab": "http://example.com/"; e.g., the rank terms should probably be defined in Darwin Core, etc. That's probably in the source data anyway.
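As a rough illustration of the "equivalent RDF" point, the sketch below turns flat records into N-Triples-style lines. It is not a JSON-LD processor: it assumes the only context rules are the @vocab of http://example.com/ and "parent" being typed as @id, as in the example above, and it uses `json.dumps` as a stand-in for N-Triples string escaping (adequate for plain ASCII names like these).

```python
import json

VOCAB = "http://example.com/"
ID_TYPED = {"parent"}  # terms declared with "@type": "@id" in the @context


def to_triples(graph):
    """Yield (subject, predicate, object) strings in N-Triples-like form."""
    for node in graph:
        subject = f"<{node['@id']}>"
        for key, value in node.items():
            if key.startswith("@"):
                continue  # keywords such as @id are not properties
            predicate = f"<{VOCAB}{key}>"
            if key in ID_TYPED:
                obj = f"<{value}>"  # node reference
            else:
                obj = json.dumps(value)  # plain string literal
            yield subject, predicate, obj


records = [
    {"name": "Animalia", "rank": "kingdom", "@id": "GBIF:1"},
    {"name": "Chordata", "rank": "phylum", "@id": "GBIF:44", "parent": "GBIF:1"},
]
for s, p, o in to_triples(records):
    print(s, p, o, ".")
```

A real JSON-LD library would also handle prefixes, language tags, and datatypes; this only shows how the @vocab and @id typing map keys and values onto triples.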

I haven't yet played with this example by mixing and matching entries from different authorities, but clearly that would be important. If you had all the different authorities' entries in the above format, you could already query the JSON to, say, give me all @ids that have "name": "Canis lupus" (and perhaps also "rank": "species"). That would already be a good start, though it might be interesting to think about adding triples/statements that explicitly define some relation between them. (Doing something like GBIF:id owl:sameAs ITIS:id is probably asking for trouble(??), even though that's probably how most researchers want to think about these ids...)
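The "give me all @ids that have this name" query is a simple filter once records from several authorities share this shape. A minimal sketch, assuming records shaped like the @graph entries above; the ITIS record reuses the ITIS:180596 id for Canis lupus that appears later in this thread, and the function name is hypothetical.

```python
def ids_for_name(graph, name, rank=None):
    """Return the @ids whose "name" matches, optionally also matching "rank"."""
    return [
        node["@id"]
        for node in graph
        if node.get("name") == name and (rank is None or node.get("rank") == rank)
    ]


graph = [
    {"name": "Canis", "rank": "genus", "@id": "GBIF:5219142"},
    {"name": "Canis lupus", "rank": "species", "@id": "GBIF:5219173", "parent": "GBIF:5219142"},
    {"name": "Canis lupus", "rank": "species", "@id": "ITIS:180596"},
]

print(ids_for_name(graph, "Canis lupus", rank="species"))
```

The result lists one id per authority, which is exactly the set one would then want to relate with explicit statements.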

jhpoelen commented 6 years ago

Nice! I like the idea of bold statements like NCBI:9606 sameAs GBIF:2436436 because they invite discussion and help codify assumptions that are often implicitly made already. Isn't a Homo sapiens a Homo sapiens? I'd even argue that Homo sapiens sameAs Homo sapiens (often used) is less accurate than NCBI:9606 sameAs GBIF:2436436, because the ids define (by reference) an explicit taxonomic context, whereas the strings leave the machine (and most humans) guessing: is this just a sequence of characters or a taxonomic name?

The JSON-LD example looks pretty good. I'd go for including the sameAs statements and making the rank a little more explicit, like:

{"@context": {"@vocab": "http://example.com/", "parent": {"@type": "@id"}, "rank": {"@type": "@id"}},
    "@graph": [
    {"name": "kingdom",  "@id": "GBIF_RANK:1"},
    {"name": "phylum",   "@id": "GBIF_RANK:2", "parent": "GBIF_RANK:1"},
    {"name": "Animalia", "rank": "GBIF_RANK:1", "@id": "GBIF:1"},
    {"name": "Chordata", "rank": "GBIF_RANK:2", "@id": "GBIF:44", "parent": "GBIF:1"},
      ...
    ]
  }

Note that with existing features (tab-separated lines, with pipes for internal arrays), you can do searches like "give me all the Anura, but exclude the plants" (note that, in addition to being an amphibian order, Anura is also a plant genus):

cat [some names] | java -jar nomer.jar append | grep "Anura" | grep -v "Plantae"

Granted, it is not as semantically explicit, but it is quite fast and can be quite accurate, perhaps after some column selection with awk. That said, I do agree that JSON-LD / jq provide much more powerful graph-like queries on the command line.
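The "column selection" refinement can be sketched as matching whole path elements in one column instead of grepping the entire line. This assumes a hypothetical tab-separated layout (id, name, pipe-separated path); the column index, the X: identifiers, and the made-up "Anuranus" row are illustrations, not nomer's documented output schema.

```python
import csv
import io

# Made-up rows in a hypothetical (id, name, path) layout; not nomer's schema.
SAMPLE = (
    "X:1\tRana\tAnimalia | Chordata | Amphibia | Anura | Ranidae | Rana\n"
    "X:2\tAnura\tPlantae | Magnoliophyta | Anura\n"
    "X:3\tAnuranus\tAnimalia | Anuranus\n"
)
PATH_COLUMN = 2  # assumed index of the pipe-separated path column


def keep(row, include, exclude):
    """Compare whole path elements instead of substrings of the whole line."""
    path = [element.strip() for element in row[PATH_COLUMN].split("|")]
    return include in path and exclude not in path


rows = csv.reader(io.StringIO(SAMPLE), delimiter="\t")
matches = [row[0] for row in rows if keep(row, "Anura", "Plantae")]
print(matches)  # ['X:1']
```

Unlike `grep "Anura" | grep -v "Plantae"`, this does not match "Anuranus" as a substring, and it only looks for "Plantae" in the path column rather than anywhere on the line.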

How would you imagine using this future json-ld feature? Do you imagine that the format would be both input and output? Are you aware of any other projects that use this kind of format to exchange taxonomic (equivalence) data?

cboettig commented 6 years ago

Nice, you make a convincing case that we should just go for it with sameAs. It certainly seems justified to assert that an NCBI identifier is sameAs the GBIF identifier when the strings match and all, though I imagine there are some edge cases, e.g. where one authority has split a species and the other has not.

Yeah, I do like the simplicity of the pipe lists (and it looks like they have a precedent in Darwin Core's higherClassification?), but they can be tricky to use when you want to keep track of which rank Anura is supposed to refer to, and maybe according to which authority. I tell my students that grep should be thought of as a last resort, when the data provider hasn't given you a more structured way of doing something, since it is more prone to unanticipated side effects, like matching a plant genus instead of a vertebrate order! So in general, I think a well-thought-out JSON structure would be ideal.

Good question about the JSON-LD framing. My thinking was that the nice thing about JSON-LD frames is that it gives the user (or at least the app developer) some control over what their preferred JSON-LD structure might be -- in particular, I was thinking that it might be nicer for the user to have a nested JSON file than to have to resolve the parent links manually, and JSON-LD handles that rather nicely (even letting you reverse the nesting). However, with a little more thought, I'm not really sure that a nested structure is particularly useful. I think what most researchers would find most intuitive would be for rank names to act as keys rather than values in the JSON, something like:

{
      "species": {"name":  "Canis lupus", "@id": "GBIF:5219173", "sameAs": "ITIS:180596", ...},
      "genus":   {"name":  "Canis", "@id": "GBIF:5219142"},
      "family":  {"name":  "Canidae",  "@id": "GBIF:9701"},
      "order":   {"name":  "Carnivora", "@id": "GBIF:732"},
      "class":   {"name": "Mammalia",  "@id": "GBIF:359"},
      "phylum":  {"name": "Chordata",  "@id": "GBIF:44"},
      "subkingdom": {"name": "Bilateria", "@id": "ITIS:914154"},
      "kingdom": {"name": "Animalia",  "@id": "GBIF:1"}
    }

Do you think something like that is possible? It is clearly a little more dicey semantically -- it requires divorcing the rank levels (like phylum, subphylum etc) from a particular authority, and it also doesn't leave room for different authorities to have different name strings for the same rank (though I think that is implicit in using sameAs). Maybe those issues (and maybe others) make it unworkable, but I think it corresponds to how most ecologists want to think about and use taxonomic names.

E.g., at least for something like this wolf, ITIS might provide additional ranks but agrees with GBIF on the names of all the ranks they share (i.e., both agree the class is Mammalia). Not sure if that would hold in general.
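The "agrees on all shared ranks" check can be written directly against two rank-keyed records. The sketch below uses a subset of the GBIF and ITIS names for Canis lupus that appear in the nomer output later in this thread; the function name is hypothetical, and whether agreement holds in general is exactly the open question.

```python
def rank_disagreements(a, b):
    """Return {rank: (name_a, name_b)} for shared ranks whose names differ."""
    shared = a.keys() & b.keys()
    return {rank: (a[rank], b[rank]) for rank in sorted(shared) if a[rank] != b[rank]}


gbif = {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
        "order": "Carnivora", "family": "Canidae", "genus": "Canis"}
itis = {"kingdom": "Animalia", "subkingdom": "Bilateria", "phylum": "Chordata",
        "subphylum": "Vertebrata", "class": "Mammalia", "order": "Carnivora",
        "family": "Canidae", "genus": "Canis"}

print(rank_disagreements(gbif, itis))  # {}
```

It does not always come out empty: the NCBI record later in the thread names its kingdom "Metazoa", so comparing it with GBIF's "Animalia" would be flagged.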

cboettig commented 6 years ago

Side comment: provided we don't butcher the semantics, using JSON-LD instead of plain JSON means the data automatically has a sensible RDF serialization as well, which might appeal to the hardcore biodiversity informatics folks. Or, put another way, one could think of this as starting with a dump of RDF triple statements from all of the authorities, and we are just defining a JSON-LD frame that parses said triples into a more developer-friendly JSON structure...
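As a plain-Python stand-in for what such a frame might produce (this is not a JSON-LD frame), the rank-keyed shape can be derived from parent-linked records by walking up from a leaf and keying each ancestor by its rank. It assumes records carry "rank" and "parent" as in the first @graph example, and that each rank occurs at most once along a lineage. A repeated rank would overwrite an earlier one, which is the semantic dicey-ness discussed above. The function name is hypothetical.

```python
def rank_keyed(graph, leaf_id):
    """Walk parent links from leaf_id to the root, keyed by rank."""
    by_id = {node["@id"]: node for node in graph}
    result = {}
    current = by_id.get(leaf_id)
    while current is not None:
        # Assumes one taxon per rank along the lineage; duplicates would overwrite.
        result[current["rank"]] = {"name": current["name"], "@id": current["@id"]}
        # Roots have no "parent", so by_id.get(None) ends the walk.
        current = by_id.get(current.get("parent"))
    return result


graph = [
    {"name": "Animalia", "rank": "kingdom", "@id": "GBIF:1"},
    {"name": "Chordata", "rank": "phylum", "@id": "GBIF:44", "parent": "GBIF:1"},
    {"name": "Mammalia", "rank": "class", "@id": "GBIF:359", "parent": "GBIF:44"},
    {"name": "Carnivora", "rank": "order", "@id": "GBIF:732", "parent": "GBIF:359"},
]
print(rank_keyed(graph, "GBIF:732"))
```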

jhpoelen commented 6 years ago

Did some experimentation today with your idea and came up with something like the output below: one JSON object per line (pretty-printed here with jq). Note that I left out the higher taxonomic ranks. These can be added once we settle on a JSON-LD format.

Regarding the semantics of ranks: rather than interpreting rank as a property, I see it as a relationship. So species: { X } would be interpreted as "X is a taxon of rank species", where species can be linked to a (coded) relationship from some taxonomy ontology. "same_as" would be handled similarly.

@cboettig curious to hear your thoughts on this.

echo -e "ITIS:180596\tCanis lupus" | java -jar nomer/target/nomer-0.0.1-SNAPSHOT-jar-with-dependencies.jar appendJson globi-globalnames | jq .
using matcher [org.eol.globi.taxon.GlobalNamesService]
{
  "species": {
    "@id": "NCBI:9612",
    "name": "Canis lupus",
    "same_as": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "OTT:247341",
    "name": "Canis lupus",
    "same_as": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "INAT_TAXON:42048",
    "name": "Canis lupus",
    "same_as": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "ITIS:180596",
    "name": "Canis lupus",
    "same_as": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "IRMNG:11407661",
    "name": "Canis lupus",
    "same_as": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "GBIF:5219173",
    "name": "Canis lupus",
    "same_as": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
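
Because the raw appendJson output is one JSON object per line (the listing above is pretty-printed by jq), it can be consumed line by line without a streaming JSON parser. A minimal sketch that collects (id, same_as id) pairs; the two input lines are records from the output above re-serialized compactly, and the function name is hypothetical.

```python
import json

# Two of the records above, one compact JSON object per line.
STREAM = """\
{"species": {"@id": "NCBI:9612", "name": "Canis lupus", "same_as": {"@id": "ITIS:180596", "name": "Canis lupus"}}}
{"species": {"@id": "GBIF:5219173", "name": "Canis lupus", "same_as": {"@id": "ITIS:180596", "name": "Canis lupus"}}}
"""


def same_as_pairs(lines):
    """Yield (id, same_as id) for each non-empty JSON line."""
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        for taxon in record.values():
            target = taxon.get("same_as")
            if target:
                yield taxon["@id"], target["@id"]


pairs = list(same_as_pairs(STREAM.splitlines()))
print(pairs)
```

In practice the lines would come from nomer's stdout (for example, iterating over `sys.stdin`) rather than from a string literal.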

cboettig commented 6 years ago

@jhpoelen This is definitely interesting. Note that in your example, species here is still acting as a "property" (a predicate in RDF speak), but a predicate that takes a node / reference / object (we have too many terms for the same concept), instead of taking a literal.

But I like this! You're probably right that it's wise to explicitly have a JSON object for each identifier. Note that your example could be "compacted" in JSON-LD as follows:

{ "@context": {
"@vocab": "https://nomer.org/",
"same_as": {"@type": "@id"}
},
"@graph": [
{
  "species": {
    "@id": "NCBI:9612",
    "name": "Canis lupus",
    "same_as":  "ITIS:180596"
  }
},
{
  "species": {
    "@id": "OTT:247341",
    "name": "Canis lupus",
    "same_as":  "ITIS:180596"
  }
},
{
  "species": {
    "@id": "INAT_TAXON:42048",
    "name": "Canis lupus",
    "same_as": "ITIS:180596"
  }
},
{
  "species": {
    "@id": "ITIS:180596",
    "name": "Canis lupus",
    "same_as":  "ITIS:180596"
  }
},
{
  "species": {
    "@id": "IRMNG:11407661",
    "name": "Canis lupus",
    "same_as": "ITIS:180596"
  }
},
{
  "species": {
    "@id": "GBIF:5219173",
    "name": "Canis lupus",
    "same_as":  "ITIS:180596"
  }
}]
}

See that in action here: http://tinyurl.com/y7rh648u
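The compacted form replaces each nested same_as object with just its @id, which works because the @context declares same_as with "@type": "@id". Below is a plain-Python sketch of that one rewrite. It is not general JSON-LD compaction, which is what a JSON-LD processor does. The function name is hypothetical.

```python
CONTEXT = {"@vocab": "https://nomer.org/", "same_as": {"@type": "@id"}}


def compact_same_as(records):
    """Wrap records in a @graph, replacing nested same_as objects by their @id."""
    graph = []
    for record in records:
        compacted = {}
        for rank, taxon in record.items():
            taxon = dict(taxon)  # shallow copy so the input is not mutated
            if isinstance(taxon.get("same_as"), dict):
                # Drops the nested "name", as the compacted example above does.
                taxon["same_as"] = taxon["same_as"]["@id"]
            compacted[rank] = taxon
        graph.append(compacted)
    return {"@context": CONTEXT, "@graph": graph}


source = [{"species": {"@id": "NCBI:9612", "name": "Canis lupus",
                       "same_as": {"@id": "ITIS:180596", "name": "Canis lupus"}}}]
doc = compact_same_as(source)
```

Dropping the nested name loses nothing in this particular example, because ITIS:180596 also appears as its own record with the name "Canis lupus".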

cmungall commented 6 years ago

Although the prefix is irrelevant at the RDF level, I would encourage NCBITaxon as the prefix since that's standard in OBO.

If you want to use the OBO Library PURLs for NCBITaxon, then the correct OWL construct is owl:equivalentClass, not owl:sameAs; otherwise you induce punning (but conversely, this makes the ITIS IRI an owl:Class, which may not be their intent...). Or you can just ignore OWL semantics and use sameAs, YMMV...

cboettig commented 6 years ago

@cmungall Thanks much for weighing in here. Yeah, I suspected owl:sameAs could have some unintended consequences -- what does 'induce punning' mean?

cmungall commented 6 years ago

See: https://www.w3.org/TR/owl2-new-features/#F12:_Punning

The arguments for sameAs must be individuals; the arguments for equivalentClass must be classes. Individuals and classes are disjoint in OWL-DL. However, if something is inferred to be both, it doesn't create a problem, as they are assumed to be different entities with the same name.

It has no effect on RDF-level interpretations, only on the OWL interpretation of the graph.

jhpoelen commented 6 years ago

Good stuff! Leaving out the plumbing and compaction:

echo -e "ITIS:180596\tCanis lupus" | java -jar nomer/target/nomer-0.0.1-SNAPSHOT-jar-with-dependencies.jar appendJson globi-globalnames | jq .

now produces:

{
  "species": {
    "@id": "NCBITaxon:9612",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "OTT:247341",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "INAT_TAXON:42048",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "ITIS:180596",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "IRMNG:11407661",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "GBIF:5219173",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}

This assumes that non-NCBITaxon ids have some kind of class hierarchy. Perhaps this is a way to motivate others / ourselves to repeat http://obofoundry.org/ontology/ncbitaxon.html for other taxonomies...

@cboettig @cmungall is this what you had in mind?

jhpoelen commented 6 years ago

Come to think of it, Nomer can now perhaps take on the role of term class hierarchy builder. Imagine ...
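To sketch what a class hierarchy builder might emit, an ordered root-to-leaf list of ids (like the path ids in later nomer output in this thread) can be turned into consecutive (child, parent) pairs. Reading those pairs as rdfs:subClassOf edges is an assumption that follows the equivalent_to / class framing above, not something nomer currently outputs. The function names are hypothetical.

```python
def subclass_edges(path_ids):
    """Pair each id with its predecessor: (child, parent) for a root-to-leaf path."""
    return [(child, parent) for parent, child in zip(path_ids, path_ids[1:])]


def merge_edges(paths):
    """Merge edges from many paths, dropping duplicates shared by lineages."""
    edges = set()
    for path in paths:
        edges.update(subclass_edges(path))
    return edges


lineage = ["GBIF:1", "GBIF:44", "GBIF:359", "GBIF:732"]
for child, parent in subclass_edges(lineage):
    print(f"{child} rdfs:subClassOf {parent}")
```

Merging many lineages into one edge set is what would turn per-name lookups into a single hierarchy per taxonomy.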

cboettig commented 6 years ago

👏 I like where this is going.

So how feasible would it be to create a JSON dump like this for every ID that nomer knows? Is it stupid to create that kind of static record? My intuition is that such a JSON blob would be easier to develop other tooling against than introducing a dependency on particular software or a web API to do this stuff. I'm guessing the file would be large but probably manageable?

jhpoelen commented 6 years ago

Feasible, with performance varying by matcher. For instance, the globi-globalnames and globi-enrich matchers talk to web APIs, so you'd have to feed in the whole world to match against them, which would take a while. However, matchers like globi-cache use a taxon graph à la https://doi.org/10.5281/zenodo.755513. And... these graphs can be expressed in JSON.

I don't think static records are stupid. In fact, I think dynamic records are stupid if they don't leave a trail of static records. And I can point to static records, archive them and give them to friends. So, for lack of better terms, I think that static records are pretty smart.

cboettig commented 6 years ago

@jhpoelen Very cool. Also, somehow I hadn't seen https://doi.org/10.5281/zenodo.755513 before; that's very handy. Yay for convenient static records.

jhpoelen commented 6 years ago

I've extended the suggested json output to include ranks.

Now, echo -e "ITIS:180596\tCanis lupus" | java -jar nomer/target/nomer-0.0.1-SNAPSHOT-jar-with-dependencies.jar append-json globi-globalnames | jq . produces the result included below.

Please note that a path element is added to include all taxonomic ranks, even those that have no rank name, or that have ids but no names. The path element makes the format a little more lenient toward the (notorious) usage of non-standard ranks, or ranks in Latin (e.g., regnum vs. kingdom).

{
  "species": {
    "@id": "NCBITaxon:9612",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  },
  "norank": {
    "@id": "NCBITaxon:131567",
    "name": ""
  },
  "superkingdom": {
    "@id": "NCBITaxon:2759",
    "name": "Eukaryota"
  },
  "kingdom": {
    "@id": "NCBITaxon:33208",
    "name": "Metazoa"
  },
  "phylum": {
    "@id": "NCBITaxon:7711",
    "name": "Chordata"
  },
  "subphylum": {
    "@id": "NCBITaxon:89593",
    "name": "Craniata"
  },
  "class": {
    "@id": "NCBITaxon:40674",
    "name": "Mammalia"
  },
  "superorder": {
    "@id": "NCBITaxon:314145",
    "name": "Laurasiatheria"
  },
  "order": {
    "@id": "NCBITaxon:33554",
    "name": "Carnivora"
  },
  "suborder": {
    "@id": "NCBITaxon:379584",
    "name": "Caniformia"
  },
  "family": {
    "@id": "NCBITaxon:9608",
    "name": "Canidae"
  },
  "genus": {
    "@id": "NCBITaxon:9611",
    "name": "Canis"
  },
  "path": {
    "names": [
      "",
      "Eukaryota",
      "Opisthokonta",
      "Metazoa",
      "Eumetazoa",
      "Bilateria",
      "Deuterostomia",
      "Chordata",
      "Craniata",
      "Vertebrata",
      "Gnathostomata",
      "Teleostomi",
      "Euteleostomi",
      "Sarcopterygii",
      "Dipnotetrapodomorpha",
      "Tetrapoda",
      "Amniota",
      "Mammalia",
      "Theria",
      "Eutheria",
      "Boreoeutheria",
      "Laurasiatheria",
      "Carnivora",
      "Caniformia",
      "Canidae",
      "Canis",
      "Canis lupus"
    ],
    "ids": [
      "NCBI:131567",
      "NCBI:2759",
      "NCBI:33154",
      "NCBI:33208",
      "NCBI:6072",
      "NCBI:33213",
      "NCBI:33511",
      "NCBI:7711",
      "NCBI:89593",
      "NCBI:7742",
      "NCBI:7776",
      "NCBI:117570",
      "NCBI:117571",
      "NCBI:8287",
      "NCBI:1338369",
      "NCBI:32523",
      "NCBI:32524",
      "NCBI:40674",
      "NCBI:32525",
      "NCBI:9347",
      "NCBI:1437010",
      "NCBI:314145",
      "NCBI:33554",
      "NCBI:379584",
      "NCBI:9608",
      "NCBI:9611",
      "NCBI:9612"
    ],
    "ranks": [
      "",
      "superkingdom",
      "",
      "kingdom",
      "",
      "",
      "",
      "phylum",
      "subphylum",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "",
      "class",
      "",
      "",
      "",
      "superorder",
      "order",
      "suborder",
      "family",
      "genus",
      "species"
    ]
  }
}
{
  "species": {
    "@id": "OTT:247341",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  },
  "no rank": {
    "@id": "OTT:805080",
    "name": ""
  },
  "domain": {
    "@id": "OTT:304358",
    "name": "Eukaryota"
  },
  "kingdom": {
    "@id": "OTT:691846",
    "name": "Metazoa"
  },
  "phylum": {
    "@id": "OTT:125642",
    "name": "Chordata"
  },
  "subphylum": {
    "@id": "OTT:947318",
    "name": "Craniata"
  },
  "superclass": {
    "@id": "OTT:278114",
    "name": "Gnathostomata"
  },
  "class": {
    "@id": "OTT:458402",
    "name": "Sarcopterygii"
  },
  "subclass": {
    "@id": "OTT:229558",
    "name": "Theria"
  },
  "superorder": {
    "@id": "OTT:392223",
    "name": "Laurasiatheria"
  },
  "order": {
    "@id": "OTT:44565",
    "name": "Carnivora"
  },
  "suborder": {
    "@id": "OTT:827263",
    "name": "Caniformia"
  },
  "family": {
    "@id": "OTT:770319",
    "name": "Canidae"
  },
  "genus": {
    "@id": "OTT:372706",
    "name": "Canis"
  },
  "path": {
    "names": [
      "",
      "",
      "Eukaryota",
      "Opisthokonta",
      "Holozoa",
      "Metazoa",
      "Eumetazoa",
      "Bilateria",
      "Deuterostomia",
      "Chordata",
      "Craniata",
      "Vertebrata",
      "Gnathostomata",
      "Teleostomi",
      "Euteleostomi",
      "Sarcopterygii",
      "Dipnotetrapodomorpha",
      "Tetrapoda",
      "Amniota",
      "Mammalia",
      "Theria",
      "Eutheria",
      "Boreoeutheria",
      "Laurasiatheria",
      "Carnivora",
      "Caniformia",
      "Canidae",
      "Canis",
      "Canis lupus"
    ],
    "ids": [
      "OTT:805080",
      "OTT:93302",
      "OTT:304358",
      "OTT:332573",
      "OTT:5246131",
      "OTT:691846",
      "OTT:641038",
      "OTT:117569",
      "OTT:147604",
      "OTT:125642",
      "OTT:947318",
      "OTT:801601",
      "OTT:278114",
      "OTT:114656",
      "OTT:114654",
      "OTT:458402",
      "OTT:4940726",
      "OTT:229562",
      "OTT:229560",
      "OTT:244265",
      "OTT:229558",
      "OTT:683263",
      "OTT:5334778",
      "OTT:392223",
      "OTT:44565",
      "OTT:827263",
      "OTT:770319",
      "OTT:372706",
      "OTT:247341"
    ],
    "ranks": [
      "no rank",
      "no rank",
      "domain",
      "no rank",
      "no rank",
      "kingdom",
      "no rank",
      "no rank",
      "no rank",
      "phylum",
      "subphylum",
      "subphylum",
      "superclass",
      "no rank",
      "no rank",
      "class",
      "no rank",
      "superclass",
      "no rank",
      "class",
      "subclass",
      "no rank",
      "no rank",
      "superorder",
      "order",
      "suborder",
      "family",
      "genus",
      "species"
    ]
  }
}
{
  "species": {
    "@id": "INAT_TAXON:42048",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  }
}
{
  "species": {
    "@id": "ITIS:180596",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  },
  "kingdom": {
    "@id": "ITIS:202423",
    "name": "Animalia"
  },
  "subkingdom": {
    "@id": "ITIS:914154",
    "name": "Bilateria"
  },
  "infrakingdom": {
    "@id": "ITIS:914156",
    "name": "Deuterostomia"
  },
  "phylum": {
    "@id": "ITIS:158852",
    "name": "Chordata"
  },
  "subphylum": {
    "@id": "ITIS:331030",
    "name": "Vertebrata"
  },
  "infraphylum": {
    "@id": "ITIS:914179",
    "name": "Gnathostomata"
  },
  "superclass": {
    "@id": "ITIS:914181",
    "name": "Tetrapoda"
  },
  "class": {
    "@id": "ITIS:179913",
    "name": "Mammalia"
  },
  "subclass": {
    "@id": "ITIS:179916",
    "name": "Theria"
  },
  "infraclass": {
    "@id": "ITIS:179925",
    "name": "Eutheria"
  },
  "order": {
    "@id": "ITIS:180539",
    "name": "Carnivora"
  },
  "suborder": {
    "@id": "ITIS:552303",
    "name": "Caniformia"
  },
  "family": {
    "@id": "ITIS:180594",
    "name": "Canidae"
  },
  "genus": {
    "@id": "ITIS:180595",
    "name": "Canis"
  },
  "path": {
    "names": [
      "Animalia",
      "Bilateria",
      "Deuterostomia",
      "Chordata",
      "Vertebrata",
      "Gnathostomata",
      "Tetrapoda",
      "Mammalia",
      "Theria",
      "Eutheria",
      "Carnivora",
      "Caniformia",
      "Canidae",
      "Canis",
      "Canis lupus"
    ],
    "ids": [
      "ITIS:202423",
      "ITIS:914154",
      "ITIS:914156",
      "ITIS:158852",
      "ITIS:331030",
      "ITIS:914179",
      "ITIS:914181",
      "ITIS:179913",
      "ITIS:179916",
      "ITIS:179925",
      "ITIS:180539",
      "ITIS:552303",
      "ITIS:180594",
      "ITIS:180595",
      "ITIS:180596"
    ],
    "ranks": [
      "Kingdom",
      "Subkingdom",
      "Infrakingdom",
      "Phylum",
      "Subphylum",
      "Infraphylum",
      "Superclass",
      "Class",
      "Subclass",
      "Infraclass",
      "Order",
      "Suborder",
      "Family",
      "Genus",
      "Species"
    ]
  }
}
{
  "species": {
    "@id": "IRMNG:11407661",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  },
  "kingdom": {
    "@id": "IRMNG:11",
    "name": "Animalia"
  },
  "phylum": {
    "@id": "IRMNG:148",
    "name": "Chordata"
  },
  "class": {
    "@id": "IRMNG:1310",
    "name": "Mammalia"
  },
  "order": {
    "@id": "IRMNG:12116",
    "name": "Carnivora"
  },
  "family": {
    "@id": "IRMNG:104585",
    "name": "Canidae"
  },
  "genus": {
    "@id": "IRMNG:1282727",
    "name": "Canis"
  },
  "path": {
    "names": [
      "Animalia",
      "Chordata",
      "Mammalia",
      "Carnivora",
      "Canidae",
      "Canis",
      "Canis lupus"
    ],
    "ids": [
      "IRMNG:11",
      "IRMNG:148",
      "IRMNG:1310",
      "IRMNG:12116",
      "IRMNG:104585",
      "IRMNG:1282727",
      "IRMNG:11407661"
    ],
    "ranks": [
      "kingdom",
      "phylum",
      "class",
      "order",
      "family",
      "genus",
      "species"
    ]
  }
}
{
  "species": {
    "@id": "GBIF:5219173",
    "name": "Canis lupus",
    "equivalent_to": {
      "@id": "ITIS:180596",
      "name": "Canis lupus"
    }
  },
  "kingdom": {
    "@id": "GBIF:1",
    "name": "Animalia"
  },
  "phylum": {
    "@id": "GBIF:44",
    "name": "Chordata"
  },
  "class": {
    "@id": "GBIF:359",
    "name": "Mammalia"
  },
  "order": {
    "@id": "GBIF:732",
    "name": "Carnivora"
  },
  "family": {
    "@id": "GBIF:9701",
    "name": "Canidae"
  },
  "genus": {
    "@id": "GBIF:5219142",
    "name": "Canis"
  },
  "path": {
    "names": [
      "Animalia",
      "Chordata",
      "Mammalia",
      "Carnivora",
      "Canidae",
      "Canis",
      "Canis lupus"
    ],
    "ids": [
      "GBIF:1",
      "GBIF:44",
      "GBIF:359",
      "GBIF:732",
      "GBIF:9701",
      "GBIF:5219142",
      "GBIF:5219173"
    ],
    "ranks": [
      "kingdom",
      "phylum",
      "class",
      "order",
      "family",
      "genus",
      "species"
    ]
  }
}
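
Since names, ids, and ranks in the path element are parallel arrays, a consumer can zip them and keep only positions that carry a rank, tolerating the empty and "no rank" entries. A minimal sketch using a shortened excerpt of the NCBI path above. Treating "", "no rank", and "norank" as unranked is an assumption based on the values seen in these records, and ranks are lower-cased because ITIS reports them capitalized. The function name is hypothetical.

```python
UNRANKED = {"", "no rank", "norank"}  # values observed in the records above


def ranked_entries(path):
    """Zip the parallel path arrays and keep entries that carry a rank."""
    names, ids, ranks = path["names"], path["ids"], path["ranks"]
    if not (len(names) == len(ids) == len(ranks)):
        raise ValueError("path arrays must have equal length")
    return [
        {"rank": rank.lower(), "name": name, "@id": id_}
        for name, id_, rank in zip(names, ids, ranks)
        if rank.lower() not in UNRANKED
    ]


path = {
    "names": ["", "Eukaryota", "Opisthokonta", "Metazoa"],
    "ids": ["NCBI:131567", "NCBI:2759", "NCBI:33154", "NCBI:33208"],
    "ranks": ["", "superkingdom", "", "kingdom"],
}
print(ranked_entries(path))
```

Keeping a list rather than a dict matters: in the OTT record above, "class" occurs twice in path.ranks (Sarcopterygii and Mammalia), and the top-level "class" key shows Sarcopterygii, so any rank-keyed view has to pick one.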

jhpoelen commented 6 years ago

@cboettig I'd like to better understand the future use of this feature. Ideally, this would help to get people to use Nomer, so that feedback loops can be established.

cboettig commented 6 years ago

@jhpoelen Fair question; I'm still experimenting here myself so I don't know the answer entirely. My thoughts / premise so far:

  1. I agree most researchers in my field will find tabular formats most intuitive; hence I was really excited to learn about the existing tabular records you already have on Zenodo.
  2. As you know, there's a natural nesting to this data that makes tabular formats potentially cumbersome (e.g. pipe string issues). I agree with you that providing a range of tabular formats is probably the way to go, but I'm also curious about the potential to use JSON-LD instead if it can provide a more 'natural' solution than messing with tabular formats.
  3. I'm curious about the semantic tools and, more broadly, about forging better links between the practical needs of researchers and the rather interesting work of the informatics community. JSON-LD seems like a way to have our cake and eat it too, in that I think it can have some of the user-friendliness of plain TSV files while also remaining an explicitly valid linked-data format that's directly poised for applying semantic tooling like SPARQL, and is also web-friendly.

So my thinking is pretty vague at this point, but this data seems to be about the right complexity (simple but not trivial) to dive into this exploration.

Did that make any sense?