biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Add prefix [ASFIS] #872

Closed: ddooley closed this issue 1 year ago

ddooley commented 1 year ago

Prefix

asfis

Name

ASFIS List of Species for Fishery Statistics Purposes

Homepage

https://www.fao.org/fishery/en/collection/asfis/en

Repository

bioregistry

Description

To quote from the Food and Agriculture Organization of the United Nations ASFIS page, "The FAO Statistics Team (NFISS) of Fisheries and Aquaculture Division collates world capture and aquaculture production statistics at either the species, genus, family or higher taxonomic levels in 3 169 statistical categories (2022 data release) referred to as species items."

"ASFIS list of species includes 13 417 species items selected according to their interest or relation to fisheries and aquaculture. For each species item stored in a record, codes (ISSCAAP group, taxonomic and 3-alpha) and taxonomic information (scientific name, author(s), family, and higher taxonomic classification) are provided. An English name is available for most of the records, and about one third of them have also a French and Spanish name. Information is also provided about the availability of fishery production statistics on the species item in the FAO databases."

License

No response

Example Local Unique Identifier

fbm

Regular Expression Pattern for Local Unique Identifier

^[a-z]+$

URI Format String

https://www.fao.org/fishery/en/aqspecies/$1

Wikidata Property

No response

Contributor Name

Damion Dooley

Contributor GitHub

ddooley

Contributor ORCiD

0000-0002-8844-9165

Contact Name

Fishery Statistician

Contact ORCiD

No response

Contact GitHub

No response

Contact Email

faoterm@fao.org

Additional Comments

This resource is offered freely by FAO for download, but no license is indicated and there is no caution about appropriate reuse.

cthoyt commented 1 year ago

@ddooley it's Bioregistry policy to include only a single person as the point of contact, and explicitly not an opaque (group) email. Do you know of an individual who could be the contact person?

Also, I see you wrote bioregistry as the source code repository. Was there some confusion about what was supposed to go in this field? It's supposed to be the source code repository, if the prefix's resource is version controlled.

cthoyt commented 1 year ago

@ddooley two other concerns:

  1. It's not obvious what ASFIS means.
  2. There appear to be 3 kinds of identifiers assigned to each item (stated on https://www.fao.org/fishery/en/collection/asfis/en):
     1. ISSCAAP code
     2. taxonomic code
     3. 3-alpha code

     We might want to consider using subspaces, or reconsider which one is best. Unfortunately, this website isn't so easy to navigate, so the 3-letter code might be the best we can do.

ddooley commented 1 year ago

I think "bioregistry" accidentally got auto-filled into that field somehow. I'd clear that out; they don't have a source code repository.

I will try to find a better contact email/person.

  1. ASFIS isn't quite an acronym, so in a sense it doesn't mean anything letter by letter. But it is what they call their database. We could qualify it as faoasfis?

  2. Indeed there are three identifiers, thanks for looking that up. But the database URIs for each entry use the unique 3 letter code, so I figured that was the best for URI resolution.

cthoyt commented 1 year ago

But even if it's not an acronym, it must come from something? Maybe picking random letters out of a phrase? This isn't a blocker, but it would be great to give context to people who might read this entry in the future.

If the FAO organization has other meaningful prefixes, then it might be a good idea to use the subspace delimiter fao.asfis. Is that the case?

cthoyt commented 1 year ago

After some googling: ASFIS (Aquatic Sciences and Fisheries Information System)

cthoyt commented 1 year ago

Thanks @ddooley - if you get some more information, feel free to post it here or send a new request

ddooley commented 1 year ago

I think we should follow your recommendation and call it fao.asfis.

So I received this reply from the FAO Statistics people:

"ASFIS is a standard code-list for fishery statistics, it is not designed to be a complete species/taxonomic inventory – the main goal is to facilitate exchange of fisheries data."

"The e-mail contact for ASFIS is Fish-Statistics-Inquiries@fao.org."

"The URL for ASFIS is https://www.fao.org/fishery/en/collection/asfis whilst https://www.fao.org/fishery/en/aqspecies/fbm points to a series of species factsheets. These have been created for less than 1000 of the 13 000 ASFIS species. We suggest not to use the factsheets as general resource for ASFIS."

I didn't realize that the URL I gave in this request with the 3-letter code (https://www.fao.org/fishery/en/aqspecies/fbm) covers only a small subset of the whole list. It looks like the URL used for search results over the whole list, which uses the taxonomic code (e.g. https://www.fao.org/fishery/en/species/20560), is the best one to use.

So can you adjust:

Example Local Unique Identifier: 20560
Regular Expression Pattern for Local Unique Identifier: ^[0-9]+$
URI Format String: https://www.fao.org/fishery/en/species/$1
Contact Email: Fish-Statistics-Inquiries@fao.org
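A minimal Python sketch of how the revised pattern and URI format string fit together: a local identifier is validated against `^[0-9]+$` and then substituted for the `$1` placeholder. The helper name `expand` is hypothetical and is not part of any Bioregistry API.

```python
import re

# Values proposed in this thread for the fao.asfis prefix.
PATTERN = re.compile(r"^[0-9]+$")
URI_FORMAT = "https://www.fao.org/fishery/en/species/$1"


def expand(local_id: str) -> str:
    """Validate a local identifier and build its URI (hypothetical helper)."""
    if PATTERN.fullmatch(local_id) is None:
        raise ValueError(f"not a valid fao.asfis identifier: {local_id!r}")
    # "$1" is the placeholder for the local identifier in the URI format string.
    return URI_FORMAT.replace("$1", local_id)


print(expand("20560"))  # https://www.fao.org/fishery/en/species/20560
```

Using `fullmatch` rather than `match` keeps an input with a trailing newline from slipping past the `$` anchor, and an old 3-alpha code such as `fbm` is rejected.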

I will convert FoodOn's 3-alpha code URLs to the taxonomic code ones.

cthoyt commented 1 year ago

Thanks for following up on this. I think the numeric identifiers will be much more valuable (and are overall better in line with good identifier policies). I will make the corresponding updates.

Do you know how to get the entire resource in a structured form? If so, I can also add it to PyOBO.

I still think there's a misunderstanding about the contact policy for the Bioregistry. What I am looking for is the name of an individual person who works on this resource, along with their personal email. No group emails. This is the same as the OBO Foundry's Locus of Authority principle (principle 11).

ddooley commented 1 year ago

OK, great on the changes.

About the contact person, that's an issue. FAO seems to have service desk emails rather than individual contacts; I think it's to keep service continuity regardless of staff changes. I know the OBO Foundry probably wanted individual responsibility for better liaison, but many organizations and corporations wisely just offer service-oriented (e.g. database resource) contact info, to avoid the big hassle of keeping individual contact information in sync out in the world as time goes by.

ddooley commented 1 year ago

About the entire resource: the latest download is https://www.fao.org/fishery/static/ASFIS/ASFIS_sp.zip, in the general https://www.fao.org/fishery/static/ASFIS/ folder. It provides an Excel file and a CSV file containing the fields "ISSCAAP", "TAXOCODE", "3A_CODE", "Scientific_Name", "English_name", "French_name", "Spanish_name", "Arabic_name", "Chinese_name", "Russian_name", "Author", "Family", "Order", "Stats_data".
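For converting FoodOn's 3-alpha codes to numeric ones, a Python sketch using only the standard `csv` module could build a lookup table from the `3A_CODE` and `TAXOCODE` columns listed above. It assumes, as the thread suggests, that the numeric identifier in the species URLs is the `TAXOCODE` value; that is worth spot-checking against a few records before converting anything. The function names are hypothetical, and the file's encoding is not documented here, so it is left as a parameter.

```python
import csv
from typing import Dict, Iterable, Mapping


def alpha3_to_taxocode(rows: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map each 3A_CODE to its TAXOCODE, skipping rows missing either value."""
    mapping: Dict[str, str] = {}
    for row in rows:
        alpha3 = (row.get("3A_CODE") or "").strip()
        taxocode = (row.get("TAXOCODE") or "").strip()
        if alpha3 and taxocode:
            mapping[alpha3] = taxocode
    return mapping


def load_mapping(csv_path: str, encoding: str = "utf-8") -> Dict[str, str]:
    """Read the ASFIS CSV (extracted from ASFIS_sp.zip) and build the mapping."""
    # ASSUMPTION: the CSV has a header row with the column names listed above.
    with open(csv_path, newline="", encoding=encoding) as handle:
        return alpha3_to_taxocode(csv.DictReader(handle))
```

Keeping the row-processing function separate from file handling makes it easy to check the logic on a few rows before running it on the full download.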

ddooley commented 1 year ago

Side question related to using Bioregistry prefixes: does it matter in OWL/Protege whether an fao.asfis PURL points to https://... or http://...?

Also, in Protege OWL functional syntax, should I be saying

AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> <http://purl.obolibrary.org/obo/FOODON_03411030> <fao.asfis:123456>)

or should I be saying the following?

AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> <http://purl.obolibrary.org/obo/FOODON_03411030> "fao.asfis:123456"^^xsd:anyURI)

cthoyt commented 1 year ago

I'm not sure either of these is correct. In the first example, if you put chevrons <>, I think this means that what's inside should be an IRI. If you have defined a prefix earlier in the document, then IRIs can be written in shorthand without chevrons.
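To make the prefix-declaration point concrete, here is a small Python sketch of how a declared prefix turns a shorthand name into a full IRI, which OWL tooling does while parsing. The `PREFIX_MAP` contents and the `expand_curie` name are illustrative assumptions, using the numeric `fao.asfis` URI format proposed in this thread.

```python
from typing import Dict

# Hypothetical prefix map, playing the role of a Prefix(...) declaration.
PREFIX_MAP: Dict[str, str] = {
    "fao.asfis": "https://www.fao.org/fishery/en/species/",
}


def expand_curie(curie: str, prefix_map: Dict[str, str]) -> str:
    """Expand 'prefix:local' into a full IRI using a declared prefix."""
    # Split on the first colon only, so dots in the prefix are kept intact.
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise KeyError(f"no declared prefix for {curie!r}")
    return prefix_map[prefix] + local


print(expand_curie("fao.asfis:123456", PREFIX_MAP))
# https://www.fao.org/fishery/en/species/123456
```

Without a declaration there is nothing to expand, which is why text inside chevrons is read as an IRI in its own right rather than as shorthand.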

In the second example, if you encode a CURIE as a string, I think it's the OBO community standard to write it as a normal string, I'm not sure if people typically tag it with xsd:anyURI. @matentzn might be able to quickly clarify. In this scenario, since the CURIE is encoded as a string, you don't need an ontology-level annotation of the prefix and its expansion. I don't personally like that it's done this way, but it seems like an old choice that people are sticking to (as an aside, the Bioregistry was in part motivated by the reconciliation effort of all of these xrefs).

To directly answer your question, I don't think OWL/Protege have any preferences about the IRIs in general. I think you can use even more exotic protocols like ftp:// or even non-URL-looking IRIs like URNs. OBO PURLs are supposed to be http because there is a difference in RDF land based on the protocol, but here there's some wiggle room to make whatever choice you want (aligning with what's in bioregistry is always a good idea ;))

ddooley commented 1 year ago

OK, thanks, that clarifies things, and I now have FoodOn working with a bunch of new bioportal-mediated prefixes. (Some DbXrefs remain to be standardized, but will be.)

matentzn commented 1 year ago
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> <http://purl.obolibrary.org/obo/FOODON_03411030> "fao.asfis:123456"^^xsd:anyURI)

By convention, we encode CURIE strings using xsd:string to make them more easily queryable and more uniform, e.g.

AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> <http://purl.obolibrary.org/obo/FOODON_03411030> "fao.asfis:123456"^^xsd:string)

I think we have basically moved away from using xsd:anyURI much at all, as it is usually preferable to use syntax instead (see related discussion here).
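Following that convention, a Python sketch that renders such an xref axiom in OWL functional syntax with the CURIE as an `xsd:string` literal. The helper name `dbxref_assertion` is hypothetical; the escaping of backslashes and double quotes follows the quoted-string rules of the functional syntax.

```python
HAS_DBXREF = "http://www.geneontology.org/formats/oboInOwl#hasDbXref"


def dbxref_assertion(subject_iri: str, curie: str) -> str:
    """Render an oboInOwl:hasDbXref annotation with the CURIE as a string literal."""
    # Inside a quoted literal, backslashes and double quotes must be escaped.
    literal = curie.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"AnnotationAssertion(<{HAS_DBXREF}> <{subject_iri}> "
        f'"{literal}"^^xsd:string)'
    )


print(dbxref_assertion("http://purl.obolibrary.org/obo/FOODON_03411030", "fao.asfis:123456"))
```

Generating the axioms from a table of subject IRIs and CURIEs keeps every xref in the same uniform string form.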

ddooley commented 1 year ago

Well, my use case here is many hasDbXrefs that point to resolvable namespace entities, so I was using a combination in the OWL file, where the prefix could also be used in queries:

Prefix(asfis:=<https://www.fao.org/fishery/en/aqspecies/>)
...
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> <http://purl.obolibrary.org/obo/FOODON_03411030> asfis:oif)

The issue I think I ran into was not being able to use fao.asfis:oif in the above, i.e. a dot was not allowed in the prefix when Protege parsed the OWL file. But otherwise, this seems fine and simple for querying? (P.S. I'll be converting the 3-alpha codes to numeric ones soon.)
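On the dot question: as far as I can tell, OWL 2 functional syntax takes its prefix names from the SPARQL `PNAME_NS` production, whose `PN_PREFIX` rule permits dots in the middle of a name (but not at the start or end). The Python sketch below checks an ASCII-only simplification of that rule; by the grammar, `fao.asfis` looks valid, so a rejection is more likely a limitation of the particular parser or version in use. The regex and function name are my own approximation, not taken from any tool.

```python
import re

# ASCII-only simplification of SPARQL 1.1's
#   PN_PREFIX ::= PN_CHARS_BASE ((PN_CHARS | '.')* PN_CHARS)?
# PN_CHARS_BASE -> letters; PN_CHARS -> letters, digits, '_' and '-'.
# The empty prefix (also allowed by PNAME_NS) is not handled here.
PN_PREFIX = re.compile(r"[A-Za-z](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")


def is_valid_prefix_name(name: str) -> bool:
    """Return True if `name` matches the simplified PN_PREFIX rule."""
    return PN_PREFIX.fullmatch(name) is not None


for candidate in ["asfis", "fao.asfis", "fao.asfis.", ".asfis", "1asfis"]:
    print(candidate, is_valid_prefix_name(candidate))
```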

matentzn commented 1 year ago

You can read about this in one of the all-time hottest OBO discussions: https://github.com/information-artifact-ontology/ontology-metadata/issues/59

While there is no resolution, the basic sentiment is this:

That said, I also want to note one important misconception: at the RDF level, what you think are CURIEs really are not. Displaying these prefixes for URIs has really only one purpose: to make the triples more human-readable (and the file more compact). Other than that, RDF entities are fundamentally URIs, never CURIEs. The next person loading your triples may or may not include your prefix declarations, and still deal with the identical ontology file.

But, as you can see by the nice debate above, there is no consensus!
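The point that prefixes are only a display layer can be illustrated with a Python sketch: the same IRI is shown as different CURIEs depending on which prefixes the reader happens to declare, and stays a full IRI when none are declared. The `compress` helper and both prefix maps are hypothetical.

```python
from typing import Dict


def compress(iri: str, prefix_map: Dict[str, str]) -> str:
    """Show an IRI as a CURIE if a declared prefix matches; otherwise return it unchanged."""
    # Try the longest expansions first so the most specific prefix wins.
    for prefix, base in sorted(prefix_map.items(), key=lambda item: len(item[1]), reverse=True):
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri


iri = "https://www.fao.org/fishery/en/species/20560"
print(compress(iri, {"fao.asfis": "https://www.fao.org/fishery/en/species/"}))  # fao.asfis:20560
print(compress(iri, {"asfis": "https://www.fao.org/fishery/en/species/"}))  # asfis:20560
print(compress(iri, {}))  # the unchanged full IRI
```

In every case the underlying triple refers to the same IRI; only its presentation changes.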