GlobalNamesArchitecture / gnresolver

As a user of gnresolver, I need resolution information to be present in the JSON output for gni and resolver #69

dimus closed 7 years ago

dimus commented 7 years ago

We need the following information in the output, imported from the database:

data_source_id: 4,
data_source_title: "NCBI",
gni_uuid: "2cf19440-46c2-52c5-9fce-d66194286102",
name_string: "Pomatomus saltator",
canonical_form: "Pomatomus saltator",
classification_path: "|Eukaryota|Opisthokonta|Metazoa|Eumetazoa|Bilateria|Deuterostomia|Chordata|Craniata|Vertebrata|Gnathostomata|Teleostomi|Euteleostomi|Actinopterygii|Actinopteri|Neopterygii|Teleostei|Osteoglossocephalai|Clupeocephala|Euteleosteomorpha|Neoteleostei|Eurypterygia|Ctenosquamata|Acanthomorphata|Euacanthomorphacea|Percomorphaceae|Pelagiaria|Scombriformes|Pomatomidae|Pomatomus|Pomatomus saltator",
classification_path_ranks: "|superkingdom||kingdom||||phylum|subphylum|||||superclass|class|subclass|infraclass|||||||||||order|family|genus|species",
classification_path_ids: "131567|2759|33154|33208|6072|33213|33511|7711|89593|7742|7776|117570|117571|7898|186623|41665|32443|1489341|186625|1489388|123365|123366|123367|123368|123369|1489872|1489885|1489894|30864|75033|94948",
taxon_id: "94948",
edit_distance: 0,
match_type: 1,
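
The three `classification_path*` fields above appear to be parallel pipe-delimited lists, where an empty segment means "no value at this level" (for example, a clade without a rank). A minimal Python sketch of how a consumer of the JSON output might line them up is shown here. The `PathNode` type and the `parse_classification` helper are hypothetical names used only for illustration, and the sample strings are a shortened excerpt of the values above, not the full path.

```python
from typing import List, NamedTuple


class PathNode(NamedTuple):
    """One level of a classification path (hypothetical helper type)."""
    name: str
    rank: str      # empty string when the level has no rank
    taxon_id: str


def parse_classification(path: str, ranks: str, ids: str) -> List[PathNode]:
    """Zip the three pipe-delimited fields into one list of nodes.

    Assumes the three fields have the same number of segments.
    """
    names = path.split("|")
    rank_list = ranks.split("|")
    id_list = ids.split("|")
    if not (len(names) == len(rank_list) == len(id_list)):
        raise ValueError("classification fields differ in length")
    return [PathNode(n, r, i) for n, r, i in zip(names, rank_list, id_list)]


# Shortened excerpt: the last five levels of the example record.
nodes = parse_classification(
    "Pelagiaria|Scombriformes|Pomatomidae|Pomatomus|Pomatomus saltator",
    "|order|family|genus|species",
    "1489885|1489894|30864|75033|94948",
)

for node in nodes:
    print(node.taxon_id, node.rank or "(no rank)", node.name)
```

Splitting on `|` without dropping empty segments matters here: the empty rank for Pelagiaria has to stay in place so that the remaining ranks stay aligned with their names and ids.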

We also need the following information, but it will come from the new scoring system, so it belongs in a separate ticket:

prescore: "3|0|0",
score: 0.988

dimus commented 7 years ago

It is important to preserve taxon_id as part of the key in name_strings, because some data sources contain the same name string with different taxon_ids, and without it we lose this information. The old database has the correct composite key: uniqueness comes from three fields, name string id, data source id, and taxon id. Taxon id and local id may be the same or may differ, so we need to preserve both.

+---------------------------+----------------------------------------------------+------+-----+---------+-------+
| Field                     | Type                                               | Null | Key | Default | Extra |
+---------------------------+----------------------------------------------------+------+-----+---------+-------+
| data_source_id            | int(11)                                            | NO   | PRI | NULL    |       |
| name_string_id            | int(11) unsigned                                   | NO   | PRI | NULL    |       |
| taxon_id                  | varchar(255)                                       | NO   | PRI |         |       |
| global_id                 | varchar(255)                                       | YES  |     | NULL    |       |
| url                       | varchar(255)                                       | YES  |     | NULL    |       |
| rank                      | varchar(255)                                       | YES  |     | NULL    |       |
| accepted_taxon_id         | varchar(255)                                       | YES  |     | NULL    |       |
| synonym                   | set('synonym','lexical','homotypic','heterotypic') | YES  | MUL | NULL    |       |
| classification_path       | text                                               | YES  |     | NULL    |       |
| classification_path_ids   | text                                               | YES  |     | NULL    |       |
| created_at (DROP)         | datetime                                           | YES  |     | NULL    |       |
| updated_at (DROP)         | datetime                                           | YES  |     | NULL    |       |
| nomenclatural_code_id     | int(11)                                            | YES  |     | NULL    |       |
| local_id                  | varchar(255)                                       | YES  |     | NULL    |       |
| classification_path_ranks | text                                               | YES  |     | NULL    |       |
+---------------------------+----------------------------------------------------+------+-----+---------+-------+
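
The effect of the three-field key described above can be sketched with an in-memory SQLite table in Python. This is only an illustration, not the project's schema: the table name `name_index_demo`, the reduced column set, and the taxon id `"T-2"` are assumptions made for the example, and SQLite stands in for the MySQL database shown in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, reduced table: only the key columns plus local_id.
conn.execute(
    """
    CREATE TABLE name_index_demo (
        data_source_id INTEGER NOT NULL,
        name_string_id INTEGER NOT NULL,
        taxon_id       TEXT    NOT NULL,
        local_id       TEXT,
        PRIMARY KEY (data_source_id, name_string_id, taxon_id)
    )
    """
)


def try_insert(row):
    """Insert one row; return False if the composite key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO name_index_demo VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    # Same data source and name string, different taxon ids: both are kept.
    try_insert((4, 1, "94948", "94948")),
    try_insert((4, 1, "T-2", None)),
    # Exact repeat of all three key fields: rejected.
    try_insert((4, 1, "94948", "other-local-id")),
]
row_count = conn.execute("SELECT COUNT(*) FROM name_index_demo").fetchone()[0]
print(results, row_count)
```

If the key covered only data_source_id and name_string_id, the second insert would be rejected and the second taxon for that name string would be lost, which is the problem the comment describes.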