globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

Parsing issues with nomer tsv records? #5

Open cboettig opened 6 years ago

cboettig commented 6 years ago

Hi @jhpoelen ,

I'm running into some issues parsing the taxonCache file in the Zenodo-archived data http://doi.org/10.5281/zenodo.1213465, (which looks super nice otherwise btw).

For instance, the readr package in R shows a few parsing errors, mostly due to what might be extraneous quote characters:

taxonCache <- readr::read_tsv("https://zenodo.org/record/1213465/files/taxonCache.tsv.gz")
problems(taxonCache)

shows these errors

      row col         expected           actual     file                    
    <int> <chr>       <chr>              <chr>      <chr>                   
 1  98457 commonNames delimiter or quote A          'data/taxonCache.tsv.gz'
 2 119858 commonNames delimiter or quote m          'data/taxonCache.tsv.gz'
 3 119858 commonNames delimiter or quote " "        'data/taxonCache.tsv.gz'
 4 425504 path        delimiter or quote c          'data/taxonCache.tsv.gz'
 5 425504 path        delimiter or quote S          'data/taxonCache.tsv.gz'
 6 425504 path        delimiter or quote m          'data/taxonCache.tsv.gz'
 7 425504 path        delimiter or quote A          'data/taxonCache.tsv.gz'
 8 425504 path        delimiter or quote m          'data/taxonCache.tsv.gz'
 9 425504 path        delimiter or quote a          'data/taxonCache.tsv.gz'
10 425504 path        delimiter or quote A          'data/taxonCache.tsv.gz'
11 425504 path        delimiter or quote " "        'data/taxonCache.tsv.gz'
12 425504 NA          9 columns          10 columns 'data/taxonCache.tsv.gz'

Those are pretty minor though; it looks like only 3 rows are having issues. More troublesome is that readr's parsing of the file somehow gets some rows misaligned, e.g. if you then do:

library(dplyr)
taxonCache %>% filter(grepl(":", path))

you get a whole sequence of rows where the path column has pathIds values. A quick inspection of these rows shows they are all shifted over by one column, as they are all missing the first column (an id). (The same problem can be reproduced with base R's read.delim, which is much slower than the readr implementation.) Is there something that can be done so that those rows that don't have an id still begin with a proper delimiter, such that they get an NA for id instead of causing this misalignment?
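A minimal base R sketch of how such shifted rows could be spotted before parsing, by counting tab-separated fields per line. The four lines below are made-up toy data (not taken from taxonCache), and the three-column layout is an assumption for brevity.

```r
# Toy TSV lines (hypothetical, not taken from taxonCache).
lines <- c(
  "id\tname\trank",
  "EOL:1\tAnimalia\tkingdom",
  "Homo sapiens\tspecies",     # id and its delimiter both missing: 2 fields
  "\tHomo sapiens\tspecies"    # id empty but delimiter kept: 3 fields
)

# Count fields per line with quoting and comment handling switched off.
con <- textConnection(lines)
n_fields <- count.fields(con, sep = "\t", quote = "", comment.char = "")
close(con)

# Line numbers (header included) whose field count differs from the header's.
short_rows <- which(n_fields != n_fields[1])
short_rows
```

A row that keeps its leading tab still has the full field count, so a parser can fill in NA for the id instead of shifting every later column.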

jhpoelen commented 6 years ago

@cboettig thanks for sharing. Your comments highlight various separate issues. I'll attempt to address each of them separately in the following comments. I am planning to release a new GloBI Taxon Graph version v0.3.2 with corrections applied in this thread.

First, in line 98457 in taxonCache.tsv v0.3.1, I found (please note that header was added for convenience)

id      name    rank    commonNames     path    pathIds pathNames       externalUrl     thumbnailUrl
EOL:224784      Neoniphon sammara       Species "Kolvin-soldaat @af | Deek @ar | 鐵甲 @cnm | Eichhörnchenfisch @de | Sammara squirrelfish @en | Candil samara @es | Corocoro @fj | Marignan tacheté @fr | \"Ala'ihi @hw | Ukeguchi-ittoudai @ja | 무늬얼게돔 @ko | Jerra @mh | Kolithaduva @ml | Kinolu @ms | Esquilo samara @pt | Malau-tui @sm | Baga-baga @tl | Araoe @ty | Cá Son dá dài @vi | 条纹长颏鳂 @zh | 莎姆新東洋金鱗魚 @zh-Hant |"        Animalia | Chordata | Actinopterygii | Beryciformes | Holocentridae | Neoniphon | Neoniphon sammara     EOL:1 | EOL:694 | EOL:1905 | EOL:8234 | EOL:8237 | EOL:24504 | EOL:224784       kingdom | phylum | class | order | family | genus | species     http://eol.org/pages/224784     http://media.eol.org/content/2009/05/19/16/85885_98_68.jpg

Note that the commonNames value is (incorrectly) enclosed in double quotes and contains an escaped double quote in \"Ala'ihi @hw

On closer inspection, the commonNames value was enclosed by quotes when csv was still used to store taxonCache. This also explains the escaped double quote. Also, it appears that the Hawaiian name for Neoniphon sammara is not transcribed properly in EOL http://eol.org/pages/224784/names/common_names . Instead of "Ala'ihi, I suspect the name should be 'ala'ihi, replacing the double quotes with a single quote.

@jhammock any chance you can update the common name? From sources like http://www.wpcouncil.org/managed-fishery-ecosystems/hawaii-archipelago/regulations-and-enforcement-hawaii/ it appears that the common name is used to describe various different species, not just Neoniphon sammara .

To correct for this, the enclosing double quotes are removed and the escaped double quote has been replaced with the original string reported by EOL, including its double quote character. Note that TSV does not need escaping of quotes (https://www.iana.org/assignments/media-types/text/tab-separated-values).
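A rough sketch of that kind of cleanup in base R, assuming the CSV residue always takes the form of a field wrapped in double quotes with inner quotes written as \". The function name strip_csv_quoting is hypothetical and is not part of nomer; the sketch also keeps inner quotes as literal characters, which is only one of the corrections described in this thread.

```r
# Hypothetical helper: undo CSV-style quoting left over in a TSV field.
strip_csv_quoting <- function(field) {
  # Only touch fields that are fully wrapped in double quotes.
  quoted <- !is.na(field) & grepl('^".*"$', field)
  inner <- substr(field[quoted], 2, nchar(field[quoted]) - 1)
  # Turn the escaped \" back into a plain double quote.
  field[quoted] <- gsub('\\"', '"', inner, fixed = TRUE)
  field
}

cleaned <- strip_csv_quoting(c(
  '"roble amarillo @en | \\"makulis\\" @es |"',
  "Sammara squirrelfish @en"
))
cleaned
```

Fields without enclosing quotes pass through unchanged, so the helper can be applied to a whole column.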

jhpoelen commented 6 years ago

A second issue was reported on line 119858:

id  name    rank    commonNames path    pathIds pathNames   externalUrl thumbnailUrl
EOL:392765  Handroanthus chrysanthus    Species "roble amarillo @en | \"makulis\" @es | เหลืองอินเดีย @th |"    Plantae | Tracheophyta | Magnoliopsida | Lamiales | Bignoniaceae | Handroanthus | Handroanthus chrysanthus  EOL:281 | EOL:4077 | EOL:283 | EOL:4300 | EOL:4421 | EOL:27931337 | EOL:392765  kingdom | phylum | class | order | family | genus | species http://eol.org/pages/392765 http://media.eol.org/content/2015/02/26/03/48029_98_68.jpg

A similar pattern is observed here: CSV-style escaping/quoting was used because the text contains double quotes.

@jhammock any idea why makulis, the Spanish common name on http://eol.org/pages/392765/names/common_names , is surrounded by double quotes?

To correct, the enclosing double quotes are removed, as well as the escaped double quotes.

jhpoelen commented 6 years ago

A third issue was reported on line 425504:

id  name    rank    commonNames path    pathIds pathNames   externalUrl thumbnailUrl
INAT_TAXON:379688   candidatus phytoplasma  genus       "Bacteria | Firmicutes | Mollicutes | \"candidatus phytoplasma\""   INAT_TAXON:67333 | INAT_TAXON:151853 | INAT_TAXON:151986 | INAT_TAXON:379688    kingdom | phylum | class | genus    http://inaturalist.org/taxa/379688  

Same double-quoting issue here. Integration tests confirm that iNaturalist explicitly reports "candidatus phytoplasma" for the genus. To correct, the enclosing double quotes are removed, as well as the escape characters.

jhpoelen commented 6 years ago

A fourth issue was found, where entries in taxonCache were found without a taxonId column. This was a transformation mistake, and entries with missing taxonId columns will be removed. Note that the entries without an id actually had valid counterparts in the taxonCache file.

jhpoelen commented 6 years ago

Also, please note that the first three issues are definitely data errors, but not tsv parsing errors. TSV, according to IANA https://www.iana.org/assignments/media-types/text/tab-separated-values , does not have any string quoting. Please see https://github.com/tidyverse/readr/issues/844 .

If an empty quote parameter is used, no problems are encountered when reading taxonCache.tsv:

taxonCache <- readr::read_tsv('taxonCache.tsv', quote='')
Parsed with column specification:
cols(
  id = col_character(),
  name = col_character(),
  rank = col_character(),
  commonNames = col_character(),
  path = col_character(),
  pathIds = col_character(),
  pathNames = col_character(),
  externalUrl = col_character(),
  thumbnailUrl = col_character()
)
|=================================================================| 100%  904 MB
> library(readr)
> problems(taxonCache)
# tibble [0 × 4]
# ... with 4 variables: row <int>, col <int>, expected <chr>, actual <chr>
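For a self-contained illustration of the same setting, here is a base R sketch on a made-up three-column TSV that contains a stray double quote like the one in the Neoniphon sammara record. It uses read.delim with quote = "" rather than readr, and the reduced column set is an assumption made for brevity.

```r
# Made-up TSV with a stray, unbalanced double quote in commonNames.
tsv <- paste(
  "id\tcommonNames\tpath",
  "EOL:224784\t\"Ala'ihi @hw\tAnimalia",
  "EOL:392765\troble amarillo @en\tPlantae",
  sep = "\n"
)

# quote = "" turns off quote handling, so every tab is a field boundary.
# read.delim's default (quote = "\"") would instead treat the stray quote
# as the start of a quoted string.
cache <- read.delim(text = tsv, quote = "", stringsAsFactors = FALSE)

nrow(cache)
cache$commonNames[1]
```

With quoting disabled, both records are read and the double quote survives as an ordinary character in the value.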

@cboettig curious to hear your thoughts on all this.

jhpoelen commented 6 years ago

I've prepared a pre-release of taxonCache with the changes applied; please see https://depot.globalbioticinteractions.org/tmp/taxon-0.3.2/taxonCache.tsv.gz . Please let me know if this pre-release solves this issue. If not, or if you find new issues, please do share.

cboettig commented 6 years ago

Thanks, will do! Good point on the tsv by the way; makes total sense. The whole escaped quoting thing in csv files always bugged me, so tsv is a pretty clever solution I never properly appreciated (since it's harder to imagine needing a literal \t in a text file, but easy to see why you need a literal ,)

Quick clarification on the entities that didn't have valid ids and were thus creating the alignment problems: so those rows were duplicates of rows already elsewhere in taxonCache? Are those rows now dropped from the table?

I'm playing a bit with parsing the pipe strings right now; I see their utility but I think it would often be convenient to have a more explicit relationship between rank, value, and id in those strings. Will let you know if that surfaces any other parsing issues for me.

jhpoelen commented 6 years ago

Quick clarification on the entities that didn't have valid ids and were thus creating the alignment problems: so those rows were duplicates of rows already elsewhere in taxonCache? Are those rows now dropped from the table?

I did some spot checks, and duplicates seem to exist. I removed the entries with path values that include the unexpected : delimited values.
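A minimal sketch of that removal step in base R, on a made-up two-column table. The real cleanup was done on the full taxonCache and the exact procedure used is not shown in this thread, so the code below is an assumption about its general shape.

```r
# Toy table: the second row is shifted, so an id has landed in the path column.
cache <- data.frame(
  id = c("EOL:1", "Animalia", "EOL:2"),
  path = c("Animalia", "EOL:1", NA),
  stringsAsFactors = FALSE
)

# Flag rows whose path contains the ":" that is only expected in ids.
# grepl() returns FALSE for NA, so rows without a path are kept.
shifted <- grepl(":", cache$path, fixed = TRUE)
clean <- cache[!shifted, ]
clean$id
```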

jhpoelen commented 6 years ago

I see their utility but I think it would often be convenient to have a more explicit relationship between rank, value, and id in those strings.

I agree that zipping (combining) path / pathIds / pathNames is not convenient. It seems that most biologists are comfortable with tabular formats, so I am trying to figure out ways to mold data into that shape to lower the barrier to edit / use / share without losing too much flexibility. Am open to suggestions and am in favor of exposing the same knowledge in different formats rather than taking a one-size-fits-all approach.
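One possible way to expose the same information in a long, explicit form is to unzip the three pipe strings of a record into one row per rank. The sketch below uses the Handroanthus chrysanthus values quoted earlier in this thread; the helper names split_pipes and unzip_path are hypothetical.

```r
# Split a pipe-delimited string into trimmed parts.
split_pipes <- function(x) trimws(strsplit(x, "|", fixed = TRUE)[[1]])

# Combine path, pathIds and pathNames of one record into a rank/name/id table.
# data.frame() stops with an error if the three strings differ in length.
unzip_path <- function(path, pathIds, pathNames) {
  data.frame(
    rank = split_pipes(pathNames),
    name = split_pipes(path),
    id = split_pipes(pathIds),
    stringsAsFactors = FALSE
  )
}

lineage <- unzip_path(
  path = "Plantae | Tracheophyta | Magnoliopsida | Lamiales | Bignoniaceae | Handroanthus | Handroanthus chrysanthus",
  pathIds = "EOL:281 | EOL:4077 | EOL:283 | EOL:4300 | EOL:4421 | EOL:27931337 | EOL:392765",
  pathNames = "kingdom | phylum | class | order | family | genus | species"
)
lineage[lineage$rank == "family", ]
```

The error on unequal lengths doubles as a cheap consistency check on a record.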

cboettig commented 6 years ago

@jhpoelen I think I'm still seeing a whole bunch of entries with alignment issues?

library(tidyverse)
taxonCache <- read_tsv("https://depot.globalbioticinteractions.org/tmp/taxon-0.3.2/taxonCache.tsv.gz", quote="")

taxonCache %>% filter(!grepl("(:|-|_)", id)) 

shows a bunch of rows that are getting parsed that appear to have no id and so still have everything misaligned.

jhpoelen commented 6 years ago

@cboettig confirmed. I've uploaded a second pass at the taxonCache.tsv.gz file, overwriting https://depot.globalbioticinteractions.org/tmp/taxon-0.3.2/taxonCache.tsv.gz . Thanks for sharing, please check and let me know if you see more issues.

cboettig commented 6 years ago

@jhpoelen I seem to be getting a 403 access denied error at that URL now(?)

jhpoelen commented 6 years ago

Thanks for letting me know. I've updated the access privileges and the file should be public now. Please try again - https://depot.globalbioticinteractions.org/tmp/taxon-0.3.2/taxonCache.tsv.gz .

cboettig commented 6 years ago

@jhpoelen Thanks! Getting there! Looks like a possible data issue now:

e.g. row 243356 has a single entry in the path pipe-string but two entries in the pathNames pipe string.

taxonCache <- read_tsv("https://depot.globalbioticinteractions.org/tmp/taxon-0.3.2/taxonCache.tsv.gz", quote="")
taxonCache[243356,]$path
[1] "Gnaphalium purpureum"
> taxonCache[243356,]$pathNames
[1] "kingdom | species"

I see a total of 954 records where it looks to me that the number of pipes differs between path and pathNames (though I guess some of these might be NA for one or the other, which I guess is okay, but some clearly aren't, like the example above).

pattern <-  "\\s*\\|\\s*"
path_pipes <- taxonCache %>% purrr::transpose() %>% 
  map_int( ~length(str_split(.x$path, pattern)[[1]]))
pathName_pipes <- taxonCache %>% purrr::transpose() %>% 
  map_int( ~length(str_split(.x$pathNames, pattern)[[1]]))

which( !(path_pipes == pathName_pipes))

jhpoelen commented 6 years ago

Thanks again for your patience and feedback.

I went through the entries with mismatching path / path names. I found that most of the issues were due to a historic bug that didn't include empty ranks when ingesting path names. I removed the entries, after spot checking that duplicate entries existed in the taxonCache with aligned path/ids/names.

A single item, EOL:211953 Cetengraulis edentulus, appears to have a \t embedded in the common name Anchoveta raboamaril\t3. It appears that this common name was included in the taxonCache prior to the implementation of tab replacement on writing to tsv.
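As an illustration of what such a replacement on write could look like, here is a small base R sketch. The function sanitize_tsv_field is hypothetical, and replacing with a single space is an assumption; the replacement rule nomer actually applies is not described in this thread.

```r
# Hypothetical helper: make a value safe to store in a single TSV field by
# replacing embedded tabs and line breaks with one space.
sanitize_tsv_field <- function(x) gsub("[\t\r\n]+", " ", x)

sanitized <- sanitize_tsv_field("Anchoveta raboamaril\t3")
sanitized
```

Since TSV has no quoting or escaping, field and record delimiters have to be removed from the values themselves before writing.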

The remaining issues are terms related to non-taxa like environmental terms (e.g., wood) or functional groups (e.g., plankton). These do not have path/rank names. I've included the remaining issues below.

I've uploaded an updated copy of taxonCache for your review at https://depot.globalbioticinteractions.org/tmp/taxon-0.3.2/taxonCache.tsv.gz .

This cleanup of taxonCache.tsv makes me re-realize the importance of data mobility, archiving, versioning, automated quality control, peer review and the effort this all takes...

id    name    rank    commonNames    path    pathIds    pathNames    externalUrl    thumbnailUrl
ENVO:00000339    Stones    NA    NA    environmental feature | mesoscopic physical object | abiotic mesoscopic physical object | piece of rock    ENVO:00002297 | ENVO:00002004 | ENVO:01000010 | ENVO:00000339    NA    http://purl.obolibrary.org/obo/ENVO_00000339    NA
ENVO:00001998    soil    NA    NA    environmental material | soil    ENVO:00010483 | ENVO:00001998    NA    http://purl.obolibrary.org/obo/ENVO_00001998    NA
ENVO:00002003    bovine or equine dung    NA    NA    environmental material | organic material | bodily fluid | excreta | feces    ENVO:00010483 | ENVO:01000155 | ENVO:02000019 | ENVO:02000022 | ENVO:00002003    NA    http://purl.obolibrary.org/obo/ENVO_00002003    NA
ENVO:00002007    Sediment    NA    NA    environmental material | sediment    ENVO:00010483 | ENVO:00002007    NA    http://purl.obolibrary.org/obo/ENVO_00002007    NA
ENVO:00002040    Wood    NA    NA    environmental material | organic material | wood    ENVO:00010483 | ENVO:01000155 | ENVO:00002040    NA    http://purl.obolibrary.org/obo/ENVO_00002040    NA
ENVO:01000155    Detritus    NA    NA    environmental material | organic material    ENVO:00010483 | ENVO:01000155    NA    http://purl.obolibrary.org/obo/ENVO_01000155    NA
ENVO:01000404    plastic    NA    NA    environmental material | anthropogenic environmental material    ENVO:00010483 | ENVO:0010001    NA    http://purl.obolibrary.org/obo/ENVO_01000404    NA
EOL:19662459    Zooplankton    NA    NA    plankton | zooplankton    NA    NA    http://eol.org/pages/19662459    NA
EOL:19662463    Phytoplankton    NA    NA    plankton | phytoplankton    NA    NA    http://eol.org/pages/19662463    NA
W:Bacterioplankton    bacterioplankton    NA    NA    plankton | bacterioplankton    NA    NA    http://wikipedia.org/wiki/Bacterioplankton    NA
W:Macroalgae    Macroalgae    NA    NA    algae | macroalgae    NA    NA    http://wikipedia.org/wiki/Macroalgae    NA

cboettig commented 6 years ago

@jhpoelen Found some more rows with alignment / missing-id issues:

look for cases with whitespace in the id:

taxonCache %>% filter(grepl("\\s", id))

(Missed this one before because previously my pattern looked for identifiers with "(:|-|_)", and some species names have these in them). I think it would actually be preferable if ids were all URIs -- would that be possible? e.g. there's what looks like some UUID strings in there but they don't have the urn:uuid: prefix, and some that seem to use _ as a prefix?
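A small sketch of how the bare UUID-style ids could be detected and given the urn:uuid: prefix. The first id is the one that shows up in the taxonCache output later in this thread; whether such ids really are UUIDs, and whether prefixing them is the right fix, is an assumption.

```r
# Canonical 8-4-4-4-12 hexadecimal UUID layout.
uuid_pattern <- paste0(
  "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-",
  "[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

ids <- c("4701dc84-660a-4c51-bd16-593997f2370b", "EOL:1", "NZOR-3-100527")

# Prefix only the ids that consist of a bare UUID.
is_bare_uuid <- grepl(uuid_pattern, ids)
ids[is_bare_uuid] <- paste0("urn:uuid:", ids[is_bare_uuid])
ids
```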

Another possible issue I noticed in pathNames:

taxonCache %>% filter(grepl(":", pathNames))

This gets the above misaligned ones too, but looks like it is mostly getting pathNames given by identifiers, maybe mostly from Wikidata. I see why Wikidata does that, so technically these aren't errors, but from a practical point of view it would be much better to have path names we can match to other path names, e.g. instead of WD:Q35409 | ... just have family | ... (as https://www.wikidata.org/wiki/Q35409). Or maybe that's an issue for a separate thread since it's not really about a parsing problem?

jhpoelen commented 6 years ago

Thanks!

taxonCache %>% filter(grepl("\\s", id))

Nice! This removed the 41 remaining entries with misaligned columns. The accompanying entries with ids were also present in the taxonCache.

I think it would actually be preferable if ids were all URIs -- would that be possible?

That would be possible, and can already be done using a prefix mapping like https://api.globalbioticinteractions.org/prefixes . You might have noticed that externalUrl expands the id to a resolvable id when possible.
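A minimal sketch of such a prefix expansion in base R. The two prefix/URL pairs are inferred from the id and externalUrl values of records quoted earlier in this thread; they are not read from the prefixes endpoint, whose response format is not shown here, and expand_id is a hypothetical helper.

```r
# Prefix table inferred from id / externalUrl pairs in this thread (assumption).
prefixes <- c(
  "EOL:" = "http://eol.org/pages/",
  "INAT_TAXON:" = "http://inaturalist.org/taxa/"
)

# Replace a known prefix with its URL base; ids without a known prefix give NA.
expand_id <- function(id, prefixes) {
  out <- rep(NA_character_, length(id))
  for (p in names(prefixes)) {
    hit <- !is.na(id) & startsWith(id, p)
    out[hit] <- paste0(prefixes[[p]], substring(id[hit], nchar(p) + 1))
  }
  out
}

urls <- expand_id(c("EOL:224784", "INAT_TAXON:379688", "NZOR-3-100527"), prefixes)
urls
```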

e.g. there's what looks like some UUID strings in there but they don't have the urn:uuid: prefix, and some that seem to use _ as a prefix?

Good point. Please note that #6 describes the origin of the prefix-less ids. I am hoping to incorporate these changes in the next major release of GloBI's taxon graph (should I rename to globi term graph instead?). I've started making manual patches using a development version and nomer, only to realize that my time is probably better spent on thinking more about how to automatically validate, and report on, term mappings in addition to making the term graph more modular (e.g., splitting up term vertices and mapping edges into more manageable chunks similar to modular development of software libraries). If this is a big concern, please let me know.

taxonCache %>% filter(grepl(":", pathNames))

This additional validator only selected the Wikidata path names. As you noticed, abbreviated Wikidata identifiers were used to capture the rank information. This was done for pragmatic reasons. It should be relatively easy to map the rank name ids to associated labels. In the future, we might want to introduce a normalized term rank by introducing rankName and rankId, in addition to pathNames and pathNameIds. Related to #7 .
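A sketch of what mapping rank ids to labels might look like. Only WD:Q35409 (family) is taken from this thread; the genus and species entries are added for illustration and should be checked against Wikidata before use. The helper label_ranks is hypothetical.

```r
# Lookup from abbreviated Wikidata rank ids to rank labels.
# WD:Q35409 is from this thread; the other two entries are illustrative.
rank_labels <- c(
  "WD:Q35409" = "family",
  "WD:Q34740" = "genus",
  "WD:Q7432" = "species"
)

# Replace known rank ids in each pathNames string, leaving unknown parts as-is.
label_ranks <- function(pathNames, labels) {
  vapply(strsplit(pathNames, "|", fixed = TRUE), function(parts) {
    if (anyNA(parts)) return(NA_character_)
    parts <- trimws(parts)
    known <- parts %in% names(labels)
    parts[known] <- labels[parts[known]]
    paste(parts, collapse = " | ")
  }, character(1))
}

labelled <- label_ranks(c("WD:Q35409 | WD:Q34740 | WD:Q7432", "kingdom | WD:Q7432"), rank_labels)
labelled
```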

I've prepared https://depot.globalbioticinteractions.org/tmp/taxon-0.3.2/taxonCache.tsv.gz for your review. If you are ok with this version, I'll prepare another zenodo publication. Otherwise, please detail your concerns.

cboettig commented 6 years ago

Please note that #6 describes the origin of the prefix-less ids. I am hoping to incorporate these changes in the next major release of GloBI's taxon graph (should I rename to globi term graph instead?). I've started making manual patches using a development version and nomer, only to realize that my time is probably better spent on thinking more about how to automatically validate, and report on, term mappings in addition to making the term graph more modular (e.g., splitting up term vertices and mapping edges into more manageable chunks similar to modular development of software libraries). If this is a big concern, please let me know.

Sounds like a plan. Nice to have ALA taxon addressed. I'm still seeing 57 rows that don't have a : in the id, e.g.

> taxonCache %>% filter(!grepl(":", id))
# A tibble: 57 x 9
   id                                   name   rank  commonNames path  pathIds pathNames externalUrl thumbnailUrl
   <chr>                                <chr>  <chr> <chr>       <chr> <chr>   <chr>     <chr>       <chr>       
 1 4701dc84-660a-4c51-bd16-593997f2370b Coelo… spec… NA          Fung… urn:ls… kingdom … NA          NA          
 2 ALA_Cladia_muelleri                  Cladi… unkn… NA          | Cl… | ALA_… | unknown NA          NA          
 3 ALA_Delia_hirticrura                 Delia… unkn… NA          | De… | ALA_… | unknown NA          NA          
 4 ALA_Oxycetonia_jucunda               Oxyce… unkn… NA          | Ox… | ALA_… | unknown NA          NA          
 5 NZOR-3-100527                        Proci… genus NA          | Pr… | NZOR… | genus   NA          NA          
 6 NZOR-3-109825                        Marie… genus NA          | Ma… | NZOR… | genus   NA          NA          
 7 NZOR-3-33834                         Misce… unkn… NA          | Mi… | NZOR… | unknown NA          NA          
 8 NZOR-3-40069                         Proka… unkn… NA          | Pr… | NZOR… | unknown NA          NA          
 9 NZOR-3-41136                         Urtic… genus NA          | Ur… | NZOR… | genus   NA          NA          
10 NZOR-3-54695                         Oreoc… genus NA          | Or… | NZOR… | genus   NA          NA          
# ... with 47 more rows

Maybe that is intentional? Isn't clear if these identifiers can be resolved, notably they have no externalUrl entry, though ALA and NZOR look like they want to be prefixes to something(?)

There's a larger set of things with no externalUrl, some which seem to have prefixes that aren't defined in the prefix table (CoL, CAAB, ...), e.g.:

> taxonCache %>% filter(is.na(externalUrl))
# A tibble: 2,770 x 9
   id       name    rank  commonNames path         pathIds                  pathNames    externalUrl thumbnailUrl
   <chr>    <chr>   <chr> <chr>       <chr>        <chr>                    <chr>        <chr>       <chr>       
 1 4701dc8… Coelom… spec… NA          Fungi | Chy… urn:lsid:indexfungorum.… kingdom | p… NA          NA          
 2 ALA_Cla… Cladia… unkn… NA          | Cladia mu… | ALA_Cladia_muelleri    | unknown    NA          NA          
 3 ALA_Del… Delia … unkn… NA          | Delia hir… | ALA_Delia_hirticrura   | unknown    NA          NA          
 4 ALA_Oxy… Oxycet… unkn… NA          | Oxycetoni… | ALA_Oxycetonia_jucunda | unknown    NA          NA          
 5 CAAB:0c… Halica… spec… NA          Halicarcinu… CAAB:0cd18290:475549ca:… species      NA          NA          
 6 CAAB:23… Taloch… spec… NA          | Talochlam… | CAAB:23270067          | species    NA          NA          
 7 CAAB:28… Crab z… unkn… NA          | Crab zoea  | CAAB:28850902          | unknown    NA          NA          
 8 CAAB:53… Mastog… spec… NA          Mastogloiac… CAAB:53210000 | CAAB:53… family | ge… NA          NA          
 9 CAAB:80… Microa… unkn… microalgae… | Microalgae | CAAB:80200000          | unknown    NA          NA          
10 CoL:254… Pseudo… spec… NA          Pseudoparre… CoL:25759155 | CoL:2549… genus | spe… NA          NA          
# ... with 2,760 more rows

Again, I think this all just shows what an amazing resource this is to have all of this compiled in a nice file like taxonCache.tsv.gz, as synthesizing all these resources in a single table like that is far from trivial!

Running a few experiments on the pipe paths but I think that all relates to next steps in #7 rather than possible issues in taxonCache. Lemme know what you think about the above concerns with some of the ids, but otherwise this is looking ready for release to me.

cboettig commented 6 years ago

Looks like there might be a few cases where path, pathNames, and pathIds do not all have the same length (not counting cases where any one of these is NA), e.g. the row with id = ITIS:10824. Could be indicative of an issue?

cboettig commented 6 years ago

in case it's at all helpful, here's the crummy R code I'm using to identify the ~1000 rows that appear to have issues.

## Expect same number of pipes in each entry:
pattern = "\\s*\\|\\s*"
path_pipes <- taxonCache %>% purrr::transpose() %>% map_int( ~length(str_split(.x$path, pattern)[[1]]))
pathName_pipes <- taxonCache %>% purrr::transpose() %>% map_int( ~length(str_split(.x$pathNames, pattern)[[1]]))
pathIds_pipes <- taxonCache %>% purrr::transpose() %>% map_int( ~length(str_split(.x$pathIds, pattern)[[1]]))
na_path <- is.na(taxonCache$path)
na_pathNames <- is.na(taxonCache$pathNames)
na_pathIds  <- is.na(taxonCache$pathIds)

trouble <- which( !(pathIds_pipes == path_pipes) & !na_path & !na_pathIds)

## Here's the ~1000 rows that appear mismatched to me
taxonCache[trouble,]
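
## A vectorized base R sketch of the same kind of check that compares all three
## columns at once and skips NA values. The toy table is made up (ids X:1 and
## X:2 are hypothetical); its second row mirrors the Gnaphalium purpureum
## example above. count_parts is a hypothetical helper.

```r
# Number of pipe-delimited parts per string; NA input stays NA.
count_parts <- function(x) {
  n <- lengths(regmatches(x, gregexpr("|", x, fixed = TRUE))) + 1L
  n[is.na(x)] <- NA_integer_
  n
}

cache <- data.frame(
  id = c("EOL:392765", "X:1", "X:2"),
  path = c("Plantae | Tracheophyta", "Gnaphalium purpureum", NA),
  pathIds = c("EOL:281 | EOL:4077", "X:1", "X:0 | X:2"),
  pathNames = c("kingdom | phylum", "kingdom | species", "kingdom | species"),
  stringsAsFactors = FALSE
)

# One column of counts per pipe-string column, one row per record.
counts <- sapply(cache[c("path", "pathIds", "pathNames")], count_parts)

# A record is mismatched when its non-NA counts are not all equal.
mismatch <- apply(counts, 1, function(n) length(unique(n[!is.na(n)])) > 1)
cache$id[mismatch]
```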

jhpoelen commented 6 years ago

Very helpful indeed, thanks for being thorough. I am working on an input / output validation framework to more easily detect these inconsistencies; see #8 . Curious to hear your thoughts on that.

jhpoelen commented 6 years ago

@cboettig just published http://doi.org/10.5281/zenodo.1250572 . In this version, the consistency of terms and links was checked using nomer's validate-term and validate-term-link. Also, various fixes were included to help make the ids and their hierarchies a bit more well-behaved.

cboettig commented 6 years ago

@jhpoelen Maybe I'm not understanding something here, but it seems there's ~ 500,000 rows in taxonCache involving duplicate ids?

I think this should be reproducible R code:

library(tidyverse)
taxonCache <- read_tsv("https://zenodo.org/record/1250572/files/taxonCache.tsv.gz", quote="")

dup_id <- 
  taxonCache %>% select(id) %>% group_by(id) %>% 
  summarise(n_id = length(id)) %>% filter(n_id > 1) 

trouble <- taxonCache %>% semi_join(select(dup_id, id))

# a data frame with the subset of taxonCache having duplicate ids
trouble

This prevents me from establishing a unique path / pathIds / pathNames for an id; it's not clear how to resolve the conflicts. I think this is related to (/the cause of) the issue I just added to #7
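To separate harmless exact duplicates from ids that carry conflicting hierarchies, one option is to drop identical rows first and then count what remains per id. A base R sketch on made-up data (the ids and pathIds below are hypothetical):

```r
# Toy table: A:1 has two different pathIds, B:2 is an exact duplicate.
cache <- data.frame(
  id = c("A:1", "A:1", "B:2", "B:2", "C:3"),
  pathIds = c("A:0 | A:1", "A:9 | A:1", "B:0 | B:2", "B:0 | B:2", "C:3"),
  stringsAsFactors = FALSE
)

# Remove rows that are identical in every column.
distinct_rows <- unique(cache)

# Ids that still occur more than once have conflicting entries.
n_variants <- table(distinct_rows$id)
conflicting <- names(n_variants)[n_variants > 1]
conflicting
```

Only the conflicting ids then need a rule for choosing between hierarchies.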

jhpoelen commented 6 years ago

@cboettig thanks for sharing. See https://github.com/globalbioticinteractions/nomer/issues/7#issuecomment-395992615 . I think this warrants a further discussion. . .

jhpoelen commented 6 years ago

Also, please note https://github.com/globalbioticinteractions/nomer/issues/9 - would having the name source / retrieval date provide more information on which taxon id to select?

Currently, GloBI itself uses a pretty blunt method - just use all that match to populate taxon search index/ graph.

jhpoelen commented 4 years ago

Here's an example of a taxon id with slight changes in name hierarchies as provided by the name source. Note that http://id.biodiversity.org.au/node/apni/50587232 and https://id.biodiversity.org.au/taxon/apni/51337710 are both outdated identifiers for Plantae. So, this is an example of multiple interpretations of taxon ids.

Am leaving this issue open because it exposes some interesting effects associated with taxon ids.

id name rank commonNames path pathIds pathNames externalUrl thumbnailUrl
ALATaxon:NZOR-6-102447 Eurya genus Plantae | Charophyta | Equisetopsida | Magnoliidae | Ericales | Pentaphylacaceae | Eurya ALATaxon:http://id.biodiversity.org.au/node/apni/50587232 | ALATaxon:http://id.biodiversity.org.au/node/apni/50587231 | ALATaxon:http://id.biodiversity.org.au/node/apni/50587230 | ALATaxon:http://id.biodiversity.org.au/node/apni/50587229 | ALATaxon:http://id.biodiversity.org.au/node/apni/8790835 | ALATaxon:http://id.biodiversity.org.au/node/apni/8305023 | ALATaxon:NZOR-6-102447 kingdom | phylum | class | subclass | order | family | genus https://bie.ala.org.au/species/NZOR-6-102447
ALATaxon:NZOR-6-102447 Eurya genus Plantae | Charophyta | Equisetopsida | Magnoliidae | Ericales | Pentaphylacaceae | Eurya ALATaxon:https://id.biodiversity.org.au/taxon/apni/51337710 | ALATaxon:https://id.biodiversity.org.au/taxon/apni/51337706 | ALATaxon:https://id.biodiversity.org.au/taxon/apni/51337705 | ALATaxon:https://id.biodiversity.org.au/taxon/apni/51337515 | ALATaxon:https://id.biodiversity.org.au/taxon/apni/51311074 | ALATaxon:https://id.biodiversity.org.au/node/apni/8305023 | ALATaxon:NZOR-6-102447 kingdom | phylum | class | subclass | order | family | genus https://bie.ala.org.au/species/NZOR-6-102447