Closed seltmann closed 2 years ago
@seltmann thanks for the suggestion for adding the header.
Can you please provide preferred columns headers illustrated in an example?
I don't know the meaning of all the column contents, so I am unable to assign preferred column headers correctly. Here is a first pass, but I do not know whether the column headers actually match the data.
Here is a suggestion to get us started, but we should discuss when we meet Friday.
To reflect the left/right (or provided/resolved) names, I suggest prefixing dwc: terms with provided/resolved.
For instance:
provided:dwc:taxonID | provided:dwc:ScientificName | relation:dwc:taxonomicStatus | resolved:dwc:taxonID | resolved:dwc:taxonName | resolved:dwc:taxonRank | resolved:dwc:HigherTaxon | resolved:HigherTaxonIDs | relation:dwc:nameAccordingToID | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum | Acamptopoeum argentinum | HAS_ACCEPTED_NAME | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum | Acamptopoeum argentinum | species | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Acamptopoeum argentinum | https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum | kingdom | phylum | class | order | family | species | https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum |
so, a single line documents a citable name relation
(provided) -> (relation) -> (resolved)
where the properties of the provided/relation/resolved are captured in separate columns. The relation would include a citation like (name relationship according to ITIS, DiscoverLife . . . )
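As a rough illustration of the (provided) -> (relation) -> (resolved) layout, the columns of one row can be grouped by their prefix. This is a minimal sketch and not part of nomer: the helper name `split_relation` is hypothetical, and the column names are taken from the suggested header above.

```python
from typing import Dict, List


def split_relation(header: List[str], row: List[str]) -> Dict[str, Dict[str, str]]:
    """Group the cells of one row by the prefix of their column name."""
    groups: Dict[str, Dict[str, str]] = {"provided": {}, "relation": {}, "resolved": {}}
    for column, value in zip(header, row):
        for prefix in groups:
            if column.startswith(prefix):
                groups[prefix][column] = value
                break
    return groups


# Column names from the suggested header; the row is shortened for illustration.
header = [
    "provided:dwc:taxonID",
    "provided:dwc:ScientificName",
    "relation:dwc:taxonomicStatus",
    "resolved:dwc:taxonID",
    "resolved:dwc:taxonName",
]
url = "https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum"
row = [url, "Acamptopoeum argentinum", "HAS_ACCEPTED_NAME", url, "Acamptopoeum argentinum"]

relation = split_relation(header, row)
print(relation["provided"]["provided:dwc:ScientificName"], "->",
      relation["relation"]["relation:dwc:taxonomicStatus"], "->",
      relation["resolved"]["resolved:dwc:taxonName"])
```

Grouping by prefix keeps the sketch independent of how many columns each part carries.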
In the upcoming nomer version, you can include a header based on the input/output schemas using commands like:
$ nomer list discoverlife --include-header | head -n4
Note that I have not yet introduced DwC terms associated with the column names due to ambiguity of mappings across existing mappers. Nomer supports non-taxonomic mappers also.
I imagine that translating the specific dumps into some DwC-like taxonomic scheme would be possible with the provided descriptive column names.
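To make that idea concrete, here is a small sketch of how descriptive column names such as providedName could be translated into DwC-like terms after the fact. The mapping table is purely an assumption for illustration; nomer does not provide it, and the right DwC term for each column would still need to be agreed on.

```python
# Hypothetical mapping from descriptive column suffixes to DwC-like terms.
# This mapping is an assumption for illustration, not something nomer defines.
DWC_LIKE = {
    "ExternalId": "dwc:taxonID",
    "Name": "dwc:scientificName",
    "Authorship": "dwc:scientificNameAuthorship",
    "Rank": "dwc:taxonRank",
}


def to_dwc_like(column: str) -> str:
    """Translate e.g. 'providedName' into 'provided:dwc:scientificName'."""
    if column == "relationName":
        # Follows the relation:dwc:taxonomicStatus suggestion made earlier in this thread.
        return "relation:dwc:taxonomicStatus"
    for side in ("provided", "resolved"):
        if column.startswith(side):
            suffix = column[len(side):]
            if suffix in DWC_LIKE:
                return f"{side}:{DWC_LIKE[suffix]}"
    # Columns without a known mapping are left unchanged.
    return column


print([to_dwc_like(c) for c in ["providedExternalId", "relationName", "resolvedRank"]])
```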
@seltmann a first pass at the header functionality is available in the just-released Nomer v0.2.9. Please review.
@jhpoelen
Header names look accurate based on the above discussion. Some tabs are missing in the formatted data between providedName and relationName. For example:
nomer list discoverlife --include-header | head -n4 includes resolvedCommonNames, resolvedPath, resolvedPathIds, resolvedPathNames, resolvedExternalUrl, resolvedThumbnailUrl
nomer list --properties my.properties discoverlife does not include resolvedCommonNames, resolvedPath, resolvedPathIds, resolvedPathNames, resolvedExternalUrl, resolvedThumbnailUrl
curious if this is on purpose?
@seltmann thanks for taking the time to review.
For some reason, I was unable to reproduce your results.
Here's what I did:
# install 0.2.9
$ nomer version
0.2.9
# clean cache, just in case some old cached taxon files remained
$ nomer clean
...
$ nomer list discoverlife --include-header | head -n4 > withHeader.tsv
...
See the attached withHeader.tsv (I added a .txt extension so GitHub would accept it).
Opening the file in LibreOffice Calc gives me the expected results (see attached screenshot).
How did you capture the output of the nomer command?
Also re:
nomer list --properties my.properties discoverlife
can you please share your my.properties file?
nomer version 0.2.9
clean cache I did not clean cache last time, so did this time
nomer list discoverlife --include-header | head -n4 > withHeader.tsv
I imported into Google Sheets from .tsv file and this looks fine.
cat my.properties
nomer.append.schema.output.example.taxon.rank.order=[{"column":0,"type":"path.order.id"},{"column": 1,"type":"path.order.name"},{"column": 2,"type":"path.order"}]
nomer.append.schema.output=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"authorship"},{"column":3,"type":"rank"}]
nomer.schema.input=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"authorship"},{"column": 3, "type":"rank"}]
@seltmann thanks!
with your my.properties, I was able to do:
$ nomer list --properties my.properties --include-header discoverlife | head -n4
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [discoverlife-taxon]
providedExternalId providedName providedAuthorship providedRank relationName resolvedExternalId resolvedName resolvedAuthorship resolvedRank
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum (Friese, 1906) species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum(Friese, 1906) species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui Compagnucci, 2004 species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui Compagnucci, 2004 species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense Shinn, 1965 species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense Shinn, 1965 species
and
$ nomer list --properties my.properties --include-header discoverlife | head -n4 > withCustomHeader.tsv
with attached result (with appended .txt for github)
Weird. Can you reproduce?
@seltmann weird as in: I cannot reproduce your missing header, and I am noticing the expected custom headers show up.
btw - I noticed how my tab characters disappeared on copy-pasting from the terminal, but not in the redirected file output. Perhaps this explains the missing tabs from before.
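A quick way to rule out missing tabs is to check the redirected file rather than the pasted terminal text. The sketch below counts tab-separated fields per line and reports lines that differ from the header; the helper name `find_ragged_lines` is hypothetical and the script is not part of nomer.

```python
from typing import List, Tuple


def find_ragged_lines(lines: List[str]) -> List[Tuple[int, int]]:
    """Return (line number, field count) for every line whose number of
    tab-separated fields differs from the header on the first line."""
    if not lines:
        return []
    expected = len(lines[0].rstrip("\n").split("\t"))
    ragged = []
    for number, line in enumerate(lines[1:], start=2):
        count = len(line.rstrip("\n").split("\t"))
        if count != expected:
            ragged.append((number, count))
    return ragged


# In-memory example: the third line lost a tab, as can happen on copy-paste.
sample = ["providedName\trelationName\tresolvedName\n",
          "A\tHAS_ACCEPTED_NAME\tA\n",
          "B SYNONYM_OF\tA\n"]
print(find_ragged_lines(sample))

# Usage with a redirected file such as withHeader.tsv:
# with open("withHeader.tsv", encoding="utf-8") as f:
#     print(find_ragged_lines(f.readlines()))
```

An empty result means every line has as many fields as the header.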
All is fine regarding the tabs now.
Still seeing difference between the headers, but I think this is by design?
nomer list discoverlife --include-header | head -n4 includes resolvedCommonNames, resolvedPath, resolvedPathIds, resolvedPathNames, resolvedExternalUrl, resolvedThumbnailUrl
nomer list --properties my.properties discoverlife does not include resolvedCommonNames, resolvedPath, resolvedPathIds, resolvedPathNames, resolvedExternalUrl, resolvedThumbnailUrl
After running:
$ nomer list --properties my.properties --include-header discoverlife | head -n4
$ head withCustomHeader.tsv
providedExternalId providedName providedAuthorship providedRank relationName resolvedExternalId resolvedName resolvedAuthorship resolvedRank
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum (Friese, 1906) species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+argentinum Acamptopoeum argentinum(Friese, 1906) species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui Compagnucci, 2004 species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+calchaqui Acamptopoeum calchaqui Compagnucci, 2004 species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense Shinn, 1965 species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombiense Shinn, 1965 species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiensis Acamptopoeum colombiensis Shinn, 1965 species SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Acamptopoeum+colombiense Acamptopoeum colombienseShinn, 1965 species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi Acamptopoeum fernandezi Gonzalez, 2004 species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+fernandezi Acamptopoeum fernandeziGonzalez, 2004 species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum Acamptopoeum inauratum (Cockerell, 1926) species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+inauratum Acamptopoeum inauratum (Cockerell, 1926) species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster Acamptopoeum melanogaster Compagnucci, 2004 species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+melanogaster Acamptopoeum melanogaster Compagnucci, 2004 species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse Acamptopoeum nigritarse (Vachal, 1909) species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+nigritarse Acamptopoeum nigritarse(Vachal, 1909) species
https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii Acamptopoeum prinii (Holmberg, 1884) species HAS_ACCEPTED_NAME https://www.discoverlife.org/mp/20q?search=Acamptopoeum+prinii Acamptopoeum prinii (Holmberg, 1884) species
$ nomer list --properties my.properties --include-header discoverlife | head -n4 > withCustomHeader.tsv
...
$ nomer list --properties my.properties discoverlife | head -n4 > withoutCustomHeader.tsv
...
with attached withCustomHeader.tsv / withoutCustomHeader.tsv
withoutCustomHeader.tsv.txt withCustomHeader.tsv.txt
As far as I can tell, I don't see any unexpected results: the specified input and output schemas are used, and include things like provided authorship and resolved authorship.
Please confirm.
@jhpoelen you are correct, those using --properties my.properties are the same and correct.
I was commenting on the output without using my.properties. Using my.properties adds columns (authors, ranks) but also removes the columns resolvedCommonNames, resolvedPath, resolvedPathIds, resolvedPathNames, resolvedExternalUrl, resolvedThumbnailUrl.
@seltmann thanks for clarifying and for being patient with me.
Yes, if you specify a non-default schema, you'd have to explicitly include all the desired columns.
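To see which columns a custom schema actually asks for, the schema properties can be read as JSON and listed by type. This is a simplified sketch under the assumption that each property sits on a single `key=value` line; the helper name `schema_types` is hypothetical, and the two property lines are copied from the my.properties shared in this thread.

```python
import json
from typing import List


def schema_types(properties_text: str, key: str) -> List[str]:
    """Return the column types of one schema property, ordered by column index."""
    for line in properties_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            columns = json.loads(value)
            return [c["type"] for c in sorted(columns, key=lambda c: c["column"])]
    return []


# Two lines copied from the my.properties shared in this thread.
MY_PROPERTIES = """\
nomer.schema.input=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"authorship"},{"column": 3, "type":"rank"}]
nomer.append.schema.output=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"authorship"},{"column":3,"type":"rank"}]
"""

print(schema_types(MY_PROPERTIES, "nomer.schema.input"))
print(schema_types(MY_PROPERTIES, "nomer.append.schema.output"))
```

Both lists contain only externalId, name, authorship and rank, which matches the custom header seen above: nothing path-related is requested, so nothing path-related is written.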
Am curious to hear thoughts on how to make this schema business a little more intuitive (if needed). Otherwise, please let me know if you have any remaining desires / comments on the current --include-header functionality introduced in v0.2.9.
@jhpoelen can I get a list of other properties I can add to my.properties? I see that nomer -p allows me to use custom properties, but can I configure this to also include resolvedPath, for example? Thanks!
@seltmann adding a new issue for your feature request to list all columns that can be added to input/output schemas.
@seltmann are there any remaining issues regarding this specific issue (add header to nomer dump/list)?
See issue #67 for feature request related to schema terms.
Add the headers to the name file dumps in nomer.