zedomel closed this 2 years ago
Hey @zedomel,
you can use the translate-names matcher as you suggested.
If you'd like to include a taxonomic hierarchy in the translation, you might benefit from using the globi
matcher after pointing the nomer properties for that matcher to your own taxonMap and taxonCache.
For instance, you can run:
echo -e "\tDonald duckus" | nomer append --properties my.properties globi
with my.properties containing something like:
nomer.term.cache.url=https://zenodo.org/record/6394935/files/taxonCacheFirst10.tsv
nomer.term.map.url=https://zenodo.org/record/6394935/files/taxonMapFirst10.tsv
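Put together, the properties file and the command might look like the sketch below. It assumes nomer is installed and on the PATH; the scratch directory and the variable names `props_dir` and `props` are illustrative only, and the nomer call is skipped when the tool is not found.

```shell
# Scratch location for the properties file (hypothetical; any path works).
props_dir=$(mktemp -d)
props="$props_dir/my.properties"

# One setting per line, using the example URLs from this thread.
cat > "$props" <<'EOF'
nomer.term.cache.url=https://zenodo.org/record/6394935/files/taxonCacheFirst10.tsv
nomer.term.map.url=https://zenodo.org/record/6394935/files/taxonMapFirst10.tsv
EOF

# Run the globi matcher only if nomer is available.
# printf is used instead of `echo -e` because `echo -e` is not portable.
if command -v nomer >/dev/null 2>&1; then
  printf '\tDonald duckus\n' | nomer append --properties "$props" globi
else
  echo "nomer not found on PATH; only wrote $props" >&2
fi
```

To use your own data, replace the two URLs with the locations of your own taxonCache and taxonMap files.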
The taxonMap is a naive mapping from provided id/name to resolved id/name,
and the taxonCache includes additional information for the resolved ids/names.
For the schema, see the provided example files.
Let me know if you need more help to get started, or whether you have any suggestions.
@zedomel I am assuming I answered your question on how to translate verbatim names to normalized names using Nomer.
If not, please comment and share your thoughts on how to better support the name translation.
Hi @jhpoelen
I have two-column files with original/verbatim names and normalized names, and I would like to use nomer to map verbatim names to normalized names. I have used grep for that, but it is very slow. You talked about using the translate-names matcher, and I'm wondering if it can be extended to provide the whole classification (taxonomic ranks). For example, if I have files where the first column is the verbatim name, how can I use translate-names to get all the data from the second column onward when a match is found?
The solution that I found was to provide a two-column mapping file for nomer.taxon.name.correction.url, where I put the corrected name + full hierarchy in the second column, separated by a delimiter (e.g. #). After running nomer replace translate-names, I replaced this dummy delimiter with an actual field delimiter (\t):
nomer.taxon.name.correction.url file:
command:
cat names.tsv | nomer replace translate-names | sed 's/#/\t/g' > names-translated.csv
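The pack/unpack idea behind this workaround can be tried without nomer. The sketch below is only an illustration: the file names, the sample row, and the variable names are made up, and the second column of the packed file stands in for what the replace step would emit. It uses `tr` for the final swap because `\t` in a `sed` replacement is a GNU extension and may not produce a tab elsewhere.

```shell
pack_dir=$(mktemp -d)

# Hypothetical input: verbatim name, corrected name, then hierarchy columns.
printf 'D. duckus\tDonald duckus\tAnimalia\tAves\n' > "$pack_dir/full.tsv"

# Pack: keep column 1 as the key and join the remaining columns with '#',
# giving the two-column shape described for the correction file.
awk -F'\t' '{
  packed = $2
  for (i = 3; i <= NF; i++) packed = packed "#" $i
  print $1 "\t" packed
}' "$pack_dir/full.tsv" > "$pack_dir/corrections.tsv"

# Unpack: turn the dummy delimiter back into tabs.
# Here the packed second column is a stand-in for the translated output.
unpacked=$(cut -f2 "$pack_dir/corrections.tsv" | tr '#' '\t')

printf '%s\n' "$unpacked"
```

This only works if `#` never occurs in a name or rank; otherwise a different dummy delimiter would be needed.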
thanks.