globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

Including family as an output using properties #120

Closed: jtmiller28 closed this issue 2 years ago

jtmiller28 commented 2 years ago

Hello, I was wondering if there is a way to create an output that includes the taxonomic family of our resolved names. I know we can currently include the pipe-delimited hierarchical list (kingdom | phylum | order | etc.); however, I am hoping to create an output with a single column holding the family associated with the resolvedName.

I modified the properties file my.properties2.gz to include the changes in my.properties3.gz.

Current input:

echo -e "\tHelianthemum scoparium\tNutt. ex Torr. & A.Gray" | nomer append wfo --include-header --properties file:/home/jt-miller/Globi-Bees-Plant-Interactions/Plant-Data/Nomer-wfo-resolution/Final-Resolution/my.properties3

Output:

[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [wfo]
providedExternalId  providedName  providedAuthorship  relationName  resolvedExternalId  resolvedName  resolvedAuthorship  resolvedRank  resolvedFamily
[main] INFO org.globalbioticinteractions.nomer.match.WorldOfFloraOnlineTaxonService - [WORLD_OF_FLORA_ONLINE] taxonomy already indexed at [/home/jt-miller/.cache/nomer/world_of_flora_online/world_of_flora_online], no need to import.
    Helianthemum scoparium  Nutt. ex Torr. & A.Gray HAS_ACCEPTED_NAME   WFO:0001295598  Helianthemum scoparium  Nutt. ex Torr. & A.Gray species

jhpoelen commented 2 years ago

@jtmiller28 thanks for asking!

I was able to produce:

echo -e "\tHelianthemum scoparium\tNutt. ex Torr. & A.Gray" | nomer append wfo --include-header --properties my.properties
    Helianthemum scoparium  Nutt. ex Torr. & A.Gray HAS_ACCEPTED_NAME   WFO:0001295598  Helianthemum scoparium  Cistaceae

with my.properties:

nomer.append.schema.output=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"path.family.name"}]

You can specify parts of the taxon path using the "." notation: path.[rank].[{id | name}]

e.g.,

path.family.name
path.family.id
path.genus.name
path.genus.id
etc.
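For longer custom outputs, the schema line can also be generated rather than typed by hand. The sketch below, in POSIX shell, assembles a nomer.append.schema.output line in the same shape as the my.properties example, numbering columns from 0 in argument order. The nomer_schema function is hypothetical (it is not part of nomer) and only builds the text; nomer itself still interprets the column types.

```shell
#!/bin/sh
# Hypothetical helper (not part of nomer): print a nomer.append.schema.output
# property line for the given column types, numbering columns from 0.
# Path parts use the dot notation described above: path.<rank>.<id|name>.
nomer_schema() {
  json=""
  col=0
  for type in "$@"; do
    # Separate JSON objects with commas, except before the first one.
    if [ "$col" -gt 0 ]; then
      json="$json,"
    fi
    json="$json{\"column\":$col,\"type\":\"$type\"}"
    col=$((col + 1))
  done
  printf 'nomer.append.schema.output=[%s]\n' "$json"
}

# Example: resolved id, resolved name, and the family name as a third column.
nomer_schema externalId name path.family.name > my.properties

# Then run nomer with it, as in the command above:
# echo -e "\tHelianthemum scoparium\tNutt. ex Torr. & A.Gray" | nomer append wfo --include-header --properties my.properties
```

Swapping path.family.name for path.family.id, or adding path.genus.name, only changes the arguments passed to the function.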

Does that answer your question?

Also, please let me know if you have any suggestions that would have made this easier for your past self to find.

jhpoelen commented 2 years ago

Actually, I meant to say: a way for our past and future selves to remember this weird path schema notation. That includes me!

jtmiller28 commented 2 years ago

Yes, that is exactly what I was looking for.

Speaking for my past self, I am generally unsure how to build my.properties files to produce customized query outputs from Nomer. Had I not seen your base my.properties in previous issues, I probably would not have been able to build it without understanding Nomer better.

Would it be worth adding a section to the README on building properties files for customized queries? That might make this feature of Nomer more transparent. I am also finding more and more that it's quite useful for both research and organization. Depending on the group you're working with, some outputs matter more than others. For example, a lot of plant curation and organization happens at the family level, so an output with just the family is really helpful for organization purposes.

Of course, this could just be a personal issue! I am still fairly new to the computer science world compared to biology. For example, when reading the README, I thought what I needed might be in the JSON section, but I ended up more confused.

jhpoelen commented 2 years ago

Thanks for your notes.

I was thinking... would it help to include some examples as commented-out properties in the output of the following command?

nomer properties

Something like:

nomer.append.schema.output=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"rank"},{"column": 3,"type":"commonNames"},{"column": 4,"type":"path"},{"column": 5,"type":"pathIds"},{"column": 6,"type":"pathNames"},{"column": 7,"type":"externalUrl"},{"column": 8,"type":"thumbnailUrl"}]
# uncomment specific schema examples below 
# including family name as a separate column
# nomer.append.schema.output=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"rank"},{"column": 3,"type":"path.family.name"}]
# including family id as a separate column
# nomer.append.schema.output=[{"column":0,"type":"externalId"},{"column": 1,"type":"name"},{"column": 2,"type":"rank"},{"column": 3,"type":"path.family.id"}]
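With commented examples like these, switching schemas means uncommenting one example and commenting out the active line. The POSIX shell sketch below automates that step, assuming a properties file laid out like the proposal above. use_schema_example is a hypothetical helper, not a nomer command; it matches the pattern literally and writes the edited properties to stdout.

```shell
#!/bin/sh
# Hypothetical helper (not a nomer command): in properties file $1, comment out
# the active nomer.append.schema.output line and uncomment the first commented
# example whose text contains the literal string $2. Output goes to stdout.
use_schema_example() {
  awk -v pat="$2" '
    # Comment out the currently active schema line.
    /^nomer\.append\.schema\.output=/ { print "# " $0; next }
    # Uncomment the first matching example; index() is a literal substring match.
    !done && /^# nomer\.append\.schema\.output=/ && index($0, pat) {
      sub(/^# /, ""); print; done = 1; next
    }
    # Pass every other line through unchanged.
    { print }
  ' "$1"
}

# Example: activate the "family name as a separate column" example.
# use_schema_example my.properties path.family.name > my-family.properties
```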

jtmiller28 commented 2 years ago

Yes, I believe that would be a great addition. Even just some guidance on adding specific paths for customized queries would help a lot.