UW-xDD / text2graph_llm

An experimental API endpoint to convert text to knowledge graph triplets.
MIT License

Macrostrat-integration #9

Closed JasonLo closed 6 months ago

JasonLo commented 6 months ago

Should we use strat_name or strat_name_long from the Macrostrat API for exact matching? I am currently using strat_name. Also, there are some abbreviations like ['Mbr', 'Fm', 'Gp', 'SGp'] that I haven't handled yet. This may need some improvement...
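
One possible way to handle the abbreviations, as a minimal sketch only: expand a rank abbreviation to its full rank word before doing the exact match. `RANK_ABBREVIATIONS` and `expand_rank_abbreviation` are hypothetical names, not part of text2graph_llm, and the expansions (Member, Formation, Group, Supergroup) are assumed from common stratigraphic usage rather than taken from the Macrostrat API.

```python
import re

# Assumed expansions for the abbreviations listed above (hypothetical mapping).
RANK_ABBREVIATIONS = {
    "Mbr": "Member",
    "Fm": "Formation",
    "Gp": "Group",
    "SGp": "Supergroup",
}

# Match any abbreviation as a whole word, with an optional trailing period.
_RANK_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(abbr) for abbr in RANK_ABBREVIATIONS) + r")\b\.?"
)


def expand_rank_abbreviation(name: str) -> str:
    """Replace rank abbreviations in `name` with their full rank words."""
    return _RANK_PATTERN.sub(lambda m: RANK_ABBREVIATIONS[m.group(1)], name)


# "Example" is a placeholder, not a real Macrostrat entry.
print(expand_rank_abbreviation("Example Fm"))  # Example Formation
```

The match is case-sensitive and only touches whole words, so a name that already spells out its rank passes through unchanged.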

ilmcconnell commented 6 months ago

@JasonLo In one third of all cases strat_name == strat_name_long. In the remaining two thirds, the main difference I can see between strat_name and strat_name_long is that the long entry includes the stratigraphic name rank, so it's more formal and specific. I'm OK with keeping strat_name for now, since that's the higher-recall option.

from text2graph.geolocation.macrostrat import all_strat_names_long

all_stratnames = all_strat_names_long()
sames = 0
diffs = 0
for record in all_stratnames:
    sn = record["strat_name"]
    snl = record["strat_name_long"]
    if sn == snl:
        sames += 1
    else:
        diffs += 1
        print(sn, snl)

print(f"{sames=}, {diffs=}")
# Percentage of records where strat_name and strat_name_long are identical
print(f"Identical: {sames / (sames + diffs) * 100:.1f}%")
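
The precision/recall trade-off discussed above could also be handled with a two-step lookup: try an exact match on strat_name_long first, then fall back to strat_name. This is only a sketch. `match_strat_name` is a hypothetical helper, not a function in text2graph, and the records are made-up placeholders shaped like the `strat_name` / `strat_name_long` fields used above.

```python
from typing import Optional


def match_strat_name(text: str, records: list[dict]) -> Optional[tuple[dict, str]]:
    """Return (record, matched_field) for the first exact match, or None."""
    # Check the long form first: it includes the rank, so a hit is more specific.
    for field in ("strat_name_long", "strat_name"):
        for record in records:
            if record[field] == text:
                return record, field
    return None


# Placeholder records, not real Macrostrat data.
records = [
    {"strat_name": "Example", "strat_name_long": "Example Formation"},
    {"strat_name": "Sample Group", "strat_name_long": "Sample Group"},
]

print(match_strat_name("Example Formation", records))
print(match_strat_name("Example", records))
```

Returning the matched field alongside the record lets downstream code tell a rank-specific hit from the looser strat_name fallback.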