Open seasidesparrow opened 2 years ago
Normalization should be happening in adspy as a part of Exports. The export code includes the line
    import ads.Keywords as kw_normalizer
and attempts this normalization via
    normalized_keywords = ', '.join(kw_normalizer.get_normalized_keywords(kws))
In Keywords, this is a straightforward function:
    def get_normalized_keyword(keyword):
        """Returns a normalized keyword."""
        normalized_keyword = None
        if keyword in KEYW2NORM:
            normalized_keyword = KEYW2NORM[keyword].strip()
        else:
            normalized_keyword = normalize_keyword(keyword).strip()
        if normalized_keyword in ASTKEYWORDS:
            return normalized_keyword.replace(', ', ' ')
        else:
            return None
So I wonder whether the UAT data aren't being passed via config properly?
The keyw2norm.pickle file in adspy/etc is dated January 2011, so it predates UAT.
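A minimal sketch of the suspected failure mode, assuming the lookup tables contain only pre-UAT entries. The contents of KEYW2NORM and ASTKEYWORDS and the body of normalize_keyword are hypothetical stand-ins (the real ones are not shown in this issue); only the control flow of get_normalized_keyword follows the function quoted above. Under those assumptions, any UAT concept ID falls through both lookups and comes back as None:

```python
# Toy stand-ins: the real tables are loaded by adspy and are not shown here.
KEYW2NORM = {"Sun: flares": "sun flares"}  # hypothetical legacy mapping
ASTKEYWORDS = {"sun flares"}               # hypothetical legacy vocabulary


def normalize_keyword(keyword):
    """Hypothetical fallback normalizer: lowercase and collapse whitespace."""
    return " ".join(keyword.lower().split())


def get_normalized_keyword(keyword):
    """Same control flow as the function quoted in this issue."""
    if keyword in KEYW2NORM:
        normalized_keyword = KEYW2NORM[keyword].strip()
    else:
        normalized_keyword = normalize_keyword(keyword).strip()
    if normalized_keyword in ASTKEYWORDS:
        return normalized_keyword.replace(", ", " ")
    return None


# One legacy keyword plus two UAT concept IDs taken from the record below.
results = {kw: get_normalized_keyword(kw) for kw in ["Sun: flares", "1964", "1483"]}
print(results)
```

If the export step then writes a placeholder for every None, that would be consistent with the "-" values in keyword_norm, though that last step is a guess and not something confirmed in the code quoted here.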
See 2022ApJ...927....1M:
"keyword": ["1964", "1483", "1989", "1485", "2009", "1974", "1477", "1503", "1476", "1533", "1493", "2170", "Astrophysics - Solar and Stellar Astrophysics"],
"keyword_norm": ["-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-"],
"keyword_schema": ["UAT", "UAT", "UAT", "UAT", "UAT", "UAT", "UAT", "UAT", "UAT", "UAT", "UAT", "UAT", "arXiv"],
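To check how widespread this is, the three parallel arrays can be zipped together and the un-normalized entries tallied per schema. This is only a sketch: the helper name unnormalized_by_schema is hypothetical, the record is an abbreviated subset of the one above, and treating "-" as the "no normalized form" placeholder is an assumption based on this record alone.

```python
from collections import Counter

# Abbreviated subset of the 2022ApJ...927....1M fields shown above.
record = {
    "keyword": ["1964", "1483", "Astrophysics - Solar and Stellar Astrophysics"],
    "keyword_norm": ["-", "-", "-"],
    "keyword_schema": ["UAT", "UAT", "arXiv"],
}


def unnormalized_by_schema(doc, placeholder="-"):
    """Count keywords whose normalized form equals the placeholder, per schema."""
    rows = zip(doc["keyword"], doc["keyword_norm"], doc["keyword_schema"])
    return Counter(schema for _, norm, schema in rows if norm == placeholder)


counts = unnormalized_by_schema(record)
print(counts)
```

Running the same tally over a batch of records would show whether the missing normalization is limited to UAT keywords or affects other schemas as well.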