autotyp / autotyp-data

AUTOTYP data export
Creative Commons Attribution 4.0 International

Unknown LIDs (2915, 3000) referenced in data/*.csv #10

Closed: xflr6 closed this issue 2 years ago

xflr6 commented 6 years ago
import pathlib

import pandas as pd

# Local checkout of the autotyp-data repository.
DIR = pathlib.Path('~/projects/cldf/autotyp-data/data').expanduser()
# Every data table except the language register itself.
DATA = sorted(p for p in DIR.glob('*.csv') if p.name != 'Register.csv')

# Language register, indexed by language ID (LID).
lf = pd.read_csv(DIR / 'Register.csv', encoding='utf-8', index_col='LID')

# For each data file, print the rows whose LID does not occur in the register.
for d in DATA:
    df = pd.read_csv(d, encoding='utf-8')
    missing = df.loc[~df['LID'].isin(lf.index), ['LID']]
    if not missing.empty:
        print(d.name)
        print(missing)

Output:
Grammatical_markers.csv
       LID
2854  2915
4685  2915
NP_per_language.csv
      LID
479  3000
NP_structure.csv
       LID
922   3000
1029  3000
NP_structure_presence.csv
       LID
1012  3000
1013  3000
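If a single consolidated report is more convenient than per-file printouts, the same `isin` check can be folded into a helper that counts affected rows per (file, LID) pair. This is a minimal sketch, not something shipped with autotyp-data: the function name `find_unknown_lids` and the toy register/table values below are hypothetical and only illustrate the shape of the result. With the real data it would be called as `find_unknown_lids(lf, {p.name: pd.read_csv(p) for p in DATA})`.

```python
from typing import Mapping

import pandas as pd


def find_unknown_lids(register: pd.DataFrame,
                      tables: Mapping[str, pd.DataFrame]) -> pd.DataFrame:
    """Summarize LIDs used in data tables but absent from the register.

    register: DataFrame indexed by LID (like Register.csv loaded above).
    tables: hypothetical mapping of file name -> DataFrame with an 'LID' column.
    Returns columns: file, LID, rows (how many rows use that unknown LID).
    """
    frames = []
    for name, df in tables.items():
        # Same membership test as the script above: LID not in the register index.
        unknown = df.loc[~df["LID"].isin(register.index), ["LID"]]
        if not unknown.empty:
            frames.append(unknown.assign(file=name))
    if not frames:
        # Nothing unknown anywhere: return an empty summary with the same columns.
        return pd.DataFrame(columns=["file", "LID", "rows"])
    combined = pd.concat(frames, ignore_index=True)
    # One row per (file, LID) pair, sorted by file then LID.
    return combined.groupby(["file", "LID"]).size().reset_index(name="rows")


# Toy, made-up input purely for illustration (not AUTOTYP data).
register = pd.DataFrame({"Name": ["a", "b"]},
                        index=pd.Index([1, 2], name="LID"))
tables = {
    "A.csv": pd.DataFrame({"LID": [1, 2915, 2915]}),
    "B.csv": pd.DataFrame({"LID": [2, 3000]}),
}
summary = find_unknown_lids(register, tables)
print(summary)
```

Here `summary` has one row for `A.csv` with LID 2915 used twice and one row for `B.csv` with LID 3000 used once. Counting rows rather than listing them keeps the report short when a bad LID is repeated many times in one table.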
tzakharko commented 2 years ago

This is fixed in 1.0.0. Apologies for taking so long; we were busy rebuilding the data export and aggregation pipeline from scratch.