xflr6 closed this issue 2 years ago.
```python
In [1]: import pandas as pd
   ...:
   ...: URL = 'https://github.com/autotyp/autotyp-data/raw/master/data/Register.csv'
   ...:
   ...: ISO = r'[a-z]{3}$'
   ...: GCODE = r'[a-z]{4}[1-9][0-9]{3}$'
   ...:
   ...: df = pd.read_csv(URL, encoding='utf-8', index_col='LID')
   ...:
   ...: df.loc[~df['ISO639.3'].str.match(ISO).fillna(True), ['Language', 'Stock', 'ISO639.3']]
Out[1]:
       Language          Stock  ISO639.3
LID
301   Tocharian  Indo-European      tokh
185        Mixe     Mixe-Zoque      mixe
431      Berber         Berber      berb
762       Cuica       Macro-Ge      cuic
764   Esmeralda      Esmeralda      esme
766   (Frisian)  Indo-European      fris
800     Sorbian  Indo-European      sorb
1696      Chaga    Benue-Congo      chag

In [2]: df.loc[~df['Glottocode'].str.match(GCODE).fillna(True), ['Language', 'Stock', 'Glottocode']]
Out[2]:
     Language         Stock  Glottocode
LID
672   Jingpho  Sino-Tibetan     jin1260
```
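For anyone who wants to rerun this check offline, here is a minimal sketch of the same idea. It assumes a small hypothetical DataFrame in place of the downloaded `Register.csv`. The first three rows reuse values from the output above, with missing codes left as `None`, and the `LID` 9001 row is made up purely as a valid example. `invalid_codes` is a hypothetical helper, not part of pandas. Passing `na=True` to `Series.str.match` treats missing codes as valid, which stands in for the `.fillna(True)` step from the session.

```python
import pandas as pd

# Patterns from the session: ISO 639-3 = three lowercase letters;
# Glottocode = four lowercase letters, then a non-zero digit and three digits.
ISO = r'[a-z]{3}$'
GCODE = r'[a-z]{4}[1-9][0-9]{3}$'

# Hypothetical sample data (not the real register). Row 9001 is invented
# as a valid example; None marks a code not shown in the output above.
df = pd.DataFrame(
    {
        'Language': ['Tocharian', 'Mixe', 'Jingpho', 'Example'],
        'Stock': ['Indo-European', 'Mixe-Zoque', 'Sino-Tibetan', 'Example'],
        'ISO639.3': ['tokh', 'mixe', None, 'abc'],
        'Glottocode': [None, None, 'jin1260', 'abcd1234'],
    },
    index=pd.Index([301, 185, 672, 9001], name='LID'),
)


def invalid_codes(frame, column, pattern):
    """Return rows whose non-missing value in `column` fails `pattern`."""
    # na=True makes missing values count as matches, so they are not reported.
    ok = frame[column].str.match(pattern, na=True)
    return frame.loc[~ok, ['Language', 'Stock', column]]


print(invalid_codes(df, 'ISO639.3', ISO))
print(invalid_codes(df, 'Glottocode', GCODE))
```

Handling missing values inside `str.match` keeps the mask boolean from the start, so `~` never has to operate on an object-dtype result.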
This is fixed in 1.0.0.