bojone / bert4keras

keras implement of transformers for humans
https://kexue.fm/archives/6915
Apache License 2.0

NER task: "substring not found" raised during parsing #141

Closed: sataliulan closed this issue 4 years ago

sataliulan commented 4 years ago

When asking a question, please provide the following information where possible:

Basic information

Core code

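# Build the recognizer from the trained CRF transition matrix
# (NamedEntityRecognizer, K and CRF are defined elsewhere in the script).
def getNerInstance():
    return NamedEntityRecognizer(trans=K.eval(CRF.trans), starts=[0], ends=[0])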

text=''' 2014  2.DefectsprofilesinAgIonImplantedMoStudiedbySlowPositronBeam  ZhuWang,PengfeiTai,FengshouTian,LiangliangLiu,ChunqingHe  SLOPOS13,Sep15-20,2013,Munich,Germany  3.Self-healingmechanismofirradiationdefectsnearΣ=11113grainboundaryincopper  LiangliangLiua,ZhengTangb,WeiXiaoc,ZhuWanga,  MaterialsLetters,Volume109,15October2013,Pages221–224  4.EffectofHydrogenonO2AdsorptionandDissociationonaTiO2Anatase001Surface  LiangliangLiu,ZhuWang,ChunxuPan,WeiXiao,andKyeongjaeCho  ChemPhysChem2013,14,996–1002  5.PlasticDeformationofNanocrystallineZincInvestigatedbyPositronAnnihilationLifetimeSpectroscopy  ZHOUKai周凯,LIHui李辉,WANGZhu王柱  CHIN.PHYS.LETT.Vol.30,No.52013057804  6.Effectsofoxidedistributedingrainboundariesonmicrostructurestabilityofnanocrystallinemetals  KaiZhou,HuiLi,JinBiaoPang,ZhuWang  JournalofPhysics:ConferenceSeries4432013012020  7.ThermalstabilityofnanocrystallineCustudiedbypositronannihilationlifetimespectroscopyandx-raydiffraction  KaiZhou,HuiLi,JinBiaoPang,ZhuWang  PhilosophicalMagazine,Vol.92,No.16,1June2012,2079–2088  8.ThermalstabilityofgrainboundariesinnanocrystallineZnstudiedbypositronlifetimespectroscopy  KaiZhou,HuiLi,JinBiaoPang,ZhuWang  PhysicaB,407,20121219–1222  9.Correlationbetweendislocationsandvacancy-defectsstudiedbyPositronAnnihilationSpectroscopy  PANGJinbiao,LIHui,ZHOUKai,WANGZhu  PlasmaScienceandTechnology,Vol.147,2012pp650-655  10.Thermalstabilityofdefectsinplasticallydeformedsiliconstudiedbypositronlifetimespectroscopy  Jin-BiaoPang,HartmutSLeipner,ReinhardKrause-Rehberg,ZhuWang,KaiZhouandHuiLi  Semicond.Sci.Technol.2720120350237pp  11.Defectpropertiesofas-grownandelectron-irradiatedTe-dopedGaSbstudiedbypositronannihilation.  LiHui,ZhouKai,PangJingbiao,ShaoYundong,WangZhu,ZhaoYouwen.  
SemiconductorScienceandTechnology.2011,26,0750166ppSCI  12.InvestigationofmicrostructurethermalevolutioninnanocrystallineCu  KaiZhou,HuiLi,JinBiaoPang,ZhuWang  PhysicaB,V4062011760-765  13.Asimplifieddigitalpositronlifetimespectrometerbasedonafastdigitaloscilloscope  LiHui,ShaoYundong,ZhouKai,PangJingbiao,WangZhu  NuclearInstrumentsandMethodsinPhysicsResearch,A2011,625,29-34SCI  14.Proton-irradiationinduceddefectsinTe-dopedGaSbstudiedbyphotoluminescenceandpositronannihilationspectroscopy  KaiZhou,HuiLi,ZhuWang  ModernPhysicsLettersB,Vol.24,No.272010'''

AlbertNer=getNerInstance()
entity_dict = AlbertNer.recognize(text)

Output

Traceback (most recent call last):
  File "D:\Parker\jiangxi\StructurePeriod3\albert_sequence_labeling_ner_crf.py", line 106, in recognize
    mapping = tokenizer.rematch(text, tokens)
  File "D:\Anaconda3\envs\tf1.15\lib\site-packages\bert4keras\tokenizers.py", line 379, in rematch
    start = text[offset:].index(token) + offset
ValueError: substring not found
text: 2014  2.DefectsprofilesinAgIonImplantedMoStudiedbySlowPositronBeam  ZhuWang,PengfeiTai,FengshouTian,LiangliangLiu,ChunqingHe  SLOPOS13,Sep15-20,2013,Munich,Germany  3.Self-healingmechanismofirradiationdefectsnearΣ=11113grainboundaryincopper  LiangliangLiua,ZhengTangb,WeiXiaoc,ZhuWanga,  MaterialsLetters,Volume109,15October2013,Pages221–224  4.EffectofHydrogenonO2AdsorptionandDissociationonaTiO2Anatase001Surface  LiangliangLiu,ZhuWang,ChunxuPan,WeiXiao,andKyeongjaeCho  ChemPhysChem2013,14,996–1002  5.PlasticDeformationofNanocrystallineZincInvestigatedbyPositronAnnihilationLifetimeSpectroscopy  ZHOUKai周凯,LIHui李辉,WANGZhu王柱  CHIN.PHYS.LETT.Vol.30,No.52013057804  6.Effectsofoxidedistributedingrainboundariesonmicrostructurestabilityofnanocrystallinemetals  KaiZhou,HuiLi,JinBiaoPang,ZhuWang  JournalofPhysics:ConferenceSeries4432013012020  7.ThermalstabilityofnanocrystallineCustudiedbypositronannihilationlifetimespectroscopyandx-raydiffraction  KaiZhou,HuiLi,JinBiaoPang,ZhuWang  PhilosophicalMagazine,Vol.92,No.16,1June2012,2079–2088  8.ThermalstabilityofgrainboundariesinnanocrystallineZnstudiedbypositronlifetimespectroscopy  KaiZhou,HuiLi,JinBiaoPang,ZhuWang  PhysicaB,407,20121219–1222  9.Correlationbetweendislocationsandvacancy-defectsstudiedbyPositronAnnihilationSpectroscopy  PANGJinbiao,LIHui,ZHOUKai,WANGZhu  PlasmaScienceandTechnology,Vol.147,2012pp650-655  10.Thermalstabilityofdefectsinplasticallydeformedsiliconstudiedbypositronlifetimespectroscopy  Jin-BiaoPang,HartmutSLeipner,ReinhardKrause-Rehberg,ZhuWang,KaiZhouandHuiLi  Semicond.Sci.Technol.2720120350237pp  11.Defectpropertiesofas-grownandelectron-irradiatedTe-dopedGaSbstudiedbypositronannihilation.  LiHui,ZhouKai,PangJingbiao,ShaoYundong,WangZhu,ZhaoYouwen.  
SemiconductorScienceandTechnology.2011,26,0750166ppSCI  12.InvestigationofmicrostructurethermalevolutioninnanocrystallineCu  KaiZhou,HuiLi,JinBiaoPang,ZhuWang  PhysicaB,V4062011760-765  13.Asimplifieddigitalpositronlifetimespectrometerbasedonafastdigitaloscilloscope  LiHui,ShaoYundong,ZhouKai,PangJingbiao,WangZhu  NuclearInstrumentsandMethodsinPhysicsResearch,A2011,625,29-34SCI  14.Proton-irradiationinduceddefectsinTe-dopedGaSbstudiedbyphotoluminescenceandpositronannihilationspectroscopy  KaiZhou,HuiLi,ZhuWang  ModernPhysicsLettersB,Vol.24,No.272010
tokens: ['[CLS]', '2014', '2', '.', 'de', '##fe', '##cts', '##pro', '##file', '##si', '##na', '##gion', '##im', '##pl', '##ant', '##ed', '##mo', '##st', '##ud', '##ie', '##db', '##ys', '##low', '##po', '##sit', '##ron', '##be', '##am', 'zh', '##u', '##wang', ',', 'pen', '##g', '##fe', '##ita', '##i', ',', 'fe', '##ng', '##sh', '##out', '##ian', ',', 'li', '##ang', '##lia', '##ng', '##li', '##u', ',', 'ch', '##un', '##qi', '##ng', '##he', 's', '##lo', '##po', '##s', '##13', ',', 'sep', '##15', '-', '20', ',', '2013', ',', 'mu', '##nic', '##h', ',', 'ge', '##rman', '##y', '3', '.', 'self', '-', 'he', '##al', '##ing', '##me', '##cha', '##nis', '##mo', '##fi', '##rr', '##ad', '##ia', '##tion', '##de', '##fe', '##cts', '##ne', '##ar', '##ς', '=', '1111', '##3', '##g', '##rain', '##bo', '##und', '##ary', '##in', '##co', '##pper', 'li', '##ang', '##lia', '##ng', '##li', '##ua', ',', 'zh', '##eng', '##tan', '##gb', ',', 'wei', '##xi', '##ao', '##c', ',', 'zh', '##u', '##wang', '##a', ',', 'mate', '##rial', '##sl', '##ette', '##rs', ',', 'vol', '##ume', '##10', '##9', ',', '15', '##oc', '##to', '##ber', '##2013', ',', 'page', '##s', '##22', '##1', '–', '224', '4', '.', 'ef', '##fe', '##ct', '##of', '##hy', '##dr', '##og', '##en', '##on', '##o2', '##ads', '##or', '##pt', '##ion', '##and', '##di', '##ss', '##oc', '##ia', '##tion', '##ona', '##ti', '##o2', '##ana', '##ta', '##se', '##001', '##su', '##rf', '##ace', 'li', '##ang', '##lia', '##ng', '##li', '##u', ',', 'zh', '##u', '##wang', ',', 'ch', '##un', '##x', '##up', '##an', ',', 'wei', '##xi', '##ao', ',', 'and', '##ky', '##eo', '##ng', '##ja', '##ec', '##ho', 'ch', '##em', '##ph', '##ys', '##che', '##m2', '##013', ',', '14', ',', '99', '##6', '–', '100', '##2', '5', '.', 'p', '##last', '##ic', '##de', '##form', '##ation', '##of', '##nan', '##oc', '##ry', '##sta', '##lli', '##ne', '##zi', '##nc', '##in', '##ves', '##ti', '##gate', '##db', '##y', '##po', '##sit', '##ron', '##ann', '##i', '##hi', '##la', '##tion', '##life', 
'##times', '##pe', '##ct', '##ros', '##co', '##py', 'zh', '##ou', '##ka', '##i', '周', '凯', ',', 'li', '##hu', '##i', '李', '辉', ',', 'wang', '##z', '##hu', '王', '柱', 'chi', '##n', '.', 'ph', '##ys', '.', 'let', '##t', '.', 'vol', '.', '30', ',', 'no', '.', '520', '##13', '##05', '##78', '##04', '6', '.', 'ef', '##fe', '##cts', '##of', '##ox', '##ide', '##di', '##st', '##ri', '##bu', '##ted', '##ing', '##rain', '##bo', '##und', '##ari', '##es', '##on', '##mic', '##ros', '##tr', '##uc', '##ture', '##sta', '##bility', '##of', '##nan', '##oc', '##ry', '##sta', '##lli', '##ne', '##me', '##tal', '##s', 'kai', '##z', '##ho', '##u', ',', 'hu', '##il', '##i', ',', 'ji', '##n', '##bia', '##op', '##ang', ',', 'zh', '##u', '##wang', 'journal', '##of', '##ph', '##ys', '##ics', ':', 'con', '##ference', '##ser', '##ies', '##44', '##32', '##013', '##012', '##02', '##0', '7', '.', 'the', '##rma', '##ls', '##ta', '##bility', '##of', '##nan', '##oc', '##ry', '##sta', '##lli', '##ne', '##cus', '##tu', '##die', '##db', '##y', '##po', '##sit', '##ron', '##ann', '##i', '##hi', '##la', '##tion', '##life', '##times', '##pe', '##ct', '##ros', '##co', '##py', '##and', '##x', '-', 'ray', '##di', '##ff', '##ra', '##ction', 'kai', '##z', '##ho', '##u', ',', 'hu', '##il', '##i', ',', 'ji', '##n', '##bia', '##op', '##ang', ',', 'zh', '##u', '##wang', 'ph', '##il', '##os', '##op', '##hi', '##cal', '##ma', '##ga', '##zi', '##ne', ',', 'vol', '.', '92', ',', 'no', '.', '16', ',', '1', '##jun', '##e', '##2012', ',', '207', '##9', '–', '208', '##8', '8', '.', 'the', '##rma', '##ls', '##ta', '##bility', '##of', '##g', '##rain', '##bo', '##und', '##ari', '##es', '##in', '##nan', '##oc', '##ry', '##sta', '##lli', '##ne', '##z', '##ns', '##tu', '##die', '##db', '##y', '##po', '##sit', '##ron', '##life', '##times', '##pe', '##ct', '##ros', '##co', '##py', 'kai', '##z', '##ho', '##u', ',', 'hu', '##il', '##i', ',', '[SEP]']

What I have tried

I have added exception handling around the call.
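
The exception handling itself is not shown in the issue. A minimal sketch of what such a guard might look like follows; `safe_rematch` is a hypothetical helper (not part of bert4keras), and `fake_rematch` is only a stand-in for `tokenizer.rematch`. The sketch assumes nothing beyond the `ValueError` seen in the traceback above.

```python
def safe_rematch(rematch, text, tokens):
    """Call rematch(text, tokens); report and return None if alignment fails."""
    try:
        return rematch(text, tokens)
    except ValueError as exc:
        # Log the inputs so the offending sample can be inspected later.
        print('rematch failed:', exc)
        print('text:', text)
        print('tokens:', tokens)
        return None


# Stand-in for tokenizer.rematch, used only to exercise the guard (assumption).
def fake_rematch(text, tokens):
    return [[text.index(t)] for t in tokens]


ok = safe_rematch(fake_rematch, 'abc', ['a', 'c'])
failed = safe_rematch(fake_rematch, 'abc', ['z'])
```

A guard like this only skips the failing sample; it does not repair the mismatch between the text and its tokens.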

bojone commented 4 years ago

I found that this fails only on Python 3. The failure can be reproduced with:

from bert4keras.tokenizers import Tokenizer

dict_path = '/root/kg/bert/chinese_L-12_H-768_A-12/vocab.txt'
tokenizer = Tokenizer(dict_path, do_lower_case=True)  # build the tokenizer

text = 'rΣ'
tokens = tokenizer.tokenize(text)
tokenizer.rematch(text, tokens)
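
A likely explanation for this minimal case, offered as an assumption rather than something confirmed in the thread: Python 3's `str.lower()` applies the Unicode final-sigma rule, so lowercasing a whole string can yield `ς` (compare the `'##ς'` token in the output above), while lowercasing the same character in isolation yields `σ`. If tokenization lowercases the whole text and `rematch` normalizes it character by character, the `index` lookup would miss. The sketch below uses only the standard library:

```python
# Python 3: whole-string vs per-character lowercasing of a capital sigma.
text = 'rΣ'

whole = text.lower()                            # final-sigma rule applies at the end of a word
per_char = ''.join(ch.lower() for ch in text)   # each character lowercased in isolation

print(repr(whole), repr(per_char))
print(whole == per_char)

# Searching the per-character form for the last character of the
# whole-string form fails with the same kind of error as in the traceback.
try:
    per_char.index(whole[-1])
except ValueError as exc:
    print('ValueError:', exc)
```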

bojone commented 4 years ago

This has been fixed in https://github.com/bojone/bert4keras/commit/ffa50c5125d001441f69184282e75dd4e78481f9