allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

Wrong named entities report (unbalanced brackets) #236

Open iacopy opened 4 years ago

iacopy commented 4 years ago

Hi, I'm reporting problematic named entities that I found using en_core_sci_sm, in the hope of improving the model. Most of them contain unbalanced brackets.

(-2)-0-(+2
(-8
(= control values
(= controls
(= group b
(= proliferating 
(= week-4
(0.3
(1)h nmr) spectroscopy
(125)i-aβ(1
(13)c)-labeled
(1s,2s,3s,4r,5s)-5-[4-chloro-3-(4-ethoxybenzyl)phenyl]-1-hydroxymethyl-6,8-dioxabicyclo[3.2.1]octane-2,3,4-triol
(3.6
(7.3
(7.6
(7.9
(9
(arg1,phe5)-ha(aha
(auc)(0
(bmi)>25 kg/m(2
(e)-2-[(e)-but-2-en-1-yl-idene]hydrazinecarboxamide}
(me-17030)-(not significant
(p=1x10
(plga)/(met-loaded
(s/t)px(k/h/r
(taaaa)(n
[(18)f]-2-fluoro-2-deoxy-d-glucose ([(18)f]fdg
[(3)h]o-methyl-d-glucose (
[1
[1,2
[1.9
[2
[3
[3h]mpp(+
[5
[6-(3)h]glucose and [u-(14)c]alanine
[6,6
[6.3
[d-leu-4]-ob3
[methoxy-(11)c]pd-153035
[nle27]ghrh-(1
[prl1
*
* metformin
**
**p
*10
*2
*3
*3d
∆(4
∆i(30)/∆g(30
+
+/-
+/-standard deviation
++
1,25-dihydroxyvitamin d(3
1.05).conclusions
1.432[1.068
1.91[- 
100)%
13c]lactate:[1
15-epi-lipoxin a(4
1h nmr) spectroscopy
2-(n
2-amino-4-(dimethyl-amino)-7-methyl-5,7-dihydro-6h-[1,3,5]triazepin+ ++-6-one
2-deoxy-[1,2-(3)h]-glucose
2,3,4,5-tetrachlorophenol (
2.55[1.64;3.96
2009).a
29)-nh2
2a) resolution
3-(4-methanesulfonylphenoxy)-n-[1-(2-methoxy-ethoxymethyl)-1h-pyrazol-3-yl]-5-(3-methyl pyridin-2-yl)-benzamide
3-[4,5-dimethylthiazol-2-yl]-2,5 diphenyl tetrazolium bromide
3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide
3-dioleoyloxy-propyl)-trimethylammonium/cholesterol/dspe-peg-anisamide aminoethyl
3.7(0.8
3(2
311+g(d
33.5(6.6
36)amide
3tw(pm
4-[(2e)-n'-(2,2'-bithienyl-5-methylene)hydra-zinecarbonyl]-6,7-dihydro-1-phenyl-1h-pyrazolo[3,4-d]pyridazin-7-one
4(1/2
43.20(30.1
5-dimethylthiazol-2-yl)-2
5/33(15
50.(abstract truncated
50.(abstract truncated
57.4]).the hazard ratio
5ht(2c
6-[4-(2-piperidin-1-ylethoxy)phenyl]-3-pyridin-4-ylpyrazolo[1,5-a]pyrimidine
6-[ethyl-(3-isopropoxy-4-isopropylphenyl)amino
7-[(3r)-3-(1-aminocyclopropyl)pyrrolidin-1-yl]-1-[(1r,2s)-2-fluorocyclopropyl]-8-methoxy-4-oxoquinoline-3-carboxylic acid
7.65)%
9)nonapeptide-ethylamide
a(-/-
a(+/-
a(1c
a1c)
abdominal) fat mass
ability.(abstract truncated
ability.(abstract truncated
absorption enhancer sodium n-(8-[2-hydroxybenzoyl
ac2+)
acarbose treatment.(abstract truncated
acc2 ki)
acid bis-[6-methyl-heptyl
acrylic acid)-grafted-gellan
active.(abstract truncated
acute pancreatitis(ap
adenosine monophosphate-activated protein kinase)-dependent manner
adipoq)
adjusted odd ratio(aor
adma-metabolizing enzyme) activity
adults.(abstract truncated
aflatoxin b(1
aflatoxin b(1
age-adjusted r(s
agent(s
agents(metformin
ageoptical density(oad
agonist)-stimulated activity
agonist/5-ht(2a
aha(s
akt ser(473
albufera natural park (spain
alcl(3
aldh(+
alone.(abstract truncated
alone.(abstract truncated
alzheimer'sdisease(ad
am)/liver kinase b1
am1241 (2-iodo-5-nitrophenyl)-[1-(methylpiperidin-2-ylmethyl)-1 h-indol-3-yl]methanone
american psychological association)
amino)-2-deoxyglucose
amp-activated protein kinase) pathway
amp-activated protein kinase) signalling
ampk-α(1
ampk(thr172
ampkα(1/2
ampkα(s173
ampyra®)
and(c
and(or
andhemoglobin a1c(a1c
ang ii)-induced hypertension
ang ii)-treated renal fibroblast nrk-49f
ang-(1
angiotensin ii)-infusion
anti)advanced
antidiabetic agent(s
antineoplastic agent(s
antineoplastic agent(s
antipsychotic(s
apical mpp(+
apoa1(-/-
apoe(-/-
apoe(-/-)/ampkalpha2(-/-
apoe(-/-)/ampkα1(-/-
apolipoprotein (
apoptosis-related ets-like 1 transcription factor(elk-1
apparent v(max
are(s
arg-phe-nh2 (
arg(972
arizona sexuel experience) scale
aromatic) bonds
arteriograph®)
ascending p-ampkα(-thr172
asp(+
associated (
association(s
asymmetric ν(v-o
atcc crl-1550) cancer
atm-ampk-p53/p21(cip1
auc(∞
auc(0
auc(0-)(infinity
auc(0-6h
auc(0-last
auc(0-proportional to
auc(0-t
auc(0,24 h
auc(0,infinity
auc(0:30
auc(45
auc(60
auc(a
auc(e2
auc(gluc
auc(glucagon
auc(inf
auc(ins
auc(insulin
auc(lqc
auc(tau
auc[0:29
auc[24
audio-video sexual stimulation(χ²=34.422
aurc(lh
autoimmune) diabetes mellitus
avandamet® (
average hemoglobin a(1c
b(0.22±0.13
b(1
b(12
b) pathway
b6-lep(ob/ob
b6.v-lep(ob/ob
bafilomycin a(1
basal hemoglobin a(1c
based)-classical methods
baseline-adjusted hba(1c
baseline(53.6
basis).participant
bcl(xl
bel/fu) cells
beta-(4-dimethylamino phenyl
beta-(4-dimethylamino phenyl
beta1(186
bhr),group ii
biasp 30)
bilateral internal capsules.(abstract truncated
bilateral internal capsules.(abstract truncated
bind as(iii
biphasic insulin aspart 70/30 (biasp70/30
biphenyl]-3-yl)methoxy)phenoxy)acetic acid
bis-2-(ethylhexyl
bis(n'n'-dimethylbiguanidato)oxovanadium(iv
bl21(de3
blue)-binding method
bmi 26.3+/-3.3 kg/m(2
bmi<25kg/m(2
bmi>32 kg/m(2
bmi≥25kg/m(2
body mass index 28.5+/-0.6 kg/m(2
body mass index>30kg/m(2
body-mass index ≤45 kg/m(2
body-mass index 45 kg/m(2
bowel.(abstract truncated
bowel.(abstract truncated
brain cells(2
breast reconstruction(ibr
british national formulary (joint formulary committee
buoh(0.1
bw·d(-1)(low
bw).gp iv
c-18)
c-3/c-4 of →2)-α-l-rhap-(1→.
c-c motif) ligand 2
c-c motif) ligand 3
c-ncad(838-856
c-terminal sequence leu-trp-nh(2
c.808 (g>t
c(12)h(10)n(4)·2c(8)h(8)o(2
c(12)h(14)n(2)o(4
c(14)o(2
c(18
c(18
c(19
c(ave
c(in
c(max
c(max
c(p)/c(bc
c(ss
c).the lipid profile
c++
c++ programming language
c57bl/ksj-db/db (db/db
ca²(+
ca2+)
ca2+)
cacit vitamin d30(r
cancer prev res (
cand+hctz)
carbomer 934p (
carbopol(®
carboxymethyl sesbania gum-2.5%(w/v
cardiac hazards;(2
cardiotoxin- (
carpolobia alba (
cases.(abstract truncated
cases.(abstract truncated
cassia obtusifolia l. (
castration.(abstract
castration.(abstract
cav-1(-/-
cb(1
cb[6
cbdiet (p<.0001
cbl) deficiency
cck-8) assay
cd11b(+) gr1(+
cd11c(+)cd206(+
cd2+)
cd34(+)/cd7(-)/cd4(-
cd44(+)cd117(+
cd44(pos)cd24(low/neg
cd45-cd34+)
cd8(+))
cdkn1a(tmi/tyj)/j p21(-/-
cell proliferation)-
cell/mm(3
cells.(abstract truncated
cellular mode(s
cervical-vaginal fluid(cvf
chcl3-cc) fraction
chemwell® 2910 (
chi(2
chinese)/ xiao bojian
chinese)/xiao bojian
chorionic gonadotropin loadings.(abstract truncated
chorionic gonadotropin loadings.(abstract truncated
chronic renal insufficiency cohort) study
ci(0.55
ci)=[-1.47
ci)=0.75
ci]=0.06%(-1 mmol/mol
cidophage®)
cilag (switzerland
cilag (switzerland
circulating e(2
ck14+)
cl(-
cl(cr
cl(int
cl(nr
cl(r
cl(renal
cl(sec
class 5([women])/6([men
clerodendrum volubile)
clinicaltrials.gov (nct01885013
clinicaltrials.gov (nct02588859
clomiphene citrate alone.(i-a
clomiphene) trial
cm[range
cnox-2(+
cntf(ax15
co-ip) assays
co(2
co2·(-
coenzyme q10 (
coherent anti-stokes raman scattering) microscopy
cohort 2)
combination)-may
con a)-
con)
concentration.(abstract truncated
conclusion(s
condition(s
connexin 43 (
continued health) registry
contrast-induced nephropathy(cin
control (c
control group(t=6.472
control metformin-treated (ct
controls.(abstract truncated
controls.(abstract truncated
controls.(abstract truncated
conventional immediate-release (ir
copper-bis(thiosemicarbazones
copper(ii
cortisol) levels
cortisol) levels
cox) pathway
cpm/10(6
crc(1.670
creatinine-based ckd-epicrea (
creatinine(crea
cs-137) behavior
csf1-deficient csf1(op)/csf1(op
csf1(op
csf1(op)/csf1(op
csfm(op
csii.(abstract truncated
cu (ii
cu(c(19)h(17)n(5)o(5
cu(cl)2(met)(en
cu(ii)4(bpp)4(maa)8(h2o)2).2h2o
cucl2(c19h16n4o
curve(0
curve(0-τ
cyclic gmp/10(8
cyclophosphamide-methotrexate-5-fluorouracil)
cytochrome p450)
cytosolic nadh/nad(+
d-ser(bu)6-lh-rh(1
d-ser(bu)6-lh-rh(1
d(1
daudi (burkitt
day-1)/placebo
days)-treated
db/db)
death.(3
decreasing j(max
del(17p
delta phi(m
deltahba1c (hba1c
dendritic poly(l-lysine
denmark) treat-to-target
depression(r=0.627
der(11
der(21
desogestrel/ethinyl oestradiol tablets(group b
detection bias.(a
dhea\dheas
diabetes symptom checklist-revised (dsc-r
diabetes symptoms checklist-revised (dsc-r
diabetic control) mice
diabetics.(abstract truncated
diabetics.(abstract truncated
diane-35(md group
diane(35
died.(abstract
diet-streptozotocin- (stz-
dihydrotestosterone(dht
diji(sp
dimethoate (organo-phosphorus
dimethoate (organo-phosphorus
dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide
dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide assay
dipeptidyl-peptidase-4 [dpp-4]-inhibitors
disease-free survival(dfs
dmldlr(-/-
docosahexaenoic acid(dha
dose 10(-7
dose- (
dose-dependent increase(p<0.05
dose-normalized c(max
dose(50 
downstream target p-p70s6k(thr389
dpp-ivi)
dpp4(+/+)
drug byetta(®
drug treatment.(3
drug-ta(cyd products
dsm-iv-tr (
dsm-iv-tr)
duragesic®)
dyes.(abstract
dyes.(abstract
e(-y
e(1
e(2
e(f
e(max
each).the groups
eampk(flox/flox
ec(50
ecstasy')
effect(s
effects)
egg)
empagliflozin/linagliptin (emg/lig
endothelial cells(ecs
endpoint hba(1c
endpoint(s
energy) diet
epcam(+
er-/pr-/her2+)
erectile dysfunction(ed
erectile function 15)
erk(1/2
estrogen receptor-positive (er(+)
estrogen)
estrogen)-high
ethinyl e(2
ethylene glycol)-b-poly
eucreas®)
eudragit(
eutirox®)
evidence).four
evidence).four trials
evidence).human menopausal gonadotrophin
evidence).one trial
evidence).the
evidence)the
exd-(9
exenatide/metformin/biphasic insulin aspart) therapy
exendin(9-39
exposure (
exposure)
f(1
f(1,46
f(2,54
f(t
fa/-)
fabp4(ap2
fao) cells
fasting blood glucose(fbg
fasting glucose/22.5)
feminization(gynecomastia etc
femoral artery(sfa
ferritin h) genes
fixed-dose combination therapy(fdc
fixed-dose combination/coad)
flk-1(+)/sca-1(+
fluorescent analog 2-[n-(7-nitrobenz-2-oxa-1,3-diazol-4-yl)amino]-2-deoxyglucose
formula 1/(log(10
forxiga®)
four-unit poly(ethylene glycol
fpqsflprg-nh(2
free t4(r=0.784
ft(4
full ms/dd-ms(2
full text (free
full-length hydra alpha1(iv
fumes.(abstract
fumes.(abstract
function.(abstract
function.(abstract
functional assessment of cancer therapy-breast (fact-b
g protein-coupled receptor 43/41 (
g-allele (
g. d. searle & co.)
g. d. searle & co.)
g.d.m. (o.r.
g(0)/g(1
g(1
g(f
g)-cyproterone acetate
g/(kg
g/kg)-combinations
g/kg)-treated groups
g401s (g
g6pc (
g6pc)
gastric mucosal prostaglandin e(2
gastrointestinal (gi
gen(100
general erectile function(χ²=54.433
genetic kk-a(y
germany (disease analyzer
getgoal-s) study
ginkgo) partners
gk(wt/del
glibenclamide auc(3
glibenclamide/glimepiride)
glimepiride c(max
glp-1(9-36
glp1r(+/+
glp1r(+/+)
glucagon-like peptide-1(7
glucophage (lipha
glucophage xr)*
glucophage(r
glucophage®)
glucose auc(0
glucose-6-phosphatase (
glucose-6-phosphatase (g6 pase
glucose-6-phosphatase (g6pase
glucose-6-phosphatase) metabolism
glut3-specific antiserum.(abstract truncated
glycated haemoglobin a(1c
glycated haemoglobin a1(c
glycated hemoglobin a(1c
glycogen synthase kinase-3[formula
glycoslyated hemoglobin a(1c
glycosylated haemoglobin a(1c
glycosylated hemoglobin a(1c
glycosylated hemoglobin a1c(hba1c
gn-rh) analogue
gn-rh) analogue
gnrh-like peptide(s
gnrh-like peptide(s
gnrh3) systems
group 1(control
group.(abstract
groups(n=10
growth/(ingestion-egestion
gsk- 3β) pathway
gtp gamma s) assay
h(2)o(2
h)-treated cells
haemoglobin a(1c
haemoglobin a(1c)(hba(1c)
haemoglobina(1c
hb a(1c
hba(1
hba(1c
hba(₁c
hba(1c) (
hba(1c)).[primary endpoint
hba(ic
hba(lc
hba[1(c
hba1(c
hba₁(c
hba1c(10.2
hbat1c(%
hc)/liver x receptor α
hco(3
hcy)
hdl(3
healthcare (
hemoglobin (hb)a(1c
hemoglobin a(1)c(hba(1)c
hemoglobin a(1c
hemoglobin a1(c
hemoglobin a1c)
hemoprotein(s
her-2/neu +)
her-2+)
her2)-negative breast cancer
her2+)
her2+/er-/pr-) cell lines
heteronuclear single quantum coherence)-filtered
hgcl(2
hif-2α)-can
high let)-irradiated cell lines
high-density lipoprotein (
high-fat diet-streptozotocin- (stz-
high-quality evidence).women
highly-active antiretroviral therapy)-associated lipodystrophy
hinterteil (hint
his(69
hispanic (his
hl(2
hmg-coa synthase 1(hmgcs1
ho8910-pm)             
homa-beta) index
homa-β(p
homa(beta-cell
homa(ogtt
homa<3.8)
hong kong)
hplc-ms(n
hplc-ms/ms) technique
hr.(abstract truncated
hrb).second-generation sulphonylureas
htn(doc
htt)
human papilloma virus-(hpv-
hydra biology(1
hydra piwi-like (hyli
hydra tcf (hytcf
hydra viridissima)
hydra) blood cells
hydrogen (h2
hydrogen (h2
hydroxyurea (hu
hypericum (kira
hypnotic drug(s
hypoglycemic episode(s
hypogonadism(testicular atrophy
hypothalamic-pituitary-adrenal (
hypothalamic-pituitary-adrenal (hpa
hypoxia-inducible factor- (
hypoxia-inducible factor- (hif-
hyvab®)
i(2
i(amiloride
i(control group
i(f
i(ouabain
ic(50
ic(50
ic(50
id(2
ie(isr
ifca2+)
igf-1r (tyr1165/1166
igf-1r) signaling pathway
igf-1r[tyr1165/1166
iglar(187 subjects
ii)
iief(1vs2
iief(1vs3
iief(2vs3
ileum.(abstract truncated
immunoreactive (ir
index group)-
index hba(1c
indian hh (
indication(s
inducer)-induced
inducible(i
inhibitory factor(mif
inhibitory substance(s
inorganic zn(se
ins2(+/akita
insulin levels(ir
insulin resistance index ln(homa-ir
insulin treated (i
insulin x glucose)/22.5
insulin-resistance (ir
insulin.(abstract truncated
insulin(auc
insulin(p=0.014
insulin)-only
intercellular signals.(abstract truncated
interleukin 10)
interleukin-6(il-6
interleukin‑6 (il‑6
intermediate- (
international continence society (ics)-"bph" (ics-male
international units/litre (iu/l
interquartile range) disease
intervention(s
intra))/delta(inter
intraduodenal (id
intramuscular (
ionic/ag(+
ip)-streptozotocin
ipmk(-/-
irs1-pser(307
irs1-ser(312
irs1(ser636/639
irβ(tyr
isi(ogtt
isr(0-2h
janumet ®)
janumet(tm
janumet®)
k-ras(+/lsl-g12d);trp53(+/lslr172h);pdx-1-cre
k(atp
k(b
k(d
k(d
k(e
k(i
k(m
k)/serine-threonine kinase(akt)pathway
k[atp
k+ channels.(abstract truncated
k121q (rs1044498
k2(c3n3o3h
k562 cells(p<0.05
k562r (imatinib-resistance
kcal/min/kg/10(3
kg).we
kg/cm(2
kg/m(2
kg⁻¹ day)⁻¹
ki-67+)
kinase(pi3k
kinase)/akt
kk-a(gamma
kmno(4
komboglyze®)
kras(+/lslg12vgeo);elas-tta/teto-cre
l-lactic-co-glycolic acid)-block-poly(ethylene
l-t(4
l. (family
l.)
l.) dunal
l.) merr
l.) urban
l.mol(-1).s(-1
l+g (10.57+/-1.97
lady prelox®)
lambda(max
lancet 380(9840
lantus®) once daily
late september) bulls
lc-ms(n
lc(50
lc(50
lc(50-90
lc/ms/ms) method
ld(50
ldl(2
ldl(c
left ventricular (lv
lent-soma) scales
lepr q223r) genes
lepr(db/db
leptin receptor-deficient) mice
levels(53
levonorgestrel-releasing (mirena
lgals3(+/+)
libido (desire
libido (t
libido.(abstract truncated
libido.(abstract truncated
libido" (
light chain 3)-ii levels
light-cycler 480 (roche
limax pseudoflavus (
lipitor®)
lipoprotein lipase(lpl
lipoprotein(a
liquid co(2
liviel(r
lkb1(s
ln(ki-67
local health unit of caserta (southern italy
local.(abstract
local.(abstract
locally advanced/advanced) disease
log(cac
log(tg/hdl-c
low dose(5
low s(i
low- (
low-density lipoprotein (
low-density lipoprotein (ldl
low-glycemic index) diet
low-quality evidence).• coasting
low)/cd34(+)/vegfr2(+
lowering hemoglobin a(1c
lp(a
lp(a).(abstract truncated
lp[a
lsl-k-ras(g12d/+)/pten(floxp/floxp
lsl-kras(g12d
lss-derived auc(0,24
lt-nes) cell lines
lvdp/dt(min
lw-amide(s
m-echo signal)
m(-
m(2)[chronic kidney disease
m(r
m(w
m(η
m)-insulin
m2[p 
ma(m
manganese(iii
marketscan®)
mastectomy(ssm
mate1(+
mcal de kg(-1
mean auc(0-t
mean haemoglobin a(1c
mean hba(1c
mean hemoglobin a(1c
mean(±sd
mean(sd
measure sexual cognition/fantasy (desire
mechanism(s
mechanisms(s
menopause-specific quality of life) questionnaire
mercury(ii
mesenteric et(a
met (metformin
met (t1
met-p 33.8(5.2
met(-
met(+
metabolic syndrome (mefisto)(8
metformin (shiguibao
metformin 5 x 10(-4
metformin auc(0-∞
metformin c(max
metformin t(max
metformin-d(6
metformin-nickel(ii
metformin-treated (m
metformin,(500
metformin(1-(diaminomethylidene)-3,3-dimethyl-guanidine
metformin(50 microm
metformin(p 
metformin) trial
metformin/tacrolimus (met/t
meth(2)(++)•2dca(-
methyl cellulose) sedentary
methylglyoxal-bis(guanylhydrazone
methylglyoxyl bis(guanylhydrazone
mg)/ee
mg)/levonorgestrel
mg/(100 g•d
mg/(kg body weight
mg/(kg day
mg/1000 mg)-glimepiride
mg/kg(2
mg/kg)-induced
mg/kg)+nifedipine
mg/kg)+rapa
mg2+)
mgso(4
microg)-cyproterone acetate
mitrocoma(halistaura
ml.min(-1
ml).comparing
ml/kg)-treated
mlmin(-1
mm(-2)s(-2
mm(mri
mm)-stimulated
mmol l(-)(1
mmol/mol)-10
moderate-quality).inconclusivelifestyle
modified ferriman-gallwey (mf-g
modified intent-to-treat) populations
molecular interaction(s
molecular mechanism(s
molecule(s
molecule(s
mondia whitei hook (skeels
months)--efficacy
mpp(+
mrna(1.18+/-0.06
ms(-
ms(+
ms(+)tds(-
ms(+)tds(+
msf-4 item) questionnaire
mt-trna(trp
mtb) elimination
mtor(ser-2481
mtor)-dependent ones
mtor1) signaling
multidrug resistance(mdr
multivariate) regression modelling
myoinositol(600 
n-(hydra zinocarbonyl)amino]-4-amino-3,6-disulfonato-1,8-naphthalimid e)
n-[4-(1,1,1,3,3,3-hexafluoro-2-hydroxypropan-2-yl)phenyl]-n-(2,2,2-trifluoroethyl)benzenesulfonamide
n-dimethylcarbamimidoyl)guanidine
n-trimethyl-2-[methyl(7-nitrobenzo[c][1,2,5]oxadiazol-4-yl)amino]ethanaminium
n-trimethyl-2-[methyl(7-nitrobenzo[c][1,2,5]oxadiazol-4-yl)amino]ethanaminium iodide
n-trimethyl-2-[methyl(7-nitrobenzo[c][l,2,5]oxadiazol-4-yl)amino]ethanaminium iodide
n(ω
na(+))
na(2)co(3
na+-k+-atpase (
nad(+
nadh/nad(+
national institute of health (clinicaltrials.gov
national institutes of health (nih
natural anagen-inducing signal(s
nct00451399(study 1
ne768(frua
netherlands (boxmeer
neurological (n
new vanadium(iv
nfκb(p65
nh(2
ni(ii
nijmegen (netherlands
nine-day syncro-mate-b((r)
no. chictr-iir-16007901)
no/cgmp) pathway
nocturnal penile erection(χ²=29.815
nominal p=0.0084)
non insulin-dependent) diabetes
non-caucasian population.(abstract truncated
non-caucasian population.(abstract truncated
non-diabetic (control
non-high-density lipoprotein (
non-ovlon (ethinylestradiol
non-steady-state [3
non-use (
normo-glycemic condition(group 3
normoxia-conditioned (cn
novo nordisk inc) therapy
novolog(®
novorapid®)
nuclear factor kappa-b(nf-kb
nuclear factor kappa-b(nf-κb
nuclear factor-kappa b(nfkb
o-p (χ(2) 
o-tetradecanoylphorbol-13-acetate (tpa
o.d.)+glimepiride
o(.-)(2
o(2
o(2).(-
o1(+)/mbp(+
o2(∙-
oad(s
oads(metformin
obtained.(abstract truncated
ocular) status
odds ratio[or
ogtt-derived auc(ins/gluc
oha(s
oil (o
old(er
onoo(-
option(s
or[95%ci
or[95%ci]=5.63[0.42-76
oral 14c-glucose)
oral antidiabetic agent(s
organoiridium(iii
oros(r
ovid) databases
oxaliplatin)-based chemotherapy
oxido-vanadium(iv
p-akt(ser473
p-irs-1(tyr895
p-methoxyformylbenzene-5-(1-phenyl-3-methyl-4-nitropyrazolyl
p(app
p(f
p(i(max
p(interaction
p(lifestyle*snp
p<0.001).the
p<0.001)and
p<0.02)and se-selectin
p<0.05),with
p=0.62).no lactic acidosis
p16(ink4a
p21(waf/cip
p21(waf1
p21(waf1/cip1
p27(kip1
p38) signaling
p53(-
p53(+/+)
p53+)
p70(s6k
p70s6k thr(389
pai1).there
pampk(ser173
pampkα(thr172
pancreatic cancer(pac
papp-a-generated n-terminal (
parathyroid hormone(pth
parental-(snu-c5
patient's hemoglobin a(1c
pausene(r
pbh(+
pcdna3.1(+
pce/(pce 
pcos-vas1(facial hair
pcos(pre-metformin
pd-1(pdcd1
pd10[dose
pdx1(+
peak e(2
peak vo(2
pennsylvania)
period.(abstract truncated
period.(abstract truncated
peritoneal dialysis(pd
pg)
pge(2
pgf(2alpha
pgf(2alpha
phase) dysfunction
phenformin.(abstract truncated
phenyl)-2-propenoic acid
phenyl)-2-propenoic acid
phi(b
pi-3,4,5-(po(4))(3
pi3 kinase)/protein kinase b
pi3k)
pi3k) inhibitors
pi3k) pathway
pi3k) pathways
pim(s
pimephales promelas)
pio+pc)
pip(3
pk/pharmacodynamic (pd
pka) signaling
pkc-ζ(t410a
placebo-subtracted hba(1c
plantago ovata f.)
plasma glucagon-like peptide-1(glp-1
plutonium(iv
pmol/(min
po(2
pocl(3
point.(abstract truncated
point.(abstract truncated
poly (i
poly(3-hexylthiophene
poly(acrylic acid
poly(ethylene glycol)-block-poly(propylene glycol)-block-poly(ethylene glycol
poly(lactic acid
poly(lactic-co-glycolic acid
poly(magnesium acrylate
poly(methyl methacrylate
poly(styrene-alt-maleic anhydride
poly(thioetheramido acid)-poly(ethylene glycol
poly(vinyl pyrrolidone
possible.(abstract
possible.(abstract
post-antide.(abstract truncated
post-antide.(abstract truncated
postprandial glucose(ppg
ppar(peroxisome proliferator
pparγ +/-)
pparγ(2
prandin®)
prasterone®)
prcre/+ ptenflox/+)
premixed insulin lispro 25)
process.(abstract truncated
process.(abstract truncated
prolactin(hprl
prolonged t(max
prostaglandin e(1
prostaglandin e(2
prostaglandin el1(pge1
prostaglandin f(2alpha
prostaglandin f(2α
prostaglandin f2 alpha tham salt(pgf
prostaglandin f2 alpha tham salt(pgf
protein kinase cα) activity
protein(s
prous science integrity(r
pseudo-erotic) strivings
psychopathy checklist-revised (
pten(+/-
pubmed(r
q-q graphics).to
q(max
q)
q10 (coq10
qjm(h
qtern®)
quality.(abstract truncated
quality.(abstract truncated
quality(ahrq
r(10h
r(85
r(d
r(g
r(hf
r460x (nt7605c
r61c (rs12208357
rajasthan (india
randomized clinical trials(rcts
rapamycin)-mediated
rcp(11;21)(q28;q12
reactive oxygen species(ros
reactive oxygen species)/redox balance
reason(s
receptors(er
reciprocal ldl(1
reduced hemoglobin a(1c
regression trees) procedure
renal transporter(s
reply.(abstract truncated
respectively)--inhibited
respectively).(abstract truncated
respectively).in conclusion
respiratory o(2
retinoic acid 10(-6
retrospective) trials
retrospectively registered) isrctn75758249
rho(0
rio-t2d)
riomet®)
role(s
role(s
ros)-resistant
rosa26(r899x
rr=1.01[95
rti-4587-073(l
s-3-(4-nitrophenoxy
s.d. 2.9)%
s(6
s(g
s(i
s)-nicotine
s)/(k(m
saigon-dongnai river) vietnam
saliva orthana(®
sanyinjiao"(sp 6
satisfactory sexual event" (
savor-timi 53 (
saxenda®)
sc(otf)(3
schizophrenia outpatient health outcome) study
science(tm
scr (crea
second-line glucose-lowering medication(s
secondary)
secondary)
secondary) diabetes
ser(235/236
ser(307
ser(473
ser(789
ser(79
serotonin(2c-
sert(-/-
sert(+/-
serum alpha-1-fetoprotein (
serum e(2
serum high-density lipoprotein (
serum high-molecular-weight (hmw
serum insulin (p=.044
serum levels of testosterone(p<0.01
serum microrna-29 (mir-29
serum vitamin b(12
serum vitamin b(12
sex-matched rentgf-β1 (tg
sexual dysfunction(s
sexual dysfunction(s
sexual inhibition\sexual
sexual life(χ²=21.211
sexual life(χ²=70.445
sexual) desire
sft(max
sh2b1 (rs7498665
si/al (p=0.042
sicile (italy
signal(s
single- (
sirt1)-forkhead box protein o1
sirtuin 1(sirt1
site.[3
slc47a)
slc5a2) inhibitors
slc5a5) protein
sm22-tsc1(-/-
small high-density lipoprotein (
smd-1.05[-2.13,0.03
so4(2-
sodium n-[8-(2-hydroxybenzoyl)amino]caprylate
sodium-[1
sodium-glucose co-transporter-2 (
software qikprop(®
sphygmocor (version 7.1
src (
stage iii) disease
stage m1) disease
star(*)d study
stata 14 (
stop-niddm (study
streptozotocin- (stz-
subjective (
sulfamethoxazole/trimethoprim (smz/tmp
sweet tooth) domains
swt(k/r
symptoms checklist-90-revised (scl-90-r
system(pmrs
t x a)
t-allele (tt+tg
t-test(s
t(1/2
t(20
t(3
t(3
t(90
t(lmet
t(max
t(y;21)(p11;q11
t)
t2(mm
tbc1d4 thr(642)/ser(711
tc(r
tdt-mediated dutp nick-end labeling) assay
tensin homolog) protein
teratogenesis.(abstract truncated
tesavel(r
testosterone (t
testosterone (t
testosterone(t
tg(triglyceride
tg(wt1b
tg),high density lipoprotein
tgf-beta(1
tgf-β(1
thiazolidinedione derivatives(tzds
thiazolidinediones(tzd
thiazoyl)-2,5-diphenyl-2h-tetrazolium bromide
thin layer radiochromatography(tlrc
thr(172
tidal volume ≈7 ml/kg)
time- (
total (t
total.(abstract truncated
tpp(+
tr4(-/-
training dexamethasone-treated (
trans-4-{4-[3-(4-trifluoromethoxyphenyl)-ureido]cyclohexyloxy}benzoic acid
transcription factor(tf
transforming growth factor- (tgf-
transforming growth factor-β) signaling
transplantation patients(tx
trieste (italy
trigonella foenum-graecum l.) seed mucilage
tris(2,2'-bipyridyl)ruthenium(ii
trna(leu
tuberous sclerosis 1)/tuberin
tumour necrosis factor-alpha)
type 2) diabetes mellitus
type ii)
tyr(354
u)/k(i
uflc-ms/ms (
uge(0
ultrasound) documenting
united states)-a fixed combination
united states[1
uvrier canal (switzerland
v(c(15)h(10)i(2)n(2)o(2))(ch(3)o)o(ch(3)oh
v(d
v(u
v/f(apparent volume of distribution
val(8))glp-1(glupal
val8-glp-1(7
varthemia iphionoides boiss (compositae
vascular k(atp
vasculogenesis(a process
ve/vco(2
vector-reaction-diffusion-drift (
viagra(r
vipdomet®)
viral dsrna analogue poly(i
vit b(12
vitamin b(12
vitamin d(3
vo(2
vo(2max
w. volubilis(50
w(d
w(peak
w\o
waist)
waist)
weekly intramuscular (im
weekly intramuscular (im
weibull) models
weight) diet
weighted mean glucose auc(0
western blot)
with[1
women.(abstract truncated
women.(abstract truncated
world health organization) methodology
x(1
x(6β-ohf
xp-v) gene
zinc-α2-glycoprotein (
zncl(2
zucker lean (zl
α (
α and β)
α(1
α(4)β(2
α1/2(-/-
β-[1-(14)c]hydroxybutyric acid
δauc(gluc60
δg(∗
δi(30)/δg(30
φ(s
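
A list like the one above can be screened automatically. The following is a minimal sketch in plain Python, not part of scispacy: `is_balanced` is a hypothetical helper name, and the sample strings are illustrative (the first three appear in the list, the last two are added for contrast).

```python
# Hypothetical helper: flag entity strings whose brackets do not pair up.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_balanced(text: str) -> bool:
    """Return True if every bracket in `text` is properly opened and closed."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    # Anything left on the stack is an opener that was never closed.
    return not stack


# Illustrative sample, not model output.
entities = ["kg/m(2", "hba(1c", "poly(lactic acid", "c++", "copper(ii)"]
unbalanced = [e for e in entities if not is_balanced(e)]
print(unbalanced)
```

Entries such as `c++` or `+/-` pass this check, so bracket balance catches most of the reported cases but not all of them.
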
DeNeutoy commented 4 years ago

Hmm, thanks @iacopy! Most of these look like tokenization errors that lead to misclassification, though some of them look like reasonable entities to me. If you can consistently recognise an issue with the tokenization, you can add exceptions to the spaCy tokenizer, or re-tokenize after the fact to fix them.
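
As a rough illustration of fixing things after the fact, here is a sketch that works on character offsets in plain Python. It is a stand-in for the idea, not spaCy's tokenizer or retokenizer API: `repair_span` and `max_extend` are hypothetical names, and the sentence is made up. It extends an entity span to the right when the closing bracket it is missing follows immediately in the text.

```python
# Hypothetical post-processing step on (start, end) character offsets.
CLOSER_FOR = {"(": ")", "[": "]", "{": "}"}


def repair_span(text: str, start: int, end: int, max_extend: int = 3):
    """Extend `end` over closing brackets that directly follow the span.

    Only closers that match an opener left unclosed inside the span are
    absorbed, and at most `max_extend` characters are added.
    """
    expected = []  # closers still owed, innermost last
    for ch in text[start:end]:
        if ch in CLOSER_FOR:
            expected.append(CLOSER_FOR[ch])
        elif expected and ch == expected[-1]:
            expected.pop()
    new_end = end
    while (
        expected
        and new_end < len(text)
        and new_end - end < max_extend
        and text[new_end] == expected[-1]
    ):
        expected.pop()
        new_end += 1
    return start, new_end


# Made-up sentence; the span covers "kg/m(2" as in the reported list.
sentence = "BMI was 31 kg/m(2) at baseline."
s = sentence.index("kg/m(2")
e = s + len("kg/m(2")
s2, e2 = repair_span(sentence, s, e)
print(sentence[s2:e2])
```

This only handles a missing closer on the right; spans that start with a stray closer, such as `a1c)`, are returned unchanged and would need a symmetric pass to the left.
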

dakinggg commented 4 years ago

Yeah, I remember I had some code in the tokenizer to deal with parentheses a bit better, but at some point spaCy changed from the regex package to the re package. That code required variable-width lookbehinds, which re does not support, so it was commented out. I'm not sure that's the entirety of the problem, but given how many of these have unbalanced parens, I think it is part of it.
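
The lookbehind limitation can be reproduced with the standard library alone. The sketch below uses made-up patterns, not the ones that were commented out in scispacy. It shows that `re` rejects a lookbehind whose alternatives differ in width, and that one possible workaround is to split it into several fixed-width lookbehinds joined by alternation.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the standard-library `re` module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Illustrative patterns: a ")" preceded by "mg" or by "ml/kg".
VARIABLE_WIDTH = r"(?<=mg|ml/kg)\)"        # alternatives of width 2 and 5
FIXED_WIDTH = r"(?:(?<=mg)|(?<=ml/kg))\)"  # one fixed-width lookbehind each

variable_ok = compiles(VARIABLE_WIDTH)  # rejected by `re`
fixed_ok = compiles(FIXED_WIDTH)        # accepted by `re`

closing = re.compile(FIXED_WIDTH)
sample = "(10 ml/kg) and (5 mg) but not (2)"
hits = [m.start() for m in closing.finditer(sample)]
print(variable_ok, fixed_ok, hits)
```

The rewrite only works when the alternatives can be enumerated with known widths; a lookbehind containing an unbounded quantifier such as `\w+` has no equivalent in `re`.
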