Open Huzmorgoth opened 5 years ago
@Huzmorgoth paste this code:
`import re

# Trim leading and trailing whitespace from entity spans
def trim_entity_spans(data: list) -> list:
    invalid_span_tokens = re.compile(r'\s')
    cleaned_data = []
    for text, annotations in data:
        entities = annotations['entities']
        valid_entities = []
        for start, end, label in entities:
            valid_start = start
            valid_end = end
            # Move the start forward past any whitespace
            while valid_start < len(text) and invalid_span_tokens.match(
                    text[valid_start]):
                valid_start += 1
            # Move the end backward past any whitespace
            while valid_end > 1 and invalid_span_tokens.match(
                    text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {'entities': valid_entities}])
    return cleaned_data`
@Abhimanyu100 Hi, I tried it but it's not working; the same issue still occurs.
Statring iteration 0 Traceback (most recent call last):
File "
File "gold.pyx", line 715, in spacy.gold.GoldParse.init
File "gold.pyx", line 925, in spacy.gold.biluo_tags_from_offsets
ValueError: [E103] Trying to set conflicting doc.ents: '(3385, 3391, 'Companies worked at')' and '(3345, 3896, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
I also have this error:
ValueError: [E103] Trying to set conflicting doc.ents: '(370, 392, 'Designation')' and '(370, 391, 'Designation')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
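For anyone trying to find which annotations trigger E103 before training, here is a minimal sketch that lists overlapping character spans in one example's entity list. The helper name `find_overlaps` is made up for illustration and is not part of spaCy. It assumes entities are `(start, end, label)` tuples with an exclusive end offset, as in the training data in this thread. Note that spaCy checks conflicts at the token level, so this character-level check may miss spans that only collide inside a single token.

```python
from itertools import combinations


def find_overlaps(entities):
    """Return pairs of (start, end, label) spans that share at least one character.

    Hypothetical helper for illustration; not a spaCy function.
    """
    ordered = sorted((tuple(e) for e in entities), key=lambda e: (e[0], e[1]))
    conflicts = []
    for a, b in combinations(ordered, 2):
        # Half-open spans [start, end) overlap when each starts before the other ends
        if a[0] < b[1] and b[0] < a[1]:
            conflicts.append((a, b))
    return conflicts


# Offsets taken from the error message above, plus one non-conflicting span
entities = [
    (370, 392, "Designation"),
    (370, 391, "Designation"),
    (400, 410, "Skills"),
]
for a, b in find_overlaps(entities):
    print(a, "overlaps", b)
```

Running this over every `(text, annotations)` pair in the training data should show which documents need cleaning.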
[Edit] Which spaCy version are you using? I was able to resolve this issue.
Python 3
I'm sorry, I was asking for the spaCy version.
Oh damn, it's 2.2.2
Use spaCy version 2.1.4. I was able to get results with that version. Let me know if this works for you.
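Since the version matters here, a quick way to confirm which spaCy version is actually installed in the current environment is sketched below. It uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if spaCy is not installed in this environment
print("spacy:", installed_version("spacy"))
```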
I am using spacy 2.2.3. In the older version of spacy, there was a bug which messed up the model after loading from disk. So, I had to update spacy and when I updated, I came across this issue. Sadly, I couldn't find a workaround and had to manually remove all conflicting entities. I have both testdata.json and traindata.json with cleaned data which will not raise this error. But I am not able to attach json format here.
Hey could you post it in your own git and share the file?
I got the same error as well: ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap. It would be very helpful if someone could help out.
Hi, could you share testdata.json and traindata.json? Thank you.
I am encountering the same problem: ValueError: [E103] Trying to set conflicting doc.ents: '(1155, 1199, 'Email Address')' and '(1143, 1240, 'Links')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
Did you guys figure out a way to resolve it?
@sayalraza Can you share the cleaned dataset you mentioned?
@sayalraza Hey, can you please share the clean dataset? Thanks in advance!
Try installing this version:
pip install spacy==2.0.18
@harshgeek4coder were you able to solve it?
spaCy v3 gives a new error, so try:
pip install spacy==2.2.4
(this is the version that came preinstalled on Colab, Feb 22)
[E103] Trying to set conflicting doc.ents: '(402, 818, 'Skills')' and '(817, 1118, 'worked at')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap. I'm also getting the same error while training. Could anyone please help me get the code running? I'm not that familiar with machine learning.
I'm using spaCy version 2.3.5 with Python version 3.9.10.
I have found this code that fixes the overlapping issue.
def clean_entities(training_data):
    clean_data = []
    for text, annotation in training_data:
        entities = annotation.get('entities')
        entities_copy = entities.copy()
        # Keep an entity only if it is longer than every entity it overlaps
        for i, entity in enumerate(entities_copy):
            for j, overlapping_entity in enumerate(entities_copy):
                # Skip self
                if i == j:
                    continue
                e_start, e_end = entity[0], entity[1]
                oe_start, oe_end = overlapping_entity[0], overlapping_entity[1]
                # Drop the entity if it overlaps one that is at least as long
                # (two overlapping entities of equal length are both dropped)
                if ((oe_start <= e_start <= oe_end)
                        or (oe_start <= e_end <= oe_end)) \
                        and (e_end - e_start) <= (oe_end - oe_start):
                    entities.remove(entity)
                    # Stop here: removing the same entity twice raises ValueError
                    break
        clean_data.append((text, {'entities': entities}))
    return clean_data
I still get the error below while training, even though I used the same code.
ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
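If the error persists, it may be worth checking that the cleaned data (and not the original list) is what actually gets passed to training. As an alternative, here is a minimal sketch of a greedy filter that keeps the longest span and drops anything overlapping it. The names `drop_overlaps` and `clean_training_data` are made up for illustration. It assumes `(start, end, label)` entities with exclusive end offsets, and it does not modify the input. Keeping the longest span is only one possible policy: with very long 'Skills' spans like the ones in these errors, you may prefer to drop the long span instead.

```python
def drop_overlaps(entities):
    """Keep the longest spans first; skip any span that overlaps one already kept.

    Hypothetical helper for illustration; not a spaCy function.
    """
    # Longest first; ties broken by earlier start offset
    ranked = sorted(entities, key=lambda e: (-(e[1] - e[0]), e[0]))
    kept = []
    for start, end, label in ranked:
        # Half-open spans are disjoint when one ends before the other starts
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


def clean_training_data(data):
    """Return a new list of (text, annotations) pairs without overlapping entities."""
    return [
        (text, {"entities": drop_overlaps(ann["entities"])})
        for text, ann in data
    ]


# Offsets taken from the error message above
print(drop_overlaps([
    (6861, 6870, "Companies worked at"),
    (6305, 7258, "Skills"),
]))
```

spaCy also ships `spacy.util.filter_spans`, which applies a similar longest-first rule to `Span` objects rather than raw offsets, so it may be an option if you are already working with `Doc` objects.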