DataTurks-Engg / Entity-Recognition-In-Resumes-SpaCy

Automatic Summarization of Resumes with NER -> Evaluate resumes at a glance through Named Entity Recognition
https://medium.com/@dataturks/automatic-summarization-of-resumes-with-ner-8b97a5f562b
443 stars, 215 forks

ValueError: [E103] #22

Open Huzmorgoth opened 5 years ago

Huzmorgoth commented 5 years ago

I get the error below while training, even though I used the same code.

ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
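To see which annotations trigger E103 before training, a small check along these lines may help. This is only a sketch: `find_conflicts` is a hypothetical helper (not part of spaCy), it assumes entities are `(start, end, label)` character offsets with an exclusive end, and it only compares character ranges. spaCy checks conflicts at the token level, so two spans that share a token without sharing characters would not be reported here.

```python
from itertools import combinations


def find_conflicts(entities):
    """Return pairs of (start, end, label) spans that share at least one character."""
    conflicts = []
    ordered = sorted(entities, key=lambda e: (e[0], e[1]))
    for a, b in combinations(ordered, 2):
        # End offsets are treated as exclusive (an assumption):
        # two spans overlap if each one starts before the other ends.
        if a[0] < b[1] and b[0] < a[1]:
            conflicts.append((tuple(a), tuple(b)))
    return conflicts


# The two spans from the error message above, plus one unrelated, made-up span.
example = [
    (6861, 6870, 'Companies worked at'),
    (6305, 7258, 'Skills'),
    (10, 20, 'Name'),
]
for first, second in find_conflicts(example):
    print(first, 'overlaps', second)
```

Running this over each document's entity list gives the pairs that need to be merged, trimmed, or dropped.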

Abhimanyu100 commented 5 years ago

@Huzmorgoth Paste this code:

# trim leading and trailing whitespace from entity spans
import re

def trim_entity_spans(data: list) -> list:
    invalid_span_tokens = re.compile(r'\s')
    cleaned_data = []
    for text, annotations in data:
        entities = annotations['entities']
        valid_entities = []
        for start, end, label in entities:
            valid_start = start
            valid_end = end
            # move the start forward past leading whitespace
            while valid_start < len(text) and invalid_span_tokens.match(
                    text[valid_start]):
                valid_start += 1
            # move the end back past trailing whitespace
            while valid_end > 1 and invalid_span_tokens.match(
                    text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {'entities': valid_entities}])

    return cleaned_data
Huzmorgoth commented 5 years ago

@Abhimanyu100 Hi, I tried it, but it's not working; the same issue occurs.


Statring iteration 0
Traceback (most recent call last):

File "", line 1, in . . . _format_docs_and_golds gold = GoldParse(doc, **gold)

File "gold.pyx", line 715, in spacy.gold.GoldParse.__init__

File "gold.pyx", line 925, in spacy.gold.biluo_tags_from_offsets

ValueError: [E103] Trying to set conflicting doc.ents: '(3385, 3391, 'Companies worked at')' and '(3345, 3896, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Nisit007 commented 5 years ago

I also have this error:

ValueError: [E103] Trying to set conflicting doc.ents: '(370, 392, 'Designation')' and '(370, 391, 'Designation')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Abhimanyu100 commented 5 years ago

[Edit] Which spaCy version are you using? I was able to resolve this issue.

Huzmorgoth commented 5 years ago

Python 3

Abhimanyu100 commented 5 years ago

I'm sorry. I was asking for the spaCy version.

Huzmorgoth commented 5 years ago

Oh damn, it's 2.2.2

Abhimanyu100 commented 5 years ago

Use spaCy version 2.1.4. I was able to get results with that version. Let me know if this works for you.

sayalraza commented 4 years ago

I am using spaCy 2.2.3. In the older version of spaCy, there was a bug that messed up the model after loading it from disk, so I had to update spaCy, and after updating I ran into this issue. Sadly, I couldn't find a workaround and had to manually remove all conflicting entities. I have both testdata.json and traindata.json with cleaned data that will not raise this error, but I am not able to attach JSON files here.

vverman commented 4 years ago

@sayalraza Hey, could you post it in your own Git repo and share the file?

Srijha09 commented 4 years ago

I got the same error as well:

ValueError: [E103] Trying to set conflicting doc.ents: '(6861, 6870, 'Companies worked at')' and '(6305, 7258, 'Skills')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

It would be very helpful if someone could help out.

JasonLing95 commented 4 years ago

@sayalraza Hi, could you share testdata.json and traindata.json? Thank you.

B-Yassine commented 3 years ago

I am encountering the same problem:

ValueError: [E103] Trying to set conflicting doc.ents: '(1155, 1199, 'Email Address')' and '(1143, 1240, 'Links')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Did you guys figure out a way to resolve it?

aditya-malte commented 3 years ago

@sayalraza Can you share the cleaned dataset you mentioned?

udara-kw commented 3 years ago

@sayalraza Hey, can you please share the clean dataset? Thanks in advance!

harshgeek4coder commented 3 years ago

Try installing this version:

pip install spacy==2.0.18
siddharth271101 commented 3 years ago

@harshgeek4coder Were you able to solve it?

gamingflexer commented 2 years ago

spaCy v3 gives a new error, so try pip install spacy==2.2.4 (the version preinstalled on Colab, Feb 22).

Seemz246 commented 2 years ago

I'm also getting the same error while training:

[E103] Trying to set conflicting doc.ents: '(402, 818, 'Skills')' and '(817, 1118, 'worked at')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.

Could anyone please help me run the code? I'm not very familiar with machine learning.

Seemz246 commented 2 years ago

I am using spaCy version 2.3.5 and Python version 3.9.10.

BillelBenoudjit commented 1 year ago

I have found this code that fixes the overlapping issue.

def clean_entities(training_data):
  clean_data = []
  for text, annotation in training_data:

    # work on copies so the caller's annotations are not modified
    entities_copy = list(annotation.get('entities', []))
    entities = entities_copy.copy()

    # keep an entity only if it is longer than every entity it overlaps
    for i, entity in enumerate(entities_copy):
      for j, overlapping_entity in enumerate(entities_copy):
        # Skip self
        if i == j:
          continue
        e_start, e_end = entity[0], entity[1]
        oe_start, oe_end = overlapping_entity[0], overlapping_entity[1]
        # inclusive bounds: spans that merely touch also count as overlapping
        overlaps = (oe_start <= e_start <= oe_end) or (oe_start <= e_end <= oe_end)
        # on equal length, only the later entity is dropped so one of the pair survives
        loses = (e_end - e_start, -i) < (oe_end - oe_start, -j)
        if overlaps and loses:
          entities.remove(entity)
          # stop here: removing the same entity twice would raise ValueError
          break
    clean_data.append((text, {'entities': entities}))

  return clean_data
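
For comparison, here is an alternative sketch that uses a different technique from the snippet above: it sorts the spans longest first and greedily keeps each span that does not collide with one already kept. The name `drop_overlapping_entities` is hypothetical, the sample text is a placeholder, and the code assumes end-exclusive character offsets in the same `(text, {'entities': [...]})` format, so treat it as a starting point rather than a drop-in fix.

```python
def drop_overlapping_entities(training_data):
    """Keep the longest span in every group of overlapping spans (greedy, longest first)."""
    cleaned = []
    for text, annotation in training_data:
        kept = []
        # Longest spans first; ties go to the span that starts earlier.
        ordered = sorted(annotation.get('entities', []),
                         key=lambda e: (e[0] - e[1], e[0]))
        for start, end, label in ordered:
            # Assumes end-exclusive offsets: two spans are compatible
            # if one ends at or before the point where the other starts.
            if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
                kept.append((start, end, label))
        # Return the surviving spans in document order.
        cleaned.append((text, {'entities': sorted(kept)}))
    return cleaned


# Placeholder text; the spans are the conflicting ones reported earlier in this thread.
sample = [('x' * 8000, {'entities': [
    (6861, 6870, 'Companies worked at'),
    (6305, 7258, 'Skills'),
    (370, 392, 'Designation'),
    (370, 391, 'Designation'),
]})]
print(drop_overlapping_entities(sample)[0][1])
```

Dropping the shorter span discards annotation, so it may be worth reviewing what gets removed before training.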