AllanNastin / ethical-ai


Select a Named Entity Recognition (NER) Tool #9

Closed lizchow1 closed 8 months ago

lizchow1 commented 9 months ago

Selecting a Named Entity Recognition (NER) tool: during the meeting, people mentioned one such as the PyTorch one (https://pypi.org/project/pytorch-ner/) or any of the ones available on Hugging Face. For this task, select a NER tool we can use in the project and, if possible, test it out locally to see how it's used.
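
For a quick local test, a minimal sketch using a Hugging Face pipeline might look like this (assuming `pip install transformers torch`; the checkpoint `dslim/bert-base-NER` is just one public example, not a project decision):

```python
from transformers import pipeline

# "ner" loads a token-classification pipeline; aggregation_strategy="simple"
# merges sub-word tokens back into whole entities.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

for entity in ner("Hugging Face is based in New York City."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```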

AllanNastin commented 9 months ago

Quick GPT comparison:

  1. spaCy:

    • Pros: Efficient, fast, easy to use, supports multiple languages, provides pre-trained models (a short usage sketch follows this list).
    • Cons: Less flexible than some deep learning frameworks; fewer pre-trained models than Hugging Face's Transformers.
  2. NLTK (Natural Language Toolkit):

    • Pros: Easy to use, good for teaching and prototyping.
    • Cons: NER capabilities are not as powerful or accurate as those of some other libraries.
  3. Stanford NER:

    • Pros: High accuracy, uses Conditional Random Field (CRF) models, supports multiple languages.
    • Cons: Implemented in Java (but can be used in Python through NLTK), slower than some other options.
  4. Flair:

    • Pros: Uses a character-level LSTM, which can capture word-internal structure, supports a large number of languages.
    • Cons: Can be slower and require more computational resources than some other options.
  5. DeepPavlov:

    • Pros: Provides a few different models for NER, built on TensorFlow and Keras.
    • Cons: Can be complex to set up and use, requires significant computational resources.
  6. Transformers (by Hugging Face):

    • Pros: Provides thousands of pre-trained models, supports a wide range of NLP tasks, easy to use.
    • Cons: Using advanced models can require significant computational resources.
  7. PyTorch:

    • Pros: Dynamic computation graphs, good Python integration, large and active community, compatible with Hugging Face's Transformers.
    • Cons: Might be more complex to use for beginners compared to some high-level libraries.
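
For the spaCy entry above, a minimal sketch of basic NER usage (assuming `pip install spacy` and `python -m spacy download en_core_web_sm` for the small English model):

```python
import spacy

# Load the small English pipeline, which ships with a pre-trained NER component.
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Apple ORG", "U.K. GPE", "$1 billion MONEY"
```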

My takeaway: Transformers or PyTorch might be best, as long as we have the computational power. Since we would run from our own computers it should be fine, but we might end up deploying to the cloud... In that case we should ask IBM what resources we could be provided with. A quick compute check is sketched below.
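
If we do run locally first, a short sketch for checking what compute is available (assuming PyTorch is installed; small NER models are generally fine on CPU):

```python
import torch

# Report whether a CUDA GPU is visible to PyTorch before committing to a heavy model.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```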

fitzgd10 commented 9 months ago

Allan, I hope this email finds you well.

I'm not sure I understand the desired response from this email. I take it that you want an opinion on which of these would be most suited to us. Does "quick GPT comparison" mean you'd like me to compare the pros and cons of the listed items? Based on what I see there, I would say Hugging Face and Flair would be the best options, seeing as the 2nd years will be dealing with the programming of the model. If you look at the capabilities doc on the Discord, we can see that only one 2nd year has real experience with this type of programming, so the complexity of PyTorch might be unnecessarily time-consuming, especially when the alternatives seem, for the scope of our project, to be just as powerful.
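
For reference, a minimal Flair sketch (assuming `pip install flair`; the name "ner" loads Flair's default pre-trained English tagger, downloaded on first use):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load Flair's default English NER tagger.
tagger = SequenceTagger.load("ner")

sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    print(span.text, span.tag, round(span.score, 3))
```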

Regards, Daniel


lizchow1 commented 8 months ago

This issue is discarded.