ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0
20 stars, 7 forks

Added multiple models to txt2ent.py from scispacy and streamlined pro… #75

Closed: archaeocharlie closed this pull request 4 years ago

archaeocharlie commented 4 years ago

This now uses three scispaCy models in addition to the standard spaCy model. I switched spaCy's model to the large one (en_core_web_lg), which has a slightly better F1 score for named-entity recognition (NER). Next, we need to explore approaches to cleaning this data.
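Because results from several models end up in the same output, one way to keep them comparable for the cleaning step is to record which model produced each entity. This is a minimal sketch under stated assumptions: the three-column layout (`model`, `entity`, `type`) is hypothetical, the actual columns written by txt2ent.py may differ, and `entities_to_tsv` is an illustrative name, not a function from the repository.

```python
import csv
import io
from typing import Iterable, Tuple

# Hypothetical row shape: (model name, entity text, entity label).
EntityRow = Tuple[str, str, str]

# Hypothetical header; txt2ent.py may use different columns.
HEADER = ("model", "entity", "type")


def entities_to_tsv(rows: Iterable[EntityRow]) -> str:
    """Serialize entity rows as tab-separated values, one entity per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for model, entity, label in rows:
        # Collapse internal whitespace (including newlines) so an entity
        # that spans a line break stays on a single TSV row.
        writer.writerow((model, " ".join(entity.split()), label))
    return buffer.getvalue()
```

Collapsing whitespace is only one small cleaning step; keeping the `model` column makes it easy to compare or filter each model's output afterwards.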

ericleasemorgan commented 4 years ago

I'm still learning how to use Git; please send me some sample output. --ELM

archaeocharlie commented 4 years ago

Here's an example from one file.

Charlie Harper, PhD, Digital Scholarship Specialist, Freedman Center for Digital Scholarship, Kelvin Smith Library, Case Western Reserve University, (216)-368-4253 | crh92@case.edu

On Thu, Jun 4, 2020 at 3:04 PM Eric Lease Morgan notifications@github.com wrote:

I'm still learning how to use Git; please send me some sample output. --ELM


ericleasemorgan commented 4 years ago

No file came through. :-(

archaeocharlie commented 4 years ago

output.zip

archaeocharlie commented 4 years ago

I guess GitHub doesn't accept .tsv attachments!

ericleasemorgan commented 4 years ago

Looks good, and you have already committed it. Correct?

Here are two tweaks. First, at the top of the file, date and sign your good work by adding your name. Second, create a configuration at the top of the file that looks something like this:

MODELS = ['en_core_web_lg', 'en_ner_craft_md', 'en_ner_jnlpba_md', 'en_ner_bc5cdr_md']

And then lower in the code do:

for model in MODELS:

This way it is easier to see where to add more models when needed.

Okay?
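The suggested structure can be sketched as follows, assuming each name in MODELS is an installed spaCy or scispaCy pipeline package that `spacy.load` can open. The helper names `load_pipelines` and `extract_entities`, and the `load` parameter, are hypothetical and not taken from txt2ent.py; `load` exists only so the sketch can be exercised without spaCy installed.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

# Configuration at the top of the file: add new model names here.
MODELS = ['en_core_web_lg', 'en_ner_craft_md', 'en_ner_jnlpba_md', 'en_ner_bc5cdr_md']


def load_pipelines(
    models: List[str] = MODELS,
    load: Optional[Callable[[str], Any]] = None,
) -> Dict[str, Any]:
    """Load each model once and return the pipelines keyed by model name."""
    if load is None:
        # Imported lazily; spacy.load(name) opens an installed pipeline package.
        import spacy
        load = spacy.load
    return {model: load(model) for model in models}


def extract_entities(
    text: str, pipelines: Dict[str, Any]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (model, entity text, entity label) for every entity each model finds."""
    for model, nlp in pipelines.items():
        doc = nlp(text)
        for ent in doc.ents:
            yield model, ent.text, ent.label_
```

With this layout, adding a model means appending one name to MODELS, and loading every pipeline once up front avoids reloading the large models for each document.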

archaeocharlie commented 4 years ago

Will recommit and do a pull request/merge now!
