explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

💫 spaCy v2.0.0 alpha – details, feedback & questions (plus stickers!) #1105

Closed ines closed 6 years ago

ines commented 7 years ago

We're very excited to finally publish the first alpha pre-release of spaCy v2.0. It's still an early release and (obviously) not intended for production use. You might come across a NotImplementedError – see the release notes for the implementation details that are still missing.

This thread is intended for general discussion, feedback and all questions related to v2.0. If you come across more complex bugs, feel free to open a separate issue.

Quickstart & overview

The most important new features

Installation

spaCy v2.0.0-alpha is available on pip as spacy-nightly. If you want to test the new version, we recommend setting up a clean environment first. To install the new models, you'll have to download them with their full names, using the --direct flag.

pip install spacy-nightly
python -m spacy download en_core_web_sm-2.0.0-alpha --direct   # English
python -m spacy download xx_ent_wiki_sm-2.0.0-alpha --direct   # Multi-language NER

# Load the model via spacy.load() ...
import spacy
nlp = spacy.load('en_core_web_sm')

# ... or import the installed model package directly
import en_core_web_sm
nlp = en_core_web_sm.load()
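
To check that the download worked, a quick sanity check could look like this (a minimal sketch – the example sentence is arbitrary):

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp("Apple is looking at buying a U.K. startup.")

# part-of-speech tag and dependency label for each token
for token in doc:
    print(token.text, token.pos_, token.dep_)

# named entities recognised in the text
for ent in doc.ents:
    print(ent.text, ent.label_)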

Alpha models for German, French and Spanish are coming soon!

Now on to the fun part – stickers!


We just got our first delivery of spaCy stickers and want to share them with you! There's only one small favour we'd like to ask. The part we're currently behind on is the tests – this includes our test suite as well as in-depth testing of the new features and usage examples. So here's the idea:

Submit a PR with your test to the develop branch – if the test covers a bug and currently fails, mark it with @pytest.mark.xfail. For more info, see the test suite docs. Once your pull request is accepted, send us your address via email or private message on Gitter and we'll mail you stickers.
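
If you haven't used the xfail marker before, a test for a known-but-unfixed bug could look roughly like this (a minimal sketch – the test name, input and assertion are purely illustrative, and the import path assumes the v2.0 module layout):

import pytest
from spacy.lang.en import English


@pytest.mark.xfail(reason="illustrative placeholder for a known, not-yet-fixed bug")
def test_tokenizer_handles_hypothetical_case():
    nlp = English()  # blank English pipeline, no model download needed
    doc = nlp("some input that currently triggers the bug")
    # assert the behaviour you *want* to see; xfail keeps the suite green until it's fixed
    assert len(doc) > 0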

If you can't find anything, don't have time or can't be bothered, that's fine too. Posting your feedback on spaCy v2.0 here counts as well. To be honest, we really just want to mail out stickers 😉

jesushd12 commented 6 years ago

Hi @nathanathan, were you able to resolve the problem? I'm getting the same issue with the similarity function, and I'm using the Spanish model.

ines commented 6 years ago

@nathanathan @jesushd12 Sorry about that – we're still finalising the vector support on the current models (see #1457). We're currently training a new family of models for the next version, which includes a lot of fixes and updates currently on develop. (Unless there are new bugs or significant problems, this is likely also going to be the version we're promoting to the release candidate 🎉)

chaturv3di commented 6 years ago

I'm trying to install spaCy 2.0 alpha in a new conda environment, and I'm getting an undefined symbol: PyFPE_jbuf error. AFAIK, this is due to two conflicting versions of numpy. However, I have made sure that my packages numpy, scipy, msgpack-numpy, and Cython are all installed solely via pip. In fact, I even tried the flavour where all of these packages are installed solely via conda. No luck.

Would anyone be able to offer any advice?

honnibal commented 6 years ago

@chaturv3di That error tends to occur when pip uses a cached binary package. I find this happens a lot for me with the cytoolz package – somehow its metadata is incorrect, and pip thinks the cached build is compatible with both Python 2 and 3.

Try pip uninstall cytoolz && pip install cytoolz --no-cache-dir

chaturv3di commented 6 years ago

Thanks @honnibal.

For the record, after following your advice, I received the same error but this time from the preshed package. I did the same with it, i.e. pip uninstall preshed && pip install preshed --no-cache-dir, and it worked.

chaturv3di commented 6 years ago

Hi All,

This is related to dependency parsing. Where can I find the exact logic for merging Spans when the "merge phrases" option is chosen on https://demos.explosion.ai?

Thanks in advance.

ines commented 6 years ago

@chaturv3di See here in the spacy-services repo:

if collapse_phrases:
    # merge each noun chunk into one token, keeping the root's tag, lemma and entity type
    for np in list(self.doc.noun_chunks):
        np.merge(np.root.tag_, np.root.lemma_, np.root.ent_type_)

Essentially, all you need to do is iterate over the noun phrases in doc.noun_chunks, merge them and make sure to re-assign the tags and labels. In spaCy v2.0, you can also specify the arguments as keyword arguments, e.g. span.merge(tag=tag, lemma=lemma, ent_type=ent_type).
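
A standalone version of the same idea, using the v2.0 keyword arguments (a minimal sketch, assuming the English alpha model is installed):

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp("The quick brown fox jumps over the lazy dog.")

# list() first, because merging modifies the Doc while we iterate
for np in list(doc.noun_chunks):
    # keep the root token's tag, lemma and entity type on the merged span
    np.merge(tag=np.root.tag_, lemma=np.root.lemma_, ent_type=np.root.ent_type_)

print([t.text for t in doc])
# e.g. ['The quick brown fox', 'jumps', 'over', 'the lazy dog', '.']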

ines commented 6 years ago

Thanks everyone for your feedback! 💙 spaCy v2.0 is now live: https://github.com/explosion/spaCy/releases/tag/v2.0.0

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.