dependabot-preview[bot] closed this pull request 6 years ago.
Thank you for your submission; we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
Bumps gensim from 3.4.0 to 3.6.0.
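According to the release notes below, the headline change pulled in by this bump is file-based training through the new `corpus_file` argument. That mode reads a corpus stored as one document per line with words delimited by `" "`, and the notes create such a file with `gensim.utils.save_as_line_sentence`. The snippet is a minimal pure-Python sketch of that on-disk format, not gensim's code. `write_line_corpus` and `read_line_corpus` are hypothetical helpers, and rejecting empty or whitespace-containing tokens is our own assumption about what keeps the format round-trippable.

```python
import os
import tempfile
from typing import Iterable, List


def write_line_corpus(corpus: Iterable[List[str]], path: str) -> int:
    """Hypothetical helper: one document per line, tokens joined by a single space.

    Returns the number of documents (lines) written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as fout:
        for doc in corpus:
            for token in doc:
                # Assumption: tokens must be non-empty and whitespace-free, otherwise
                # splitting the line on spaces would not give back the same tokens.
                if not token or any(ch.isspace() for ch in token):
                    raise ValueError(f"token {token!r} is empty or contains whitespace")
            fout.write(" ".join(doc) + "\n")
            count += 1
    return count


def read_line_corpus(path: str) -> List[List[str]]:
    """Hypothetical helper: read the file back, one token list per line."""
    with open(path, encoding="utf-8") as fin:
        # str.split() with no argument also maps an empty line to [].
        return [line.split() for line in fin]


# Usage: write a tiny corpus to a temporary file and read it back.
docs = [["human", "machine", "interface"], ["graph", "minors", "survey"]]
path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
print(write_line_corpus(docs, path))   # 2
print(read_line_corpus(path) == docs)  # True
```

A file in this shape is what the notes pass as `corpus_file=corpus_fname`; how gensim divides that file among its `workers` is not described in this PR.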
Release notes
*Sourced from [gensim's releases](https://github.com/RaRe-Technologies/gensim/releases).*

> ## 3.6.0, 2018-09-20
>
> ### :star2: New features
> * File-based training for `*2Vec` models (__[**persiyanov**](https://github.com/persiyanov)__, [#2127](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2127) & [#2078](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2078) & [#2048](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2048))
>
> [Blog post / Jupyter tutorial](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Any2Vec_Filebased.ipynb).
>
> New training mode for `*2Vec` models (word2vec, doc2vec, fasttext) that allows model training to **scale linearly with the number of cores** (full GIL elimination). The result of our Google Summer of Code 2018 project by Dmitry Persiyanov.
>
> **Benchmark** on the full English Wikipedia, Intel(R) Xeon(R) CPU @ 2.30GHz 32 cores (GCE cloud), MKL BLAS:
>
> | Model | Queue-based version [sec] | File-based version [sec] | speed up | Accuracy (queue-based) | Accuracy (file-based) |
> |-------|------------|--------------------|----------|----------------|-----------------------|
> | Word2Vec | 9230 | **2437** | **3.79x** | 0.754 (± 0.003) | 0.750 (± 0.001) |
> | Doc2Vec | 18264 | **2889** | **6.32x** | 0.721 (± 0.002) | 0.683 (± 0.003) |
> | FastText | 16361 | **10625** | **1.54x** | 0.642 (± 0.002) | 0.660 (± 0.001) |
>
> Usage:
>
> ```python
> import gensim.downloader as api
> from multiprocessing import cpu_count
> from gensim.utils import save_as_line_sentence
> from gensim.test.utils import get_tmpfile
> from gensim.models import Word2Vec, Doc2Vec, FastText
>
>
> # Convert any corpus to the needed format: 1 document per line, words delimited by " "
> corpus = api.load("text8")
> corpus_fname = get_tmpfile("text8-file-sentence.txt")
> save_as_line_sentence(corpus, corpus_fname)
>
> # Choose num of cores that you want to use (let's use all, models scale linearly now!)
> num_cores = cpu_count()
>
> # Train models using all cores
> w2v_model = Word2Vec(corpus_file=corpus_fname, workers=num_cores)
> d2v_model = Doc2Vec(corpus_file=corpus_fname, workers=num_cores)
> ft_model = FastText(corpus_file=corpus_fname, workers=num_cores)
> ```
> [Read notebook tutorial with full description.](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Any2Vec_Filebased.ipynb)
>
> ### :+1: Improvements
>
> * Add scikit-learn wrapper for `FastText` (__[**mcemilg**](https://github.com/mcemilg)__, [#2178](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2178))
> * Add multiprocessing support for `BM25` (__[**Shiki-H**](https://github.com/Shiki-H)__, [#2146](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2146))
> * Add `name_only` option for downloader api (__[**aneesh-joshi**](https://github.com/aneesh-joshi)__, [#2143](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2143))
> * Make `word2vec2tensor` script compatible with `python3` (__[**vsocrates**](https://github.com/vsocrates)__, [#2147](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2147))
> ... (truncated)

Changelog
*Sourced from [gensim's changelog](https://github.com/RaRe-Technologies/gensim/blob/develop/CHANGELOG.md).*

> ## 3.6.0, 2018-09-20
>
> ### :star2: New features
> * File-based training for `*2Vec` models (__[**persiyanov**](https://github.com/persiyanov)__, [#2127](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2127) & [#2078](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2078) & [#2048](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2048))
>
> New training mode for `*2Vec` models (word2vec, doc2vec, fasttext) that allows model training to scale linearly with the number of cores (full GIL elimination). The result of our Google Summer of Code 2018 project by Dmitry Persiyanov.
>
> **Benchmark**
> - Dataset: `full English Wikipedia`
> - Cloud: `GCE`
> - CPU: `Intel(R) Xeon(R) CPU @ 2.30GHz 32 cores`
> - BLAS: `MKL`
>
> | Model | Queue-based version [sec] | File-based version [sec] | speed up | Accuracy (queue-based) | Accuracy (file-based) |
> |-------|------------|--------------------|----------|----------------|-----------------------|
> | Word2Vec | 9230 | **2437** | **3.79x** | 0.754 (± 0.003) | 0.750 (± 0.001) |
> | Doc2Vec | 18264 | **2889** | **6.32x** | 0.721 (± 0.002) | 0.683 (± 0.003) |
> | FastText | 16361 | **10625** | **1.54x** | 0.642 (± 0.002) | 0.660 (± 0.001) |
>
> Usage:
>
> ```python
> import gensim.downloader as api
> from multiprocessing import cpu_count
> from gensim.utils import save_as_line_sentence
> from gensim.test.utils import get_tmpfile
> from gensim.models import Word2Vec, Doc2Vec, FastText
>
>
> # Convert any corpus to the needed format: 1 document per line, words delimited by " "
> corpus = api.load("text8")
> corpus_fname = get_tmpfile("text8-file-sentence.txt")
> save_as_line_sentence(corpus, corpus_fname)
>
> # Choose num of cores that you want to use (let's use all, models scale linearly now!)
> num_cores = cpu_count()
>
> # Train models using all cores
> w2v_model = Word2Vec(corpus_file=corpus_fname, workers=num_cores)
> d2v_model = Doc2Vec(corpus_file=corpus_fname, workers=num_cores)
> ft_model = FastText(corpus_file=corpus_fname, workers=num_cores)
> ```
> [Read notebook tutorial with full description.](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Any2Vec_Filebased.ipynb)
>
> ### :+1: Improvements
>
> * Add scikit-learn wrapper for `FastText` (__[**mcemilg**](https://github.com/mcemilg)__, [#2178](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/pull/2178))
> ... (truncated)

Commits
- [`355ecc6`](https://github.com/RaRe-Technologies/gensim/commit/355ecc68a6ccb07f38418e8c80784b70aac84442) Merge branch 'release-3.6.0'
- [`35d1b5b`](https://github.com/RaRe-Technologies/gensim/commit/35d1b5bc62e8bb3cd9d54159e0be2e561f60790e) regenerated C files with Cython
- [`e22419e`](https://github.com/RaRe-Technologies/gensim/commit/e22419e9f86e671ea59e7fd54a4a5007429bae4a) bump CHANGELOG to 3.6.0
- [`5164f0f`](https://github.com/RaRe-Technologies/gensim/commit/5164f0f20910780b8cd7c97dd3d2560034ea2a9d) bump version to 3.6.0
- [`97783a4`](https://github.com/RaRe-Technologies/gensim/commit/97783a40aa1d00ca7942b8ab483fd65a2075f8b6) Add scikit-learn wrapper for `FastText` ([#2178](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/issues/2178))
- [`4224879`](https://github.com/RaRe-Technologies/gensim/commit/422487966bd94acf24ed48edbeef72f39b28a6e0) Fix formula in Mallet documentation ([#2186](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/issues/2186))
- [`3c3506d`](https://github.com/RaRe-Technologies/gensim/commit/3c3506d51a2caf6b890de3b1b32a8b85f7566ca5) File-based fast training for Any2Vec models ([#2127](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/issues/2127))
- [`e87aa85`](https://github.com/RaRe-Technologies/gensim/commit/e87aa850972a8d578f36b7e2e9a793d0fe40d5e7) Replace deprecated parameters with new in docstring of `gensim.models.Doc2Vec...
- [`3ccbb2e`](https://github.com/RaRe-Technologies/gensim/commit/3ccbb2e406cb65de25a53182718e19fc770ce8e9) Fix quote of vocabulary from `gensim.models.Word2Vec` ([#2161](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/issues/2161))
- [`7fedf5a`](https://github.com/RaRe-Technologies/gensim/commit/7fedf5addd3f75bcd2e1ab6ded89aa677611534d) Use heading instead of bold style in `gensim.models.translation_matrix` ([#2164](https://github-redirect.dependabot.com/RaRe-Technologies/gensim/issues/2164))
- Additional commits viewable in [compare view](https://github.com/RaRe-Technologies/gensim/compare/3.4.0...3.6.0)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

**Note:** This repo was added to Dependabot recently, so you'll receive a maximum of 5 PRs for your first few update runs. Once an update run creates fewer than 5 PRs we'll remove that limit.
You can always request more updates by clicking `Bump now` in your Dependabot dashboard.

Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot ignore this [patch|minor|major] version` will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
- `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot [dashboard](https://app.dependabot.com):
- Update frequency (including time of day and day of week)
- Automerge options (never/patch/minor, and dev/runtime dependencies)
- Pull request limits (per update run and/or open at any time)
- Out-of-range updates (receive only lockfile updates, if desired)
- Security updates (receive only security updates, if desired)

Finally, you can contact us by mentioning @dependabot.
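The `@dependabot ignore this [patch|minor|major] version` command above refers to semantic-versioning components. As a minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings with no pre-release or build suffixes, the hypothetical `bump_type` function below classifies an upgrade; it is not Dependabot's own version parser. By this rule, the gensim 3.4.0 to 3.6.0 bump in this PR is a minor update.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string; missing trailing parts default to 0."""
    parts = [int(p) for p in version.split(".")]  # non-numeric parts raise ValueError
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version string: {version!r}")
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def bump_type(old: str, new: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for an upgrade from old to new."""
    o, n = parse_version(old), parse_version(new)
    if n < o:  # tuples compare component by component
        raise ValueError(f"{new} is older than {old}")
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    if n[2] != o[2]:
        return "patch"
    return "none"


print(bump_type("3.4.0", "3.6.0"))  # minor
```

Python packages can use version strings such as `3.6.0rc1`, which this sketch rejects with a `ValueError`; a real tool needs a fuller parser for those.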