Closed ghost closed 4 years ago
Hi there,
It seems that you have a problem with the version of the smart_open
package (which is installed by default along with the gensim
package). Can you go to Administration -> Code envs -> plugin_sentence-embedding_managed -> Installed packages and send me the list of all the packages and their versions?
Thank you!
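For reference, here is a minimal sketch that gathers the same version information from Python instead of the UI. It assumes it is run in a notebook or recipe that uses the plugin_sentence-embedding_managed code env. It only reads installed package metadata for the two packages mentioned above; the helper name `installed_versions` is hypothetical and is not part of DSS or the plugin.

```python
try:
    # Python 3.8+: standard-library access to installed package metadata
    from importlib.metadata import version as _dist_version, PackageNotFoundError
except ImportError:
    # Older interpreters (e.g. the Python 2.7 code env seen in the logs)
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def _dist_version(name):
        return pkg_resources.get_distribution(name).version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = _dist_version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in sorted(installed_versions(["gensim", "smart_open"]).items()):
        print("%s: %s" % (name, ver if ver is not None else "not installed"))
```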
Hi there,
Do you have any idea what causes the error below? I have downloaded the embeddings using the macro.
Job failed: Error in Python process: At line 57: <type 'exceptions.ValueError'>: Something is wrong with the pre-trained embeddings. Please make sure to either use the plugin macro to download the embeddings, or tick the custom embedding box if you are using custom vectors.
Do you have the folder containing the pretrained model as an input to the recipe? Can you send me the whole log, please?
Please see below.
Many Thanks!
[16:41:15] [INFO] [dku] running compute_hrgwehur_NP - ----------------------------------------
[16:41:15] [INFO] [dku] running compute_hrgwehur_NP - DSS startup: jek version:6.0.1
[16:41:15] [INFO] [dku] running compute_hrgwehur_NP - DSS home: /Users/bob/Library/DataScienceStudio/dss_home
[16:41:15] [INFO] [dku] running compute_hrgwehur_NP - OS: Mac OS X 10.15.2 x86_64 - Java: Oracle Corporation 1.8.0_221
[16:41:15] [INFO] [dku.flow.jobrunner] running compute_hrgwehur_NP - Allocated a slot for this activity!
[16:41:15] [INFO] [dku.flow.jobrunner] running compute_hrgwehur_NP - Run activity
[16:41:15] [INFO] [dku.flow.activity] running compute_hrgwehur_NP - Executing default pre-activity lifecycle hook
[16:41:15] [INFO] [dku.managedfolders.handler] running compute_hrgwehur_NP - Create provider for TWITTER.ZQ1worch with path /TWITTER
[16:41:15] [INFO] [dku.flow.activity] running compute_hrgwehur_NP - Checking if sources are ready
[16:41:15] [DEBUG] [dku.db.internal] running compute_hrgwehur_NP - Borrowing a connection. Read-only: false
[16:41:15] [DEBUG] [dku.db.internal] running compute_hrgwehur_NP - Created DSSDBConnection dssdb-h2-flow_state-YI1hTFz
[16:41:15] [DEBUG] [dku.dataset.hash] running compute_hrgwehur_NP - Readiness cache miss for datasetadminTWITTER.train_preparedNP
[16:41:15] [INFO] [dku.datasets.file] running compute_hrgwehur_NP - Building Filesystem handler config: {"connection":"filesystem_managed","path":"TWITTER/train_prepared","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[16:41:15] [INFO] [dku.datasets.ftplike] running compute_hrgwehur_NP - Enumerating Filesystem dataset prefix=
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumerating local filesystem prefix=/
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumeration done nb_paths=1 size=1038227
[16:41:15] [INFO] [dku.dataset.hash] running compute_hrgwehur_NP - Caching readiness for datasetadminTWITTER.train_preparedNP s=READY h=yPH5l+XYPN7R/aYM5J+EJg
[16:41:15] [INFO] [dku.flow.activity] running compute_hrgwehur_NP - Checked source readiness TWITTER.train_prepared -> true
[16:41:15] [INFO] [dku.managedfolders.handler] running compute_hrgwehur_NP - Enumerating managed folder prefix=
[16:41:15] [INFO] [dku.managedfolders.handler] running compute_hrgwehur_NP - Create provider for TWITTER.ZQ1worch with path /TWITTER
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumerating local filesystem prefix=/
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumeration done nb_paths=1 size=5025028820
[16:41:15] [INFO] [dku.flow.activity] running compute_hrgwehur_NP - Checked source readiness TWITTER.ZQ1worch -> true
[16:41:15] [DEBUG] [dku.flow.activity] running compute_hrgwehur_NP - Computing hashes to propagate BEFORE activity
[16:41:15] [DEBUG] [dku.db.internal] running compute_hrgwehur_NP - Borrowing a connection. Read-only: false
[16:41:15] [DEBUG] [dku.dataset.hash] running compute_hrgwehur_NP - Readiness cache miss for datasetadminTWITTER.train_preparedNP
[16:41:15] [INFO] [dku.datasets.file] running compute_hrgwehur_NP - Building Filesystem handler config: {"connection":"filesystem_managed","path":"TWITTER/train_prepared","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[16:41:15] [INFO] [dku.datasets.ftplike] running compute_hrgwehur_NP - Enumerating Filesystem dataset prefix=
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumerating local filesystem prefix=/
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumeration done nb_paths=1 size=1038227
[16:41:15] [INFO] [dku.dataset.hash] running compute_hrgwehur_NP - Caching readiness for datasetadminTWITTER.train_preparedNP s=READY h=yPH5l+XYPN7R/aYM5J+EJg
[16:41:15] [INFO] [dku.managedfolders.handler] running compute_hrgwehur_NP - Enumerating managed folder prefix=
[16:41:15] [INFO] [dku.managedfolders.handler] running compute_hrgwehur_NP - Create provider for TWITTER.ZQ1worch with path /TWITTER
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumerating local filesystem prefix=/
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumeration done nb_paths=1 size=5025028820
[16:41:15] [DEBUG] [dku.flow.activity] running compute_hrgwehur_NP - Recorded 2 hashes before activity run
[16:41:15] [DEBUG] [dku.flow.activity] running compute_hrgwehur_NP - Building recipe runner of type
[16:41:15] [DEBUG] [dku.job.activity] running compute_hrgwehur_NP - Filling source sizes
[16:41:15] [INFO] [dku.datasets.file] running compute_hrgwehur_NP - Building Filesystem handler config: {"connection":"filesystem_managed","path":"TWITTER/train_prepared","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[16:41:15] [INFO] [dku.datasets.ftplike] running compute_hrgwehur_NP - Enumerating Filesystem dataset prefix=
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumerating local filesystem prefix=/
[16:41:15] [DEBUG] [dku.fs.local] running compute_hrgwehur_NP - Enumeration done nb_paths=1 size=1038227
[16:41:15] [DEBUG] [dku.job.activity] running compute_hrgwehur_NP - Done filling source sizes
[16:41:15] [DEBUG] [dku.flow.activity] running compute_hrgwehur_NP - Recipe runner built, will use 1 thread(s)
[16:41:15] [DEBUG] [dku.flow.activity] running compute_hrgwehur_NP - Starting execution thread: com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner@7fa21020
[16:41:15] [DEBUG] [dku.flow.activity] running compute_hrgwehur_NP - Execution threads started, waiting for activity end
[16:41:15] [INFO] [dku.flow.activity] - Run thread for activity compute_hrgwehur_NP starting
[16:41:15] [INFO] [dku.flow.custompython] - Dumping Python script to /Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/script.py
[16:41:15] [INFO] [dip.venv.selector] - Select in plugin with {"defaultPermission":{"admin":false},"permissions":[],"parameterSets":[],"config":{},"codeEnvName":"plugin_sentence-embedding_managed","presets":[],"gitConfig":{}}
[16:41:15] [INFO] [dku.flow.abstract.python] - Dumping Python script to /Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/script.py
[16:41:15] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"connection":"filesystem_managed","path":"TWITTER/hrgwehur","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[16:41:15] [WARN] [dku.fs.local] - File does not exist: /Users/bob/Library/DataScienceStudio/dss_home/managed_datasets/TWITTER/hrgwehur
[16:41:15] [INFO] [dku.datasets.file] - Building Filesystem handler config: {"connection":"filesystem_managed","path":"TWITTER/train_prepared","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[16:41:15] [WARN] [dku.code.projectLibs] - External libraries file not found: /Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/localconfig/projects/TWITTER/lib/external-libraries.json
[16:41:15] [INFO] [dku.code.projectLibs] - EXTERNAL LIBS FROM TWITTER is {"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]}
[16:41:15] [INFO] [dku.code.projectLibs] - chunkFolder is /Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/localconfig/projects/TWITTER/lib/R
[16:41:15] [INFO] [dip.plugin.presets] - Checking project-level settings for overriden presets and additional presets
[16:41:15] [INFO] [dip.plugin.presets] - Resolve for {"aggregation_method":"simple_average","embedding_is_custom":false,"advanced_settings":false,"smoothing_parameter":0.001,"n_principal_components":1,"text_column_names":["text"]}
[16:41:15] [INFO] [xxx] - RSRC PATH: ["/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/localconfig/projects/TWITTER/lib/R"]
[16:41:15] [INFO] [dku.recipes.code.base] - Writing dku-exec-env for local execution in /Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/remote-run-env-def.json
[16:41:15] [INFO] [dku.code.envs.resolution] - Executing Python activity in env: plugin_sentence-embedding_managed
[16:41:15] [INFO] [dku.flow.abstract.python] - Execute activity command: ["/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/bin/python","-u","/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/python-exec-wrapper.py","/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/script.py"]
[16:41:15] [INFO] [dku.recipes.code.base] - Run command insecurely, from user bob
[16:41:15] [INFO] [dku.security.process] - Starting process (regular)
[16:41:15] [INFO] [dku.security.process] - Process started with pid=4791
[16:41:15] [INFO] [dku.processes.cgroups] - Will use cgroups []
[16:41:15] [INFO] [dku.processes.cgroups] - Applying rules to used cgroups: []
[16:41:15] [INFO] [dku.recipes.code.base] - Process reads from nothing
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO --------------------
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO Dataiku Python entrypoint starting up
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO executable = /Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/bin/python
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO argv = ['/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/python-exec-wrapper.py', '/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/script.py']
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO --------------------
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO Looking for RemoteRunEnvDef in ./remote-run-env-def.json
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO Found RemoteRunEnvDef environment: ./remote-run-env-def.json
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO Running a DSS Python recipe locally, uinsetting env
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO Setup complete, ready to execute Python code
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO Sys path: ['/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA', '/Users/bob/Library/DataScienceStudio/dss_home/lib/python', '/Applications/DataScienceStudio.app/Contents/Resources/kit/python', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python27.zip', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/plat-darwin', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/plat-mac', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/lib-tk', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/lib-old', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/lib-dynload', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/site-packages', u'/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/localconfig/projects/TWITTER/lib/python', u'/Users/bob/Library/DataScienceStudio/dss_home/plugins/installed/sentence-embedding/python-lib']
[16:41:15] [INFO] [dku.utils] - 2020-01-07 16:41:15,622 INFO Script file: /Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/script.py
[16:41:16] [INFO] [dku.utils] - /Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/site-packages/scipy/sparse/sparsetools.py:21: DeprecationWarning: scipy.sparse.sparsetools is deprecated!
[16:41:16] [INFO] [dku.utils] - scipy.sparse.sparsetools is a private module for scipy.sparse, and should not be used.
[16:41:16] [INFO] [dku.utils] - _deprecated()
[16:41:16] [INFO] [dku.utils] - 2020-01-07 16:41:16,505 INFO 'pattern' package not found; tag filters are not available for English
[16:41:18] [INFO] [dku.utils] - 2020-01-07 16:41:18,533 INFO Loading word embeddings from the input folder...
[16:41:18] [INFO] [dku.utils] - *** Recipe code failed **
[16:41:18] [INFO] [dku.utils] - Begin Python stack
[16:41:18] [INFO] [dku.utils] - Traceback (most recent call last):
[16:41:18] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_hrgwehur_2020-01-07T14-41-14.950/compute_hrgwehur_NP/custom-python-recipe/pyoutVply6rdrrEVA/python-exec-wrapper.py", line 194, in <module>
What model did you download ? Can you check that the model is indeed downloaded in the folder ?
I downloaded GloVe and Word2vec and have the same issue with both. Please check the screenshots.
Hmm, it's weird that your model is in a sub-folder. The plugin checks the file name to determine the pretrained model's type, so when it sees ZQ1Worch, it throws an error because that's not a legitimate model folder.
By default, the macro downloads the model to the right place, so it's weird that this isn't the case for you. You can try rerunning the macro with a new folder.
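The layout problem described above can be checked locally before running the recipe. This is a minimal sketch, assuming you know the local path of the managed folder (the path below is a placeholder) and that, as described above, the model files are expected directly at the folder root. It only inspects the folder layout; it does not reproduce the plugin's own file-name checks, which are not shown in this thread. The helper names are hypothetical.

```python
import os


def describe_folder_layout(folder_path):
    """Split the top level of a folder into regular files and sub-directories."""
    files, subdirs = [], []
    for entry in sorted(os.listdir(folder_path)):
        if entry.startswith("."):
            continue  # skip hidden entries such as macOS .DS_Store
        full_path = os.path.join(folder_path, entry)
        if os.path.isdir(full_path):
            subdirs.append(entry)
        else:
            files.append(entry)
    return files, subdirs


def model_is_nested(folder_path):
    """True when the root holds no files but exactly one sub-folder,
    i.e. the downloaded model probably sits one level too deep."""
    files, subdirs = describe_folder_layout(folder_path)
    return not files and len(subdirs) == 1


if __name__ == "__main__":
    path = "/path/to/managed/folder"  # placeholder: replace with the real local path
    if os.path.isdir(path):
        print(model_is_nested(path))
```

If it reports a nested layout, moving the model files up to the folder root (or rerunning the macro into a fresh folder) matches the fix that worked in this thread.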
Thanks, I took the model out of the sub-folder and it worked!
Ok great!
Do you still have the problem with the smart_open package?
No, Word2vec and GloVe work fine, but I have a problem with fastText!
Is it the error from your first message?
My error, sorry, it worked fine!
On 9 Jan 2020, 11:39 +0200, Du Phan notifications@github.com wrote:
> is it the error in your 1st message ?
Good to hear. Should I close this ticket?
Yes!
Thanks!
On 9 Jan 2020, 19:20 +0200, Du Phan notifications@github.com wrote:
> Good to hear, should I close this ticket ?
Great!
Hello,
Can you please help me with the error below:
"Job failed: Error in Python process: At line 5: <type 'exceptions.ImportError'>: cannot import name open"
[2020/01/07-11:43:28.764] [null-err-72] [INFO] [dku.utils] - *** Recipe code failed **
[2020/01/07-11:43:28.765] [null-err-72] [INFO] [dku.utils] - Begin Python stack
[2020/01/07-11:43:28.766] [null-err-72] [INFO] [dku.utils] - Traceback (most recent call last):
[2020/01/07-11:43:28.766] [null-err-72] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/jobs/TWITTER/Build_glove1_2020-01-07T09-43-27.559/compute_glove1_NP/custom-python-recipe/pyout4PDNSy09fJPU/python-exec-wrapper.py", line 194, in <module>
[2020/01/07-11:43:28.767] [null-err-72] [INFO] [dku.utils] - exec(f.read())
[2020/01/07-11:43:28.767] [null-err-72] [INFO] [dku.utils] - File "<string>", line 5, in <module>
[2020/01/07-11:43:28.768] [null-err-72] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/plugins/installed/sentence-embedding/python-lib/commons.py", line 4, in <module>
[2020/01/07-11:43:28.768] [null-err-72] [INFO] [dku.utils] - from dku_language_model.context_independent_language_model import FasttextModel, Word2vecModel, GloveModel, CustomModel
[2020/01/07-11:43:28.769] [null-err-72] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/plugins/installed/sentence-embedding/python-lib/dku_language_model/__init__.py", line 1, in <module>
[2020/01/07-11:43:28.769] [null-err-72] [INFO] [dku.utils] - from dku_language_model.context_independent_language_model import FasttextModel, Word2vecModel, GloveModel
[2020/01/07-11:43:28.770] [null-err-72] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/plugins/installed/sentence-embedding/python-lib/dku_language_model/context_independent_language_model.py", line 4, in <module>
[2020/01/07-11:43:28.770] [null-err-72] [INFO] [dku.utils] - from gensim.models import KeyedVectors
[2020/01/07-11:43:28.771] [null-err-72] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/site-packages/gensim/__init__.py", line 5, in <module>
[2020/01/07-11:43:28.771] [null-err-72] [INFO] [dku.utils] - from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils # noqa:F401
[2020/01/07-11:43:28.772] [null-err-72] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/site-packages/gensim/parsing/__init__.py", line 4, in <module>
[2020/01/07-11:43:28.772] [null-err-72] [INFO] [dku.utils] - from .preprocessing import (remove_stopwords, strip_punctuation, strip_punctuation2, # noqa:F401
[2020/01/07-11:43:28.773] [null-err-72] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/site-packages/gensim/parsing/preprocessing.py", line 42, in <module>
[2020/01/07-11:43:28.773] [null-err-72] [INFO] [dku.utils] - from gensim import utils
[2020/01/07-11:43:28.774] [null-err-72] [INFO] [dku.utils] - File "/Users/bob/Library/DataScienceStudio/dss_home/code-envs/python/plugin_sentence-embedding_managed/lib/python2.7/site-packages/gensim/utils.py", line 45, in <module>
[2020/01/07-11:43:28.774] [null-err-72] [INFO] [dku.utils] - from smart_open import open
[2020/01/07-11:43:28.775] [null-err-72] [INFO] [dku.utils] - ImportError: cannot import name open
[2020/01/
Thanks
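The last frame of this traceback is gensim/utils.py running `from smart_open import open`, so the smart_open installed in the code env apparently does not export a top-level `open`. This matches the suspicion that the smart_open version is the problem. Below is a minimal sketch to confirm this from the same environment, assuming it is run with the plugin_sentence-embedding_managed interpreter. The helper name `smart_open_status` is hypothetical. The sketch does not choose a fix; upgrading smart_open to a release that provides `open`, or installing a gensim/smart_open pair known to work together, would be the likely next step.

```python
import importlib


def smart_open_status():
    """Report whether smart_open is importable and whether it exposes `open`,
    which gensim's utils module imports when gensim is loaded."""
    try:
        module = importlib.import_module("smart_open")
    except ImportError:
        return "smart_open is not installed"
    # Not every smart_open release defines __version__, so fall back to a label
    version = getattr(module, "__version__", "(unknown version)")
    if hasattr(module, "open"):
        return "smart_open %s exposes open(); 'from smart_open import open' should succeed" % version
    return "smart_open %s has no open(); gensim's import will fail" % version


if __name__ == "__main__":
    print(smart_open_status())
```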