cognibit / Text-Normalization-Demo

Demonstration of the results in "Text Normalization using Memory Augmented Neural Networks", Authors: Subhojeet Pramanik, Aman Hussain
http://arxiv.org/abs/1806.00044
Apache License 2.0

links in setup.sh not working #1

Open tahaceritli opened 5 years ago

tahaceritli commented 5 years ago

Hi,

The following links are not working:

https://storage.googleapis.com/ainstein_text_normalization/test_data.zip
https://storage.googleapis.com/ainstein_text_normalization/dnc_model.zip

When I run setup.sh, I'm getting this output:

Downloading and extracting required files
--2019-07-02 15:41:52--  https://storage.googleapis.com/ainstein_text_normalization/test_data.zip
Resolving storage.googleapis.com... 216.58.210.240, 2a00:1450:4009:80f::2010
Connecting to storage.googleapis.com|216.58.210.240|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-07-02 15:41:52 ERROR 404: Not Found.

--2019-07-02 15:41:52--  https://storage.googleapis.com/ainstein_text_normalization/dnc_model.zip
Resolving storage.googleapis.com... 216.58.210.240, 2a00:1450:4009:80f::2010
Connecting to storage.googleapis.com|216.58.210.240|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-07-02 15:41:52 ERROR 404: Not Found.

unzip: cannot find or open test_data.zip, test_data.zip.zip or test_data.zip.ZIP.
unzip: cannot find or open dnc_model.zip, dnc_model.zip.zip or dnc_model.zip.ZIP.
rm: test_data.zip: No such file or directory
rm: dnc_model.zip: No such file or directory
Finished
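For reference, here is a minimal way to check whether the two archives are reachable at all (just a sketch, assuming the requests package is installed; the URLs are the ones setup.sh fetches):

import requests

# The two archives setup.sh tries to download.
urls = [
    "https://storage.googleapis.com/ainstein_text_normalization/test_data.zip",
    "https://storage.googleapis.com/ainstein_text_normalization/dnc_model.zip",
]

for url in urls:
    # A HEAD request shows the status code without downloading anything;
    # both currently come back as 404, matching the wget output above.
    resp = requests.head(url, allow_redirects=True)
    print(resp.status_code, url)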

Could you update this?

Thanks, Taha

subho406 commented 5 years ago

Hi, thanks for opening this issue. The model URLs were moved and have now been fixed in the latest commit. Please check and confirm.

tahaceritli commented 5 years ago

Thanks for the reply. I'm able to download them now, but I think there's a mismatch with the model version, which prevents me from reproducing the notebook. The error occurs when I run the Text Normalization Demo notebook at the following line:

raw_data['class'] = xgb.predict(data=raw_data)

Processed 100%

AttributeError Traceback (most recent call last)
in
      1 # Class of tokens in the data
----> 2 raw_data['class'] = xgb.predict(data=raw_data)
      3 # Raw to Classified Data
      4 classified_data = raw_data.copy(deep=False)

Text-Normalization-Demo/src/XGBclassify.py in predict(self, data)
     69
     70         # classify as RemainSelf or ToBeNormalized
---> 71         y = self.model.predict(X)
     72         y_labels = [self.labels[int(i)] for i in y]
     73         return y_labels

~/anaconda3/lib/python3.7/site-packages/xgboost/sklearn.py in predict(self, data, output_margin, ntree_limit, validate_features)
    783             prediction : numpy array
    784         """
--> 785         test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
    786         if ntree_limit is None:
    787             ntree_limit = getattr(self, "best_ntree_limit", 0)

AttributeError: 'XGBClassifier' object has no attribute 'n_jobs'

To confirm this, I tried the following code too, which gives the same error:

import sys
sys.path.append("../src")
from XGBclassify import XGB

xgb_path = '../models/english/en_xgb_tuned-trained.pk'
xgb = XGB(xgb_path)
print(xgb.model.n_jobs)

AttributeError Traceback (most recent call last)
in
----> 1 print(xgb.model.n_jobs)

AttributeError: 'XGBClassifier' object has no attribute 'n_jobs'

Thanks,
subho406 commented 5 years ago

Are you using the provided deep-tf conda environment while running the notebook? Generally, the n_jobs parameter not being found points to a mismatch with the installed version of the xgboost library.
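A quick way to confirm the mismatch is something like this (a rough sketch, reusing the XGB wrapper and model path from your snippet above):

import sys
import xgboost

# xgboost version in the active environment; if it differs from the version the
# model was pickled with, attributes added later (such as n_jobs) can be missing.
print("xgboost", xgboost.__version__)

sys.path.append("../src")
from XGBclassify import XGB

xgb = XGB('../models/english/en_xgb_tuned-trained.pk')

# Older xgboost sklearn wrappers used nthread rather than n_jobs, so a model
# pickled with such a version would not carry n_jobs after unpickling.
print("has n_jobs:", hasattr(xgb.model, 'n_jobs'))
print("has nthread:", hasattr(xgb.model, 'nthread'))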


tahaceritli commented 5 years ago

Yes, and according to the source code I have, n_jobs should exist. When I create a new object of that class, n_jobs is accessible with a default value. But I don't think the model you've uploaded, which is extracted to '../models/english/en_xgb_tuned-trained.pk', contains this attribute.
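If it's useful, a possible workaround might be to patch the attribute onto the unpickled estimator before predicting (just a sketch on my side; the old nthread name and the fallback value of 1 are guesses, not settings from the uploaded model):

import sys
sys.path.append("../src")
from XGBclassify import XGB

xgb = XGB('../models/english/en_xgb_tuned-trained.pk')

# If the pickle predates the rename to n_jobs, fall back to the old nthread value
# (or to a single thread) so xgboost's predict() can build its DMatrix.
if not hasattr(xgb.model, 'n_jobs'):
    xgb.model.n_jobs = getattr(xgb.model, 'nthread', 1)

print(xgb.model.n_jobs)

That said, installing whatever xgboost version the deep-tf environment specifies is probably the cleaner fix, since it avoids touching the unpickled model at all.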