Closed Sandy4321 closed 6 years ago
The start and stop parameters look strange. X_train.shape is (7580, 1234), but start is 7680 and stop is 7580, so start lies past the last row. X_train[0:2] is <2x1234 sparse matrix of type '<class 'numpy.float64'>' with 179 stored elements in Compressed Sparse Row format>. This happens in the Theano call theano_train(X_train[start:stop], y_train[start:stop], sample_weight[start:stop], epoch).
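For comparison, here is a minimal sketch of how mini-batch bounds are commonly computed so that start never reaches past the number of rows. The helper name batch_bounds and the batch size of 128 are assumptions made for illustration; they are not taken from pyfms.

```python
def batch_bounds(n_rows, batch_size):
    """Yield (start, stop) pairs that cover range(n_rows) without overshooting."""
    for start in range(0, n_rows, batch_size):
        # Clamp the final batch so stop never exceeds the number of rows.
        yield start, min(start + batch_size, n_rows)


# 7580 rows as in X_train.shape; batch_size=128 is an assumed value.
bounds = list(batch_bounds(7580, 128))
print(len(bounds), bounds[-1])  # prints: 60 (7552, 7580)
```

With bounds built this way every pair satisfies start < stop, so a pair such as (7680, 7580) would indicate one iteration too many in the batching loop.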
Good news: the targets must be a plain numpy array, so changing the call to f.fit(X_train, y_train.values, verbosity=2, nb_epoch=20) makes the code run without error for number_of_features_for_Hasher = 1234, although it is very slow on a computer without a GPU.
The code now looks like this:
import pickle
import pandas as pd
from sklearn.feature_extraction import FeatureHasher
from pyfms import Classifier
from sklearn.model_selection import train_test_split

number_of_features_for_Hasher = 1234  # 123 # 12345

if 1:
    # Download the data set and cache it locally.
    from sklearn.datasets import fetch_20newsgroups
    twenty_train = fetch_20newsgroups(subset='train', shuffle=True, random_state=42)
    with open("twenty_data.pkl", "wb") as filehandler:
        pickle.dump(twenty_train, filehandler)
else:
    # Reuse the cached copy.
    with open("twenty_data.pkl", "rb") as file_name:
        twenty_train = pickle.load(file_name)

twenty_train.target_names

def define_label(x, target_names):
    name = target_names[x]
    return 1 if 'comp' in name else 0

target = pd.Series(twenty_train.target).apply(lambda x: define_label(x, twenty_train.target_names))
target.mean()  # Prevalence of the label
q = 3

def clean_text(text):
    text = text.replace('\n', '').replace('\t', '').replace('<', '').replace('>', '').replace('|', '')
    return [x for x in text.split(' ') if len(x) > 3]

X = [clean_text(x) for x in twenty_train.data]
fh = FeatureHasher(input_type='string', n_features=number_of_features_for_Hasher, non_negative=True)
X_t = fh.transform(X)
X_bin = X_t.copy()
X_bin[X_bin >= 1] = 1

f = Classifier(X_t.shape[1], k=2, X_format="csr")
X_train, X_test, y_train, y_test = train_test_split(X_bin, target, test_size=0.33, random_state=42)
f.fit(X_train, y_train.values, verbosity=2, nb_epoch=20)
q = 4

S_May2_changed_hashing_size_Theano_FM_sparse.zip
The next step is to test with the full data, as in the example at http://srome.github.io/Leveraging-Factorization-Machines-for-Sparse-Data-and-Supervised-Visualization/
An interesting link: http://nowave.it/factorization-machines-with-tensorflow.html
@Sandy4321, I'm closing this issue, as it sounds like you have found a solution.
Regarding the speed, please see my comment here, regarding the installation of m2w64-toolchain.
If you encounter any future issues, please be sure to provide the simplest possible example that reproduces the issue, providing the example code in the ticket text, as opposed to attaching a zip file with code.
1. conda install m2w64-toolchain <- this destroys theano
2. I uninstalled m2w64-toolchain.
3. I uninstalled theano.
4. I installed theano again with C:\Windows\system32>conda install theano (the previous installation was C:\Windows\system32>pip install theano). Now all works, thanks.
5. It would be interesting to know how to check whether I run with GPU or with CPU.
"interesting how to check if I run with GPU or with CPU"
Theano provides instructions for configuring a GPU. http://deeplearning.net/software/theano/tutorial/using_gpu.html
Theano options are configured with a .theanorc config file or a THEANO_FLAGS environment variable. Please see the link above.
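As a rough sketch of both points, the snippet below builds a THEANO_FLAGS value, places it in the environment before Theano is first imported, and then prints the device Theano reports. The helper names build_theano_flags and report_theano_device are made up for this example, and the cpu default is an assumption; theano.config.device and theano.config.floatX are the Theano settings being read.

```python
import os


def build_theano_flags(device="cpu", float_x="float32"):
    """Return a THEANO_FLAGS string such as 'device=cpu,floatX=float32'."""
    return "device={},floatX={}".format(device, float_x)


def report_theano_device():
    """Import Theano and report the device and float type it picked up."""
    try:
        import theano
    except Exception as exc:  # not installed, or the import itself failed
        return "Theano could not be imported: {}".format(exc)
    return "device={}, floatX={}".format(theano.config.device, theano.config.floatX)


# The flags have to be in the environment before the first `import theano`.
# setdefault leaves a value that was already exported in the shell untouched.
os.environ.setdefault("THEANO_FLAGS", build_theano_flags(device="cpu"))
print(report_theano_device())
```

If the printed device is cpu, the computation is not running on the GPU. Exporting the variable in the shell before starting Python, or using a .theanorc file, avoids any dependence on import order inside the script.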
From the recommended link I see THEANO_FLAGS='device=cuda,floatX=float32'. Should I put it somewhere in this file? https://github.com/dstein64/PyFactorizationMachines/blob/master/pyfms/optimizers.py
So I added import os and os.environ["THEANO_FLAGS"] = "mode=FAST_RUN,device=gpu,floatX=float32", per https://stackoverflow.com/questions/33988334/theano-config-directly-in-script, but it gives an error. Changing it to os.environ["THEANO_FLAGS"] = "mode=FAST_RUN,device=cuda,floatX=float32" gives an error as well. I use Windows. Do you use Windows?
@Sandy4321, I recommend you configure the environment variable before calling your program, or alternatively configure theano by using a .theanorc file. Please see the documentation here for more details. http://deeplearning.net/software/theano/library/config.html
I do not know why you're getting an error. I suspect it's an issue with how you're configuring theano, as opposed to being an issue with PyFactorizationMachines.
I see thanks
Even with a very modest amount of data (hashing is done for only 123 features) the code gives an error. The code is attached in a zip file and copy-pasted below. Thanks for the help.
File "E:\Recommender_systems\code\PyFactorizationMachines_May2\PyFactorizationMachines\S_May2_changed_hashing_size_Theano_FM_sparse.py", line 64, in
f.fit(X_train, y_train, verbosity=50, nb_epoch=200)
File "E:\Recommender_systems\code\PyFactorizationMachines_May2\PyFactorizationMachines\pyfms\models.py", line 20, in fit
X_train, y_train, error_function, optimizer, **kwargs)
File "E:\Recommender_systems\code\PyFactorizationMachines_May2\PyFactorizationMachines\pyfms\core.py", line 207, in fit
raise ArithmeticError("Non-finite loss function.")
builtins.ArithmeticError: Non-finite loss function.
The code is:

# S_May2_changed_hashing_size_Theano_FM_sparse.py
# S_May2_hashing_FM_sparse.py
# S_apr25_FM_sparse
# https://github.com/dstein64/PyFactorizationMachines
# http://srome.github.io/Leveraging-Factorization-Machines-for-Sparse-Data-and-Supervised-Visualization/
import pickle
import pandas as pd
from sklearn.feature_extraction import FeatureHasher
# original: from PyFactorizationMachines.src.pyfm import FactorizationMachineClassifier
from pyfms import Classifier
from sklearn.model_selection import train_test_split

# parameters
number_of_features_for_Hasher = 123  # 12345

if 1:
    from sklearn.datasets import fetch_20newsgroups
    twenty_train = fetch_20newsgroups(subset='train', shuffle=True, random_state=42)
    filehandler = open(b"twenty_data.pkl", "wb")
    pickle.dump(twenty_train, filehandler)
else:
    file_name = open("twenty_data.pkl", 'rb')
    twenty_train = pickle.load(file_name)

twenty_train.target_names

def define_label(x, target_names):
    name = target_names[x]
    return 1 if 'comp' in name else 0

target = pd.Series(twenty_train.target).apply(lambda x: define_label(x, twenty_train.target_names))
target.mean()  # Prevalence of the label
q = 3

def clean_text(text):
    # Basic cleaning
    text = text.replace('\n', '').replace('\t', '').replace('<', '').replace('>', '').replace('|', '')
    return [x for x in text.split(' ') if len(x) > 3]

X = [clean_text(x) for x in twenty_train.data]

# Hash away!
fh = FeatureHasher(input_type='string', n_features=number_of_features_for_Hasher, non_negative=True)
X_t = fh.transform(X)

# Bin the inputs so that the "interaction" terms are more interpretable
X_bin = X_t.copy()
X_bin[X_bin >= 1] = 1

f = Classifier(X_t.shape[1], k=2, X_format="csr")
X_train, X_test, y_train, y_test = train_test_split(X_bin, target, test_size=0.33, random_state=42)
f.fit(X_train, y_train, verbosity=50, nb_epoch=200)
q = 4

S_May2_changed_hashing_size_Theano_FM_sparse.zip
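The hashing and binning steps in this script can be pictured with a small standard-library sketch. It uses MD5 as a stand-in hash, which is an assumption made to keep the example dependency-free; scikit-learn's FeatureHasher uses a signed MurmurHash3 instead, so bucket positions will differ. The helper names hash_tokens and binarize are hypothetical.

```python
import hashlib


def hash_tokens(tokens, n_features):
    """Count how many tokens land in each of n_features buckets."""
    row = [0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        # Map the hash to a bucket index; different tokens may collide.
        row[int(digest, 16) % n_features] += 1
    return row


def binarize(row):
    """Replace every count of 1 or more with 1, like X_bin[X_bin >= 1] = 1."""
    return [1 if value >= 1 else 0 for value in row]


tokens = ["graphics", "windows", "graphics", "hardware"]
counts = hash_tokens(tokens, n_features=123)
binary = binarize(counts)
print(sum(counts), sum(binary))
```

With only 123 buckets, distinct tokens collide often, so after binning a bucket records only that at least one of its tokens was present.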