Open cowjen01 opened 2 years ago
This might be because you have duplicated items in your data matrix (if this is for the same or similar data for your other issue). For embedding problems, there's no need to get multiple vectors for the same item. If you have a specific need for that then maybe you can explain what you're trying to do, and we can see if there's another way to do it.
EDIT: If you provide me your data (matrix), I can try to help/debug.
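To check whether duplicates are actually present, identical rows of a sparse matrix can be collapsed before embedding. The following is a minimal sketch, not part of PyMDE: the helper name drop_duplicate_rows and the toy matrix are made up for illustration, and it assumes the data is (or can be converted to) a SciPy CSR matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix


def drop_duplicate_rows(matrix):
    """Collapse identical rows of a sparse matrix (hypothetical helper).

    Returns the deduplicated matrix, the indices of the rows that were
    kept, and an inverse index mapping every original row to its
    position in the deduplicated matrix.
    """
    matrix = csr_matrix(matrix, copy=True)
    # Bring the matrix into a canonical form so that equal rows have
    # byte-identical index and data arrays.
    matrix.sum_duplicates()
    matrix.eliminate_zeros()
    matrix.sort_indices()

    seen = {}
    keep = []
    inverse = np.empty(matrix.shape[0], dtype=np.int64)
    for row in range(matrix.shape[0]):
        start, end = matrix.indptr[row], matrix.indptr[row + 1]
        key = (
            matrix.indices[start:end].tobytes(),
            matrix.data[start:end].tobytes(),
        )
        if key not in seen:
            seen[key] = len(keep)
            keep.append(row)
        inverse[row] = seen[key]

    keep = np.asarray(keep, dtype=np.int64)
    return matrix[keep], keep, inverse


# Toy data: rows 0 and 2 are identical.
toy = csr_matrix(
    np.array(
        [
            [1.0, 0.0, 2.0],
            [0.0, 3.0, 0.0],
            [1.0, 0.0, 2.0],
            [0.0, 0.0, 0.0],
        ]
    )
)
unique_rows, keep, inverse = drop_duplicate_rows(toy)
print(f"{toy.shape[0] - unique_rows.shape[0]} duplicate row(s) found")
```

If only the deduplicated matrix is embedded, indexing the resulting embedding array with inverse gives one vector per original row again, with duplicates sharing the same vector.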
Hello, thank you for the quick reply. The reason for the duplicates is that I'm using PyMDE to compute user/item embeddings from the interaction matrix of the MovieLens dataset. I created a sparse matrix from the interactions and then ran the pymde.preserve_neighbors method. As a result, some users have identical ratings for the same movies (typically users with a very small number of interactions). I first used the development version of the MovieLens dataset, but today I also tried the full version with 20M interactions, and the error is still there.
Feb 22 06:12:28 PM: Computing 10-nearest neighbors, with max_distance=None
Tue Feb 22 18:12:31 2022 Building RP forest with 15 trees
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Tue Feb 22 18:12:35 2022 metric NN descent for 13 iterations
1 / 13
2 / 13
3 / 13
4 / 13
5 / 13
6 / 13
7 / 13
Stopping threshold met -- exiting after 7 iterations
Feb 22 06:12:52 PM: Fitting a standardized embedding into R^2, for a graph with 10000 items and 147705 edges.
Feb 22 06:12:52 PM: `embed` method parameters: eps=1.0e-05, max_iter=1000, memory_size=50
Traceback (most recent call last):
File "/Users/jean/opt/miniconda3/envs/repsys/bin/repsys", line 33, in <module>
sys.exit(load_entry_point('repsys', 'console_scripts', 'repsys')())
File "/Users/jean/Documents/school/repsys/repsys/__main__.py", line 22, in main
repsys_group(prog_name="repsys")
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/Users/jean/Documents/school/repsys/repsys/cli.py", line 64, in wrapper
return func(*args, **kwargs)
File "/Users/jean/Documents/school/repsys/repsys/cli.py", line 82, in wrapper
return func(*args, **kwargs)
File "/Users/jean/Documents/school/repsys/repsys/cli.py", line 162, in dataset_eval_cmd
evaluate_dataset(dataset, split_path, output_path)
File "/Users/jean/Documents/school/repsys/repsys/core.py", line 51, in evaluate_dataset
evaluator.compute_embeddings('train')
File "/Users/jean/Documents/school/repsys/repsys/evaluators.py", line 34, in _wrapper
return func(self, *args, **kwargs)
File "/Users/jean/Documents/school/repsys/repsys/evaluators.py", line 84, in compute_embeddings
self.compute_user_embeddings(split, **kwargs)
File "/Users/jean/Documents/school/repsys/repsys/evaluators.py", line 34, in _wrapper
return func(self, *args, **kwargs)
File "/Users/jean/Documents/school/repsys/repsys/evaluators.py", line 78, in compute_user_embeddings
embeds, indexes = self._get_embeddings(matrix, **kwargs)
File "/Users/jean/Documents/school/repsys/repsys/evaluators.py", line 63, in _get_embeddings
embeddings = mde.embed(verbose=self.verbose, max_iter=1000, memory_size=50)
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/pymde/problem.py", line 508, in embed
logger=LOGGER,
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/pymde/optim.py", line 131, in lbfgs
opt.step(value_and_grad)
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/pymde/lbfgs.py", line 520, in step
obj_func, x_init, t, d, loss, flat_grad, gtd)
File "/Users/jean/opt/miniconda3/envs/repsys/lib/python3.7/site-packages/pymde/lbfgs.py", line 72, in _strong_wolfe
raise SolverError("Function evaluation returned inf.")
pymde.util.SolverError: Function evaluation returned inf.
I attached the data.zip. Here is a piece of code to quickly load them:
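import pandas as pd
from scipy.sparse import csr_matrix

df = pd.read_csv('...')
# Matrix dimensions are derived from the largest user and item ids.
n_users = df["user"].max() + 1
n_items = df["item"].max() + 1
rows, cols, values = df["user"], df["item"], df["value"]
# Build the sparse user-by-item interaction matrix.
matrix = csr_matrix(
    (values, (rows, cols)),
    dtype="float64",
    shape=(n_users, n_items),
)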
Latest implementation:
import pymde

pymde.seed(0)
mde = pymde.preserve_neighbors(matrix, init='random', n_neighbors=10, constraint=pymde.Standardized(), verbose=True)
embeddings = mde.embed(verbose=True, max_iter=1000, memory_size=50)
# Convert the resulting torch tensor to a NumPy array.
embeddings = embeddings.cpu().numpy()
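For context on the constraints discussed in this thread: as far as I understand the PyMDE documentation, a centered embedding only needs to have zero mean, while a standardized embedding X with n rows must additionally satisfy (1/n) XᵀX = I. The NumPy sketch below checks these two properties on an array; the helper names is_centered and is_standardized and the toy points are hypothetical and not part of PyMDE.

```python
import numpy as np


def is_centered(X, tol=1e-4):
    """True if every column of X has (approximately) zero mean."""
    return bool(np.abs(X.mean(axis=0)).max() <= tol)


def is_standardized(X, tol=1e-4):
    """True if X is centered and (1/n) X^T X is close to the identity."""
    n, dim = X.shape
    cov = X.T @ X / n
    return is_centered(X, tol) and bool(np.abs(cov - np.eye(dim)).max() <= tol)


# Toy points with zero mean but unequal spread along the two axes.
centered = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

# Whiten the points so that their covariance becomes the identity.
eigvals, eigvecs = np.linalg.eigh(centered.T @ centered / len(centered))
standardized = centered @ eigvecs / np.sqrt(eigvals)

print(is_centered(centered), is_standardized(centered))
print(is_centered(standardized), is_standardized(standardized))
```

The same checks can be applied to the NumPy array obtained from embeddings.cpu().numpy() to see which constraint a computed embedding satisfies.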
I would like to use the Standardized() constraint, but every time I get the following error: pymde.util.SolverError: Function evaluation returned inf. I tried different configurations of the preserve_neighbors function, but I still get the same error. After removing this constraint, everything works just fine. I also tried the Centered() constraint, which works as well. My implementation is: