benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License

Implicit library issue: IndexError: index 360503 is out of bounds for axis 0 with size 82354 #641

ritesh1187 closed this issue 1 year ago

ritesh1187 commented 1 year ago

@benfred: Have there been further changes to this function? I was able to score (with the above changes) until 15 Dec 2022. I ran the following code:

```python
from scipy import sparse

# df_model columns: sub_id (user), content_id (item), cnt_clk (click count).
sparse_item_user = sparse.csr_matrix((df_model['cnt_clk'].astype(float), (df_model['content_id'], df_model['sub_id'])))
sparse_user_item = sparse.csr_matrix((df_model['cnt_clk'].astype(float), (df_model['sub_id'], df_model['content_id'])))

sparse_user_item
# <4504602x82354 sparse matrix of type '<class 'numpy.float64'>'
#     with 50534476 stored elements in Compressed Sparse Row format>

sparse_item_user
# <82354x4504602 sparse matrix of type '<class 'numpy.float64'>'
#     with 50534476 stored elements in Compressed Sparse Row format>
```
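
The two matrices above hold the same counts in opposite orientations, so one can be derived from the other by transposing, which avoids a second pass over `df_model` and guarantees they agree. This is a minimal sketch with hypothetical toy ids and counts standing in for `df_model`; it also passes an explicit `shape` (an assumption here, not in the original code) so that users or items whose ids never occur in the data still get a row or column. Per the maintainer's reply later in this thread, it is the users-by-items orientation that `model.fit` expects from implicit v0.5.0 onward.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy interactions standing in for df_model.
sub_id = np.array([0, 2, 3])         # user ids
content_id = np.array([1, 0, 1])     # item ids
cnt_clk = np.array([5.0, 1.0, 2.0])  # click counts

# Hypothetical totals; without shape=, scipy infers max(id) + 1 for each axis.
N_USERS, N_ITEMS = 5, 3

# Users x items: one row per sub_id, one column per content_id.
user_items = sparse.csr_matrix((cnt_clk, (sub_id, content_id)), shape=(N_USERS, N_ITEMS))

# Items x users is the transpose; .T returns a CSC matrix, so convert back to CSR.
item_users = user_items.T.tocsr()

print(user_items.shape, item_users.shape)  # (5, 3) (3, 5)
```

Building only `user_items` and transposing on demand keeps a single source of truth for the orientation.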
```python
import time
from threading import Thread

# Assumes `model` (a fitted implicit model), `subids_test` (user ids to score)
# and `sparse_user_item` are defined earlier.
if __name__ == '__main__':
    content = []
    scores = []
    subs = []
    def performRecommendations(i):
        recommended = model.recommend(subids_test[i], sparse_user_item[i], 10)

        subs.append(subids_test[i])
        for (c, s) in zip(recommended[0], recommended[1]):
            content.append(c)
            scores.append(s)
            #subs_1.append(subids_test[i])
            #print("c:",c, "s:", s)

    # The block above does the scoring

    def doRecommendations(number, start):
        newList = [performRecommendations(i) for i in range(start, number)]

    # The block below does the parallel processing

    th_count = 62
    th_list = []
    begintime = time.time()

    for th in range(1, th_count+1):
        q, mod = divmod(len(subids_test), th_count)
        number = th * q
        start = number - q
        if mod != 0 and th == th_count:
            number = number + mod
        thread = Thread(target=doRecommendations, args=(number, start,))
        th_list.append(thread)
    for th in th_list:
        th.start()
    for th in th_list:
        th.join()
    endtime = time.time()
    print("The total time taken:", endtime-begintime)
```
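
Two details of the scoring loop above may be worth a second look, independent of the `fit` question. First, the user-items row passed to `recommend` is `sparse_user_item[i]`, the row at loop position `i`, while the user id is `subids_test[i]`; implicit's own examples pass the row for the same user (`user_items[userid]`), so unless `subids_test[i] == i` the two arguments refer to different users. Second, 62 threads appending to three shared lists means `subs` receives one entry per user while `content` and `scores` receive ten, and the interleaving between threads makes it hard to line them up afterwards. The sketch below keeps the same `divmod` chunking but has each chunk return its own `(userid, item, score)` rows, concatenated in submission order. `fake_recommend`, `score_chunk` and `score_all` are hypothetical names, and `fake_recommend` stands in for `model.recommend` so the example runs without a trained model.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous (start, stop) pairs.

    Same arithmetic as the loop above: each chunk gets n_items // n_chunks
    items and the last chunk also absorbs the remainder.
    """
    q, mod = divmod(n_items, n_chunks)
    ranges = []
    for th in range(1, n_chunks + 1):
        stop = th * q
        start = stop - q
        if th == n_chunks:
            stop += mod
        ranges.append((start, stop))
    return ranges


def fake_recommend(userid, n=3):
    """Hypothetical stand-in for model.recommend: returns (ids, scores)."""
    ids = [(userid + k) % 10 for k in range(n)]
    scores = [1.0 / (k + 1) for k in range(n)]
    return ids, scores


def score_chunk(userids, start, stop):
    """Score one chunk and return rows that keep user, item and score together."""
    rows = []
    for i in range(start, stop):
        ids, item_scores = fake_recommend(userids[i])
        rows.extend((userids[i], c, s) for c, s in zip(ids, item_scores))
    return rows


def score_all(userids, n_workers):
    """Score every user; output order follows the input order of userids."""
    ranges = chunk_ranges(len(userids), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(score_chunk, userids, a, b) for a, b in ranges]
        # Collect in submission order, not completion order.
        return [row for f in futures for row in f.result()]


subids = [101, 205, 307, 409, 511, 613, 715]
rows = score_all(subids, n_workers=3)
print(chunk_ranges(len(subids), 3))  # [(0, 2), (2, 4), (4, 7)]
print(rows[:3])
```

Whether many threads actually speed this up depends on how much of `recommend` runs without holding the GIL. Recent implicit versions also accept an array of user ids in a single `recommend` call, which may be simpler to try first; check the documentation for the installed version.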

Now, when I try to score again, I get the following error:

```
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-54-0958b4b3cf6d>", line 17, in doRecommendations
    newList = [performRecommendations(i) for i in range(start, number)]
  File "<ipython-input-54-0958b4b3cf6d>", line 17, in <listcomp>
    newList = [performRecommendations(i) for i in range(start, number)]
  File "<ipython-input-54-0958b4b3cf6d>", line 6, in performRecommendations
    recommended = model.recommend(subids_test[i], sparse_user_item[i], 10)
  File "implicit/recommender_base.pyx", line 171, in implicit.recommender_base.MatrixFactorizationBase.recommend
  File "implicit/recommender_base.pyx", line 321, in implicit.recommender_base.MatrixFactorizationBase._user_factor
IndexError: index 360503 is out of bounds for axis 0 with size 82354
```

benfred commented 1 year ago

How did you fit this model?

The `model.fit` API changed in v0.5.0 with this change: https://github.com/benfred/implicit/pull/484. It now takes a `user_items` matrix instead of an `item_users` matrix, and I'm guessing this is what's tripping you up.
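
The sizes in the traceback are consistent with that guess. The failing frame is `_user_factor`, which presumably looks up a per-user factor array by user id; if `fit` received the 82354 x 4504602 item-user matrix, a v0.5.0+ model would treat its 82354 rows as users, and user id 360503 falls outside an array of that length. This minimal NumPy sketch reproduces only that indexing step; `toy_user_factors` is a hypothetical stand-in for the factor allocation, not implicit's actual code.

```python
import numpy as np

# Matrix sizes reported earlier in this issue.
N_USERS, N_ITEMS = 4504602, 82354


def toy_user_factors(fit_matrix_shape, factors=1):
    """Hypothetical stand-in for fitting: allocate one user-factor row per
    row of the matrix passed to fit (rows are users from v0.5.0 on)."""
    n_rows, _ = fit_matrix_shape
    return np.zeros((n_rows, factors), dtype=np.float32)


userid = 360503  # the user id from the traceback

# Wrong orientation: fit on item_users (items x users) -> 82354 "user" rows.
wrong = toy_user_factors((N_ITEMS, N_USERS))
error_message = None
try:
    wrong[userid]
except IndexError as exc:
    error_message = str(exc)
print(error_message)  # index 360503 is out of bounds for axis 0 with size 82354

# Right orientation: fit on user_items (users x items) -> one row per user.
right = toy_user_factors((N_USERS, N_ITEMS))
print(right[userid].shape)  # (1,)
```

If the guess is right, the corresponding fix on v0.5.0 or later would be to fit on `sparse_user_item` rather than `sparse_item_user`.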

benfred commented 1 year ago

Closing; feel free to re-open if this isn't resolved.