cvlab-stonybrook / Scanpath_Prediction

Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning (CVPR2020)
MIT License

Index out of range #7

Closed ManooshSamiei closed 3 years ago

ManooshSamiei commented 3 years ago

Hello, I am wondering if you could explain the functions in utils.py that convert position to action and action to position. I think there may be a bug in the position-to-action function: it produces action values greater than 640, and I get the error "index 1762 is out of bounds for dimension 1 with size 640". I also do not understand why the "position to action" function divides each position element (x and y) by the image width/height. I would really appreciate any guidance from you.

Thank you so much for your time.

Best, Manoosh

ManooshSamiei commented 3 years ago

I was able to fix this issue by scaling the fixation positions (x and y) from the LCD resolution (1680x1050) to 512x320, using the following code:

    label = pos_to_action(traj['X'][i]*(512/1680), traj['Y'][i]*(320/1050), patch_size,
                          patch_num)

where I multiply x positions by 512/1680 and y positions by 320/1050.
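For anyone hitting the same error, the rescaling can be wrapped in a small helper. This is only a minimal sketch, assuming the fixations were recorded on a 1680x1050 display and the model works on 512x320 images. `rescale_fixation` is a hypothetical name and the clamping step is an extra safeguard; neither is part of the repository.

```python
def rescale_fixation(x, y, src_size=(1680, 1050), dst_size=(512, 320)):
    """Map a fixation from display coordinates to model image coordinates.

    Hypothetical helper for illustration; not a function from utils.py.
    src_size and dst_size are (width, height) pairs.
    """
    new_x = x * dst_size[0] / src_size[0]
    new_y = y * dst_size[1] / src_size[1]
    # Assumption: clamp fixations on or beyond the screen edge into the image,
    # so the resulting grid index can never fall outside the action space.
    new_x = min(max(new_x, 0.0), dst_size[0] - 1)
    new_y = min(max(new_y, 0.0), dst_size[1] - 1)
    return new_x, new_y


print(rescale_fixation(840, 525))    # centre of the display -> (256.0, 160.0)
print(rescale_fixation(1680, 1050))  # bottom-right corner is clamped -> (511, 319)
```

The scaled values can then be passed to pos_to_action in place of the raw traj['X'][i] and traj['Y'][i].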

However, I still don't understand why there is one part of the code where the fixation position is converted to an action and then converted back to a position, but with an 8-pixel (patch_size/2) shift in both the x and y directions. This is the part I do not understand:

    label = pos_to_action(traj['X'][0], traj['Y'][0], patch_size,
                          patch_num)
    tar_x, tar_y = action_to_pos(label, patch_size, patch_num)
    fixs = [(tar_x, tar_y)]

What are tar_x and tar_y, and why are they 8 pixels different from the original x and y values (i.e. traj['X'][0], traj['Y'][0])? Why does the action_to_pos function add 8 pixels (patch_size/2) to the original position values?

ManooshSamiei commented 3 years ago

I also do not understand why you do not include the last fixation's position in the 'fix_labels' list:

        fix_label = (traj['name'], traj['task'], copy(fixs), label)
        # discretize fixations
        tar_x, tar_y = action_to_pos(label, patch_size, patch_num)
        fixs.append((tar_x, tar_y))

        fix_labels.append(fix_label)
    return fix_labels

Based on the above code, the 'fixs' list is modified after a copy of it has been stored in fix_label. Appending a new value to 'fixs' has no effect here, because the function returns 'fix_labels', which does not contain the new values of the 'fixs' list.

I would really appreciate it if you could help me understand the code better. Thank you very much for your time.

ouyangzhibo commented 3 years ago

> I still don't understand why there is one part of the code where the fixation position is converted to an action and then converted back to a position, but with an 8-pixel (patch_size/2) shift in both the x and y directions. This is the part I do not understand:
>
>     label = pos_to_action(traj['X'][0], traj['Y'][0], patch_size,
>                           patch_num)
>     tar_x, tar_y = action_to_pos(label, patch_size, patch_num)
>     fixs = [(tar_x, tar_y)]
>
> What are tar_x and tar_y, and why are they 8 pixels different from the original x and y values (i.e. traj['X'][0], traj['Y'][0])? Why does the action_to_pos function add 8 pixels (patch_size/2) to the original position values?

Note that in our setting, the image space is discretized into a 20x32 grid (20 rows by 32 columns), with each cell covering 16x16 pixels. Each cell is treated as one action. We add 8 pixels when mapping from action space back to image space so that an action maps to the center of the selected cell.
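To make the mapping concrete, here is a minimal sketch of the two conversions, assuming a 512x320 image, 16x16 cells, and row-major action indices. It is written for illustration only and may differ in detail from the functions in utils.py; the default arguments and the constants `PATCH_SIZE` and `PATCH_NUM` are assumptions added for this example.

```python
PATCH_SIZE = (16, 16)  # (width, height) of one grid cell in pixels
PATCH_NUM = (32, 20)   # (columns, rows): 512 / 16 and 320 / 16


def pos_to_action(x, y, patch_size=PATCH_SIZE, patch_num=PATCH_NUM):
    """Return the row-major index of the grid cell containing pixel (x, y)."""
    col = int(x // patch_size[0])
    row = int(y // patch_size[1])
    return row * patch_num[0] + col


def action_to_pos(action, patch_size=PATCH_SIZE, patch_num=PATCH_NUM):
    """Return the pixel coordinates of the center of grid cell `action`."""
    row, col = divmod(action, patch_num[0])
    x = col * patch_size[0] + patch_size[0] / 2
    y = row * patch_size[1] + patch_size[1] / 2
    return x, y


label = pos_to_action(100, 50)      # column 6, row 3 -> 3 * 32 + 6
print(label, action_to_pos(label))  # 102 (104.0, 56.0)

# Illustrative unscaled display coordinate: row 55 does not exist in a
# 20-row grid, so the index exceeds the 640 available actions.
print(pos_to_action(40, 885))       # 1762
```

The round trip does not return the original pixel: every position inside a cell maps to that cell's center, so the offset is at most 8 pixels in each direction rather than a fixed 8-pixel shift.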

ouyangzhibo commented 3 years ago

> I also do not understand why you do not include the last fixation's position in the 'fix_labels' list:
>
>         fix_label = (traj['name'], traj['task'], copy(fixs), label)
>         # discretize fixations
>         tar_x, tar_y = action_to_pos(label, patch_size, patch_num)
>         fixs.append((tar_x, tar_y))
>
>         fix_labels.append(fix_label)
>     return fix_labels
>
> Based on the above code, the 'fixs' list is modified after a copy of it has been stored in fix_label. Appending a new value to 'fixs' has no effect here, because the function returns 'fix_labels', which does not contain the new values of the 'fixs' list.

Notice that each training data point is formatted as (all previous fixations, next fixation/action). So the last item in the "fix_labels" list is (fixations 1 to N-1, N-th fixation), assuming N fixations in total. Since "fixs" is used here as "all previous fixations", the last fixation appears only as a label and is never included in it.
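As a rough illustration of this structure, the sketch below builds the (previous fixations, next action) pairs for a toy sequence. `build_pairs` and the `to_pos` argument are hypothetical names used only for this example; the actual preprocessing code also stores the image name and task with each pair.

```python
from copy import copy


def build_pairs(actions, to_pos):
    """Pair each action after the first with the fixations that preceded it."""
    history = [to_pos(actions[0])]
    pairs = []
    for action in actions[1:]:
        # copy() takes a snapshot, so the append below does not alter this pair.
        pairs.append((copy(history), action))
        history.append(to_pos(action))
    return pairs


# Toy example: three actions, with a stand-in for action_to_pos.
pairs = build_pairs([5, 9, 2], to_pos=lambda a: (a, a))
print(pairs)
# [([(5, 5)], 9), ([(5, 5), (9, 9)], 2)]
```

The final append still happens, but no later pair uses that history, which is why the last fixation shows up only as the label of the last pair.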

Hope this helps!

ouyangzhibo commented 3 years ago

If you have no other questions, I am closing this issue. If you do, please reopen it or open a new issue. Thanks!

ManooshSamiei commented 3 years ago

Thank you very much, Zhibo. That was really helpful.

quangdaist01 commented 2 years ago

> *(512/1680), traj['Y'][i]*(320/1050)

Hello, can you explain why we have to scale the trajectories by those two values to make the code work? I put that in my code too, but I am still getting the IndexError.

Thank you for your time!