[Open] vinayak-sharan opened this issue 1 year ago
Okay, I got some info regarding this and am making some changes to the code. As you said, the output difference between FP16 and FP32 is high. Such a large difference when converting with FP16 precision compared to FP32 suggests that the model is highly sensitive to numerical precision. So we simply have to debug the code and make some changes to the grid_sample calls.
Meanwhile, can you share any other details regarding this?
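As a rough illustration of the precision sensitivity being discussed, here is a sketch of how much error is introduced just by round-tripping values through FP16. The data is random, not the actual model tensors:

```python
import torch

# Quantify the rounding error from casting values to fp16 and back.
torch.manual_seed(0)
x = torch.randn(1000)                      # stand-in values, not the real features
err = (x - x.half().float()).abs().max()   # worst-case fp16 rounding error
print(err.item())
```

With an ~11-bit significand, FP16 carries a relative error around 5e-4 per value; whether that stays small or gets amplified depends on the operations the model performs afterwards.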
Loading untrusted .pt files is a security risk. Please share code which reproduces this issue but does not need .pt files. Maybe you could set a seed (torch.manual_seed) and then call torch.randn to get the tensor you need. Or just hard-code the tensor values if they aren't large.
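A minimal self-contained sketch along those lines, using assumed tensor shapes (the real shapes from the .pt files would be substituted in):

```python
import torch
import torch.nn.functional as F

# Reproduce with seeded random tensors instead of loading .pt files.
torch.manual_seed(0)
feat = torch.randn(1, 3, 8, 8)           # assumed feature-map shape (N, C, H, W)
grid = torch.rand(1, 4, 4, 2) * 2 - 1    # sampling grid in [-1, 1], shape (N, H_out, W_out, 2)

out = F.grid_sample(feat, grid, align_corners=False)
print(out.shape)  # torch.Size([1, 3, 4, 4])
```

Anyone can then run the same script with the same seed and get bit-identical inputs, with no files attached to the issue.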
I made some fixes to the code: https://github.com/Suraj209211/GitHUB-E1.git. The fixed code is in a folder in that repository. Let me check the info you provided against the code. I have decoded the .pt files for the grid and features, and they are fairly large files. Do you have any other solution?
If the tensors are too large to hard-code, you could try to reproduce this issue with randomly generated tensors. First, set a random seed (torch.manual_seed). Then call torch.randn to get the two tensors you need.
Gotcha. Let me fix the code according to your suggestion.
Toby, I have fixed the code, but I have encountered an error with the torch version I am using, 2.1.1, which is the most recent version.
I downgraded torch to run the code, but it is still showing me some kind of error. The link to the code file is here: https://github.com/Suraj209211/GitHUB-E1/blob/90fdac40204e61257d2d47c7f5b74ea30393e492/BugFIx/coreMLBUG.py#L6 @TobyRoseman, let me know where I went wrong.
@TobyRoseman, have you gone through the code?
I have an approach for this; I will be trying it out.
I have not gone through the code. Before we can proceed here, we need code to reproduce the grid_sample issue (code that doesn't involve loading .pt files).
What error are you getting now? If this error is unrelated to coremltools, I'm probably not going to be able to help you.
I am getting an error about the Torch version from Core ML.
So we first need code to produce the grid values that were in the .pt file. Then we will do the same for feature.pt.
@TobyRoseman, is this the correct variation I produced for the .pt files?
@TobyRoseman, if it's fine, then let's check the other variation. I need a green signal for the bug fix.
Hello @TobyRoseman, I hope you have seen the output I updated here.
@Suraj209211 - I don't understand your last few messages.
In order to make progress, we need steps to reproduce this issue which do not involve loading .pt files.
Okay, then I've got the issue to solve now: I have to jot down steps to reproduce that don't involve the .pt files.
I have figured out some steps using torch.onnx.
@TobyRoseman, in reference to your message, I have figured out how to reproduce the issue without '.pt' files.
Link: https://github.com/Suraj209211/GitHUB-E1/blob/main/BugFIx/test.ipynb
What approach should I follow next, in your view?
Hey @TobyRoseman, hope you are doing well. I haven't found any further comments from you regarding this problem.
Is there anything else left to do here, or is the precision for the above code correct?
Hi @TobyRoseman, I understand your concerns; therefore, I have updated the steps to reproduce the issue. My apologies for the delayed response.
I have converted the .pt files to .txt files with the help of numpy.
Since loading untrusted .pt files poses a security risk, I am sharing the .txt files instead. The file 'feat.txt' exceeds 75 MB and therefore couldn't be uploaded here. To avoid any suspicion of viruses, I am uploading them to GitHub. The links are provided below. Please download the files and save them in the appropriate directory.
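A sketch of that conversion round trip, using a small placeholder tensor rather than the actual feat/grid data (note that numpy.savetxt only handles 1-D and 2-D arrays, so higher-rank tensors would need to be reshaped and their original shape stored separately):

```python
import os
import tempfile

import numpy as np
import torch

torch.manual_seed(0)
feat = torch.randn(4, 5)                       # placeholder tensor, not the real features
path = os.path.join(tempfile.gettempdir(), "feat.txt")

np.savetxt(path, feat.numpy())                 # write as plain text (default '%.18e' format)
restored = torch.from_numpy(np.loadtxt(path)).float()

print(torch.allclose(feat, restored, atol=1e-6))  # True
```

The default '%.18e' format preserves full float32 precision, so the round trip is lossless; a text file is also inspectable, unlike a pickled .pt file.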
@Suraj209211 - the code in your notebook doesn't look right. You're not passing the PyTorch and Core ML model the same input. So of course the output will be different.
@vinayak-sharan - thanks for updating the original code and including the data as text files. I can run your code without error. Here is the output I get:
Difference between pytorch's grid sample before and after conversion: Note: Pytorch is fp32 and coreML is fp16 : 5.699944813386537e-05
Difference between pytorch's grid sample before and after conversion: Note: Pytorch is fp32 and coreML is fp32 : 5.699944813386537e-05
Relative change in the difference: 0.0
This seems well within the range of acceptable differences. Are you seeing significantly larger differences?
Thank you for correcting me, @TobyRoseman. Is this bug still open or is it solved?
@TobyRoseman, for the original code I am getting the output value below. I think I may have mistakenly shared the wrong code in the GitHub link that I am using for the test. Sorry for the inconvenience.
The Output:
(https://github.com/Suraj209211/Apple/blob/main/fix2.ipynb) This is the correct link to the code for which I am getting the output.
🐞Describing the bug
The CoreML model, when converted from a PyTorch model using grid sampling, shows a large deviation in output values compared to the original PyTorch model.
The output difference is notably high. When using FP16 precision for conversion instead of the default FP32, the relative change in the output difference is approximately 131.59, i.e. about 13159%. This points towards an issue in the conversion process, or an incompatibility between PyTorch's grid_sample implementation and Core ML's.
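For clarity, the "relative change" figure can be computed as follows. The FP16 difference below is an assumed value chosen only to illustrate the reported ratio, not a measured output:

```python
# Assumed illustrative values, not actual measurements.
diff_fp32 = 5.6999e-05   # |pytorch - coreml| with fp32 conversion
diff_fp16 = 7.5570e-03   # |pytorch - coreml| with fp16 conversion (assumed)

# Relative change of the fp16 difference with respect to the fp32 difference.
rel_change = (diff_fp16 - diff_fp32) / diff_fp32
print(f"{rel_change:.2f}")  # ~131.58, i.e. about 13159%
```

A relative change of 0.0, as in the output @TobyRoseman posted above, means the FP16 and FP32 conversions deviate from PyTorch by the same amount.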
Here is the attached screenshot of the output of the code below:
Code To Reproduce
System environment (please complete the following information):
Files required for the above code.
Note: Since loading untrusted .pt files poses a security risk, I am sharing the .txt files instead. The file 'feat.txt' exceeds 75 MB and therefore couldn't be uploaded here. To avoid any suspicion of viruses, I am uploading them to GitHub. The links are provided below. Please download the files and save them in the appropriate directory. @TobyRoseman