Closed CYXYZ closed 8 months ago
Hi @CYXYZ, in the version of DiffPose on the refactor-se3 branch, there shouldn't be any calls to import pytorch3d. When I search for it, I don't see it in the code: https://github.com/search?q=repo%3Aeigenvivek%2FDiffPose%20pytorch3d&type=code
My advice would be:
1. Create a new environment from environment.yml and activate it
2. Install diffdrr from source (git clone https://github.com/eigenvivek/DiffDRR.git; cd DiffDRR; pip install -e .)
3. Install diffpose on the refactor-se3 branch (cd DiffPose; pip install -e .)
When using the latest version, I've been able to train all models without crashing. However, it's entirely possible that there's some other bug in the geodesic distance code that produces a NaN during training. Just because I haven't seen it yet doesn't mean it's not real!
Please try retraining with a clean environment and let me know if the crashing persists.
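After rebuilding the environment, a quick check of what is actually installed can rule out a stale package. The following is a minimal sketch using only the standard library; the distribution names come from this thread, and the helper names `installed_version` and `report` are hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Names taken from this thread. Per the comment above, pytorch3d should
    # not be needed on the refactor-se3 branch.
    for name, version in report(["diffdrr", "diffpose", "pytorch3d"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the activated environment shows at a glance whether diffdrr and diffpose resolve to the versions you expect.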
Dear Vivek, something goes wrong when I run train.py on the refactor-se3 branch:
Traceback (most recent call last):
File "/home/data/cyx/autodl-tmp/DiffPose-refactor-se3/experiments/deepfluoro/train.py", line 236, in
I tested the erroneous data with the code below and found that NaN values still occur:
tensor([0.0120, 0.0122, nan, 0.0235], device='cuda:0')
The complete test code is displayed here.
import torch
from beartype import beartype
from diffdrr.utils import convert
from jaxtyping import Float, jaxtyped
from pytorch3d.transforms import (
    so3_rotation_angle,
    so3_relative_angle,
    standardize_quaternion,
)
from typing import Optional
from beartype import beartype
from diffdrr.utils import convert as convert_so3
from jaxtyping import Float, jaxtyped
from pytorch3d.transforms import Transform3d
from pytorchse3.se3 import se3_exp_map, se3_log_map


@beartype
class RigidTransform(Transform3d):
    """Wrapper of pytorch3d.transforms.Transform3d with extra functionalities."""

    @jaxtyped
    def __init__(
        self,
        R: Float[torch.Tensor, "..."],
        t: Float[torch.Tensor, "... 3"],
        parameterization: str = "matrix",
        convention: Optional[str] = None,
        device=None,
        dtype=torch.float32,
    ):
        if device is None and (R.device == t.device):
            device = R.device
        R = convert_so3(R, parameterization, "matrix", convention)
        if R.dim() == 2 and t.dim() == 1:
            R = R.unsqueeze(0)
            t = t.unsqueeze(0)
        assert (batch_size := len(R)) == len(t), "R and t need same batch size"
        matrix = torch.zeros(batch_size, 4, 4, device=device, dtype=dtype)
        matrix[..., :3, :3] = R.transpose(-1, -2)
        matrix[..., 3, :3] = t
        matrix[..., 3, 3] = 1
        super().__init__(matrix=matrix, device=device, dtype=dtype)

    def get_rotation(self, parameterization=None, convention=None):
        R = self.get_matrix()[..., :3, :3].transpose(-1, -2)
        if parameterization is not None:
            R = convert_so3(R, "matrix", parameterization, None, convention)
        return R

    def get_translation(self):
        return self.get_matrix()[..., 3, :3]

    def inverse(self):
        """Closed-form inverse for rigid transforms."""
        R = self.get_rotation().transpose(-1, -2)
        t = self.get_translation()
        t = -torch.einsum("bij,bj->bi", R, t)
        return RigidTransform(R, t, device=self.device, dtype=self.dtype)

    def compose(self, other):
        T = super().compose(other)
        R = T.get_matrix()[..., :3, :3].transpose(-1, -2)
        t = T.get_matrix()[..., 3, :3]
        return RigidTransform(R, t, device=self.device, dtype=self.dtype)

    def clone(self):
        R = self.get_matrix()[..., :3, :3].transpose(-1, -2).clone()
        t = self.get_matrix()[..., 3, :3].clone()
        return RigidTransform(R, t, device=self.device, dtype=self.dtype)

    def get_se3_log(self):
        return se3_log_map(self.get_matrix().transpose(-1, -2))


class GeodesicSE3(torch.nn.Module):
    """Calculate the distance between transforms in the log-space of SE(3)."""

    def __init__(self):
        super().__init__()

    @beartype
    @jaxtyped
    def forward(
        self,
        pose_1: RigidTransform,
        pose_2: RigidTransform,
    ) -> Float[torch.Tensor, "b"]:
        return pose_2.compose(pose_1.inverse()).get_se3_log().norm(dim=1)
# Example pose matrices
pose = torch.tensor(
    [[[ 1.8144e-01, 9.8316e-01, -2.1685e-02, 0.0000e+00],
      [ 2.5295e-01, -6.7969e-02, -9.6509e-01, 0.0000e+00],
      [-9.5031e-01, 1.6962e-01, -2.6102e-01, 0.0000e+00],
      [ 2.2374e+02, 3.8585e+02, 2.2170e+02, 1.0000e+00]],
     [[ 1.7733e-01, 9.8249e-01, 5.7100e-02, 0.0000e+00],
      [-6.8154e-02, 7.0139e-02, -9.9521e-01, 0.0000e+00],
      [-9.8179e-01, 1.7258e-01, 7.9398e-02, 0.0000e+00],
      [ 2.0946e+02, 3.2825e+02, 2.1824e+02, 1.0000e+00]],
     [[-6.4248e-02, 9.9234e-01, 1.0555e-01, 0.0000e+00],
      [ 1.7417e-02, 1.0687e-01, -9.9412e-01, 0.0000e+00],
      [-9.9778e-01, -6.2032e-02, -2.4150e-02, 0.0000e+00],
      [ 1.6308e+02, 3.7611e+02, 2.2729e+02, 1.0000e+00]],
     [[ 2.2041e-01, 8.7520e-01, -4.3063e-01, 0.0000e+00],
      [-6.2944e-02, -4.2780e-01, -9.0168e-01, 0.0000e+00],
      [-9.7337e-01, 2.2585e-01, -3.9205e-02, 0.0000e+00],
      [ 3.0673e+02, 3.1338e+02, 5.1707e+01, 1.0000e+00]]], device='cuda:0')
pred_pose = torch.tensor(
    [[[ 1.8104e-01, 9.8323e-01, -2.1787e-02, 0.0000e+00],
      [ 2.4225e-01, -6.6052e-02, -9.6796e-01, 0.0000e+00],
      [-9.5317e-01, 1.6996e-01, -2.5014e-01, 0.0000e+00],
      [ 2.2539e+02, 3.8712e+02, 2.2106e+02, 1.0000e+00]],
     [[ 1.7060e-01, 9.8356e-01, 5.9136e-02, 0.0000e+00],
      [-7.7165e-02, 7.3167e-02, -9.9433e-01, 0.0000e+00],
      [-9.8231e-01, 1.6507e-01, 8.8379e-02, 0.0000e+00],
      [ 2.0866e+02, 3.3049e+02, 2.1791e+02, 1.0000e+00]],
     [[-6.3945e-02, 9.9236e-01, 1.0550e-01, 0.0000e+00],
      [ 1.7204e-02, 1.0680e-01, -9.9413e-01, 0.0000e+00],
      [-9.9781e-01, -6.1755e-02, -2.3902e-02, 0.0000e+00],
      [ 1.6372e+02, 3.7662e+02, 2.2650e+02, 1.0000e+00]],
     [[ 1.9835e-01, 8.7822e-01, -4.3518e-01, 0.0000e+00],
      [-5.8063e-02, -4.3270e-01, -8.9967e-01, 0.0000e+00],
      [-9.7841e-01, 2.0372e-01, -3.4833e-02, 0.0000e+00],
      [ 3.0021e+02, 3.1163e+02, 5.1693e+01, 1.0000e+00]]], device='cuda:0')

# Assuming you have created instances of pose matrices
pose = torch.tensor(pose, device='cuda:0').clone().detach()
pred_pose = torch.tensor(pred_pose, device='cuda:0').clone().detach()
pose = RigidTransform(R=pose[..., :3, :3], t=pose[..., :3, 3])
pred_pose = RigidTransform(R=pred_pose[..., :3, :3], t=pred_pose[..., :3, 3])

# Creating an instance of GeodesicSE3 class
geodesic_calculator = GeodesicSE3()

# Calculate geodesic distance
geodesic = geodesic_calculator(pose, pred_pose)
print(geodesic)
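One detail in the test above is worth checking: the printed matrices keep the translation in the last row (the layout that `RigidTransform` writes in `__init__` and reads back in `get_translation`), so `pose[..., :3, 3]` selects the all-zero fourth column rather than the translation. Below is a minimal sketch of the slicing with plain Python lists, using the first matrix from the test. The helper name `split_row_vector_pose` is hypothetical, and whether this slicing has anything to do with the NaN is not established in this thread.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_row_vector_pose(m: Matrix) -> Tuple[Matrix, List[float]]:
    """Split a 4x4 pose stored in row-vector layout into (R, t).

    In this layout the upper-left 3x3 block holds R transposed and the
    last row holds the translation, mirroring RigidTransform.get_rotation
    and RigidTransform.get_translation above.
    """
    rotation = [[m[j][i] for j in range(3)] for i in range(3)]  # transpose the block
    translation = list(m[3][:3])  # last row, first three entries
    return rotation, translation


# First matrix of `pose` from the test above.
first_pose = [
    [1.8144e-01, 9.8316e-01, -2.1685e-02, 0.0],
    [2.5295e-01, -6.7969e-02, -9.6509e-01, 0.0],
    [-9.5031e-01, 1.6962e-01, -2.6102e-01, 0.0],
    [2.2374e+02, 3.8585e+02, 2.2170e+02, 1.0],
]

R, t = split_row_vector_pose(first_pose)
fourth_column = [row[3] for row in first_pose[:3]]  # what pose[:3, 3] selects

print(t)              # translation taken from the last row
print(fourth_column)  # all zeros, so not the translation
```

With tensors, the equivalent slices would be `pose[..., :3, :3].transpose(-1, -2)` for the rotation and `pose[..., 3, :3]` for the translation, matching the accessors in `RigidTransform`.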
Hi @CYXYZ, thanks for pointing out the erroneous code. It's using an older version of the DiffDRR API. I just updated the code and pushed it to the refactor-se3 branch. Please let me know if it's still causing issues.
Dear Vivek,
I hope this letter finds you well. I am writing to seek your assistance regarding an issue I encountered while running the refactor-se3 code.
Upon executing the train.py script, I encountered the following error message:
Traceback (most recent call last):
File "/home/data/cyx/autodl-tmp/DiffPose-refactor-se3/experiments/deepfluoro/train.py", line 235, in <module>
main(id_number)
File "/home/data/cyx/autodl-tmp/DiffPose-refactor-se3/experiments/deepfluoro/train.py", line 206, in main
train(
File "/home/data/cyx/autodl-tmp/DiffPose-refactor-se3/experiments/deepfluoro/train.py", line 69, in train
img = drr(None, None, None, pose=pose, bone_attenuation_multiplier=contrast)
File "/home/data/cyx/miniconda3/envs/diffpose/lib/python3.12/site-packages/diffdrr/drr.py", line 126, in forward
source, target = self.detector(pose)
File "/home/data/cyx/miniconda3/envs/diffpose/lib/python3.12/site-packages/diffdrr/detector.py", line 104, in forward
source = pose(self.source)
TypeError: 'NoneType' object is not callable
It seems that there's an issue with the pose variable being NoneType, resulting in a TypeError when it is being called as a function.
I have reviewed the code, but I couldn't pinpoint the exact source of the problem. Could you please provide some guidance on how to resolve this issue?
Your assistance in resolving this matter would be greatly appreciated.
Thank you for your time and support.
Warm regards, cyxyz
Thanks for letting me know. I'll sit down and debug the code for a few hours and figure out what I messed up.
@CYXYZ, did you pull the latest version of the code on refactor-se3?
The error message shows you're calling drr(None, None, None, pose=pose, bone_attenuation_multiplier=contrast), which is not in the latest version of the program.
Dear Vivek,
I hope this email finds you well. I wanted to take a moment to express my sincere gratitude for your invaluable guidance with the new code. Thanks to your expertise and support, it runs smoothly without any issues. Your insights and direction have been instrumental in ensuring its successful execution.
Looking forward to our continued collaboration and learning from you in the future.
Warm regards, cyxyz
No problem! I hope using the refactor-se3 branch or the main branch + diffdrr=0.3.9 has worked properly.
Hello CYXYZ, because of the NaN problems, I tried the refactor-se3 branch with diffdrr=0.3.9, but I ran into a problem:
from diffdrr.pose import RigidTransform, convert, make_matrix
ModuleNotFoundError: No module named 'diffdrr.pose'
It looks like diffdrr=0.3.9 doesn't match the branch?
Your help in resolving this issue would be highly appreciated.
Thank you very much! Kind regards, James
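The `ModuleNotFoundError` above says the installed diffdrr has no `pose` submodule, which suggests a mismatch between the installed release and what the branch imports. A minimal sketch for probing this without running the training script, using only the standard library (the helper name `has_module` is hypothetical):

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located by the import system.

    For a submodule such as "diffdrr.pose", find_spec imports the parent
    package first and raises ModuleNotFoundError when the parent is missing,
    so that case is reported as False too.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # "diffdrr.pose" is the module named in the error above.
    print("diffdrr.pose available:", has_module("diffdrr.pose"))
```

If this prints False in the training environment, the installed diffdrr does not provide the module the branch expects, and a different diffdrr version is needed.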
I use the refactor-se3 branch + diffdrr=0.3.11, and it works properly.
Thank you for your kind advice. However, I got this error when I used diffdrr=0.3.11:
Traceback (most recent call last):
File "train.py", line 11, in
from diffpose.deepfluoro import DeepFluoroDataset, Transforms, get_random_offset
File "/root/DS/DiffPose-refactor-se3/diffpose/deepfluoro.py", line 17, in
from .calibration import perspective_projection
File "/root/DS/DiffPose-refactor-se3/diffpose/calibration.py", line 17, in
@jaxtyped(typechecker=beartype)
TypeError: jaxtyped() got an unexpected keyword argument 'typechecker'
The Python version is 3.8, and the other packages are:
diffdrr 0.3.11
diffpose 0.0.1 /root/DS/DiffPose-refactor-se3
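The `TypeError` above means the installed `jaxtyped` does not accept a `typechecker` keyword, which usually points at an older jaxtyping release than the branch expects. That this is tied to Python 3.8 is an assumption, not something confirmed in this thread. The sketch below shows one way to check whether a callable accepts a given keyword before relying on it; `old_style` and `new_style` are hypothetical stand-ins for the two decorator signatures, not the real jaxtyping functions.

```python
import inspect
from typing import Callable


def accepts_keyword(func: Callable, name: str) -> bool:
    """Return True if `func` can be called with the keyword argument `name`."""
    try:
        params = inspect.signature(func).parameters.values()
    except (TypeError, ValueError):
        # Some builtins expose no signature; treat them as unknown.
        return False
    keyword_kinds = (
        inspect.Parameter.KEYWORD_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        or (p.name == name and p.kind in keyword_kinds)
        for p in params
    )


# Hypothetical stand-ins for an older and a newer decorator signature.
def old_style(fn):
    return fn


def new_style(fn=None, *, typechecker=None):
    return fn


print(accepts_keyword(old_style, "typechecker"))  # False
print(accepts_keyword(new_style, "typechecker"))  # True
```

Applied to the real decorator, the same helper could be called on `jaxtyping.jaxtyped` in the failing environment; a False result would be consistent with an outdated jaxtyping install.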
@JamesQian11 Hello, I have encountered the same problem. Have you found a solution to it?
Dear Vivek,
I hope this email finds you well. I am encountering a rather peculiar issue while training the DiffPose model. Even after approximately 800 epochs, I'm still facing NaN (not a number) problems. Here's a snippet of the error message:
Aaaaaaand we've crashed...
tensor([0.9912, 0.9938, 0.9717, 0.9773], device='cuda:0', grad_fn=)
tensor([ nan, 3.1945, 3.8267, 5.6667], device='cuda:0',
grad_fn=)
tensor([ 0.0000, 8.2518, 9.6319, 10.1889], device='cuda:0',
grad_fn=)
tensor([2.0651, 3.1944, 3.8266, 5.6665], device='cuda:0',
grad_fn=)
tensor([ 2.0651, 8.8485, 10.3642, 11.6586], device='cuda:0',
grad_fn=)
tensor([[[ 4.3618e-02, 9.9902e-01, -6.9611e-03, 0.0000e+00],
[-1.1242e-01, -2.0154e-03, -9.9366e-01, 0.0000e+00],
[-9.9270e-01, 4.4124e-02, 1.1222e-01, 0.0000e+00],
[ 2.8978e+02, 2.8499e+02, 1.9082e+02, 1.0000e+00]],
tensor([[[ 4.3480e-02, 9.9903e-01, -6.9389e-03, 0.0000e+00],
[-1.1256e-01, -2.0024e-03, -9.9364e-01, 0.0000e+00],
[-9.9269e-01, 4.3985e-02, 1.1237e-01, 0.0000e+00],
[ 2.8970e+02, 2.8327e+02, 1.9196e+02, 1.0000e+00]],
I have followed the instructions provided in the following link to modify the so3.py file: link.
Additionally, when attempting to utilize the content from this branch after installing the environment.yml file, I encountered an error during the execution of pip install diffpose. Here's a snippet of the error message:
Using cached diffpose-0.0.1-py3-none-any.whl.metadata (7.8 kB)
...
ERROR: Could not find a version that satisfies the requirement pytorch3d (from diffpose) (from versions: none)
ERROR: No matching distribution found for pytorch3d
It seems that the DiffPose package still requires the PyTorch3D code even after the modification in the so3.py file. Could you please assist me in resolving this issue?
Thank you for your attention to this matter. Best regards, cyxyz
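This thread does not identify where the NaN originates. One common source in log-map style distances is an arccosine evaluated at a cosine that drifts just outside [-1, 1] through rounding when two poses are nearly identical, as with the crashing pair printed above, which differs only around the fourth significant digit. The sketch below illustrates that failure mode and the usual clamping guard in plain Python. It is not the DiffPose or pytorchse3 implementation, and the function name `rotation_angle` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def rotation_angle(R: Matrix, clamp: bool = True) -> float:
    """Angle of a 3x3 rotation matrix from its trace: acos((tr(R) - 1) / 2).

    Without clamping, a cosine slightly outside [-1, 1] yields NaN, which
    mirrors what torch.acos returns for out-of-range inputs.
    """
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    if clamp:
        cos_theta = max(-1.0, min(1.0, cos_theta))
    elif abs(cos_theta) > 1.0:
        return math.nan
    return math.acos(cos_theta)


# A near-identity "rotation" whose diagonal is off by a rounding-sized error.
eps = 1e-7
almost_identity = [
    [1.0 + eps, 0.0, 0.0],
    [0.0, 1.0 + eps, 0.0],
    [0.0, 0.0, 1.0 + eps],
]

print(rotation_angle(almost_identity, clamp=False))  # nan
print(rotation_angle(almost_identity, clamp=True))   # 0.0
```

If the NaN in training does come from a step like this, clamping the cosine (or re-orthonormalizing the rotation first) is a typical mitigation; confirming the actual cause would require stepping through the log-map on the crashing batch.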