GrokCV / GrokSAR

Apache License 2.0

Problems On training MSAR datasets #3

Open 2000YWQ opened 4 days ago

2000YWQ commented 4 days ago

When I try to train on the MSAR dataset I get the error below, but I don't have this problem when I train on other datasets. Do I need to do something extra with the dataset? Can you provide me some suggestions? Thank you very much.

    Traceback (most recent call last):
      File "tools/train_det.py", line 134, in <module>
        main()
      File "tools/train_det.py", line 130, in main
        runner.train()
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1745, in train
        model = self.train_loop.run()  # type: ignore
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
        self.run_epoch()
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
        self.run_iter(idx, data_batch)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
        outputs = self.runner.model.train_step(
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
        losses = self._run_forward(data, mode='loss')  # type: ignore
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
        results = self(**data, mode=mode)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 92, in forward
        return self.loss(inputs, data_samples)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmdet/models/detectors/single_stage.py", line 77, in loss
        x = self.extract_feat(batch_inputs)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/mmdet/models/detectors/single_stage.py", line 148, in extract_feat
        x = self.neck(x)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/workspace/test/GrokSAR/groksar/models/necks/FrequencySpatialFPN.py", line 649, in forward
        laterals[i - 1] = self.DCTDenoAttention[i - 1](laterals[i - 1])
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/torch/nn/modules/container.py", line 217, in forward
        input = module(input)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/workspace/test/GrokSAR/groksar/models/necks/FrequencySpatialFPN.py", line 23, in forward
        y = self.dct_x(x)
      File "/opt/conda/envs/MSFA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "/workspace/test/GrokSAR/groksar/models/necks/FrequencySpatialFPN.py", line 96, in forward
        dct_component = x * weight.view(1, 1, 1, x.shape[3]).expand_as(x)
    RuntimeError: shape '[1, 1, 1, 16]' is invalid for input of size 32
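The failing check can be reproduced in isolation. This is a minimal sketch with sizes assumed from the traceback: a DCT weight vector sized for a 32-wide feature map cannot be reshaped to match a 16-wide one.

```python
import torch

# Assumed sizes, taken from the error message: the DCT layer was built
# with 32 frequency weights, but the MSAR feature map is only 16 wide.
weight = torch.randn(32)          # frequency weights sized for width 32
x = torch.randn(2, 256, 16, 16)   # feature map from a 256x256 MSAR image

try:
    # view() asks for 16 elements but weight holds 32 -> RuntimeError
    dct_component = x * weight.view(1, 1, 1, x.shape[3]).expand_as(x)
except RuntimeError as e:
    print(e)  # shape '[1, 1, 1, 16]' is invalid for input of size 32
```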

travelerwood commented 3 days ago

Thank you for the feedback. A fixed image and tensor size is necessary for the 2D DCT transformation, and the image size in the MSAR dataset is 256 × 256, which differs from the image sizes of the other datasets. In this version, the code is set up by default for the SAR-AIRcraft and AIR-SARShip datasets, and the code suited to the MSAR dataset is commented out. To train on the MSAR dataset, you can uncomment the following section of our code:

groksar/models/necks/FrequencySpatialFPN.py

lines 578~613

        # This is for MSAR dataset
        """
        self.DCTDenoAttention0 = nn.Sequential(
            DCT2DSpatialTransformLayer(32, 32),
            # SElayer(256,16),
            # GroupAttentionlayer(32 * 32, 16),
            # SelectGroupFClayer(32*32),
            # SpatialFCAttentionlayer(32*32, 16),
            IDCT2DSpatialTransformLayer(32, 32)
        )
        self.DCTDenoAttention1 = nn.Sequential(
            DCT2DSpatialTransformLayer(16, 16),
            # SElayer(256,16),
            # GroupAttentionlayer(16 * 16, 16),
            # SelectGroupFClayer(16 * 16),
            # SpatialFCAttentionlayer(16*16, 16),
            IDCT2DSpatialTransformLayer(16, 16),
        )
        """

        # This is for SAR-AIRcraft and AIR-SARShip dataset
        self.DCTDenoAttention0 = nn.Sequential(
            DCT2DSpatialTransformLayer(64, 64),
            # SElayer(256,16),
            # GroupAttentionlayer(64 * 64, 32),
            SelectGroupFClayer(64*64),
            # SpatialFCAttentionlayer(64*64, 16),
            IDCT2DSpatialTransformLayer(64, 64)
        )
        self.DCTDenoAttention1 = nn.Sequential(
            DCT2DSpatialTransformLayer(32, 32),
            # SElayer(256,16),
            # GroupAttentionlayer(32 * 32, 32),
            SelectGroupFClayer(32 * 32),
            # SpatialFCAttentionlayer(32*32, 16),
            IDCT2DSpatialTransformLayer(32, 32)
        )

should be rewritten as

        # This is for MSAR dataset
        self.DCTDenoAttention0 = nn.Sequential(
            DCT2DSpatialTransformLayer(32, 32),
            # SElayer(256,16),
            # GroupAttentionlayer(32 * 32, 16),
            SelectGroupFClayer(32*32),
            # SpatialFCAttentionlayer(32*32, 16),
            IDCT2DSpatialTransformLayer(32, 32)
        )
        self.DCTDenoAttention1 = nn.Sequential(
            DCT2DSpatialTransformLayer(16, 16),
            # SElayer(256,16),
            # GroupAttentionlayer(16 * 16, 16),
            SelectGroupFClayer(16 * 16),
            # SpatialFCAttentionlayer(16*16, 16),
            IDCT2DSpatialTransformLayer(16, 16),
        )

        # This is for SAR-AIRcraft and AIR-SARShip dataset
        """
        self.DCTDenoAttention0 = nn.Sequential(
            DCT2DSpatialTransformLayer(64, 64),
            # SElayer(256,16),
            # GroupAttentionlayer(64 * 64, 32),
            SelectGroupFClayer(64*64),
            # SpatialFCAttentionlayer(64*64, 16),
            IDCT2DSpatialTransformLayer(64, 64)
        )
        self.DCTDenoAttention1 = nn.Sequential(
            DCT2DSpatialTransformLayer(32, 32),
            # SElayer(256,16),
            # GroupAttentionlayer(32 * 32, 32),
            SelectGroupFClayer(32 * 32),
            # SpatialFCAttentionlayer(32*32, 16),
            IDCT2DSpatialTransformLayer(32, 32)
        )
        """

For convenience, in the next version we will have the code derive these sizes automatically from the dataset's image size via a passed parameter.
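Such parameter passing could look like the following minimal sketch. The helper name, the stride-8 relation between input size and the first feature-map size, and the 512 × 512 input size for the other datasets are assumptions for illustration (they are consistent with the 64/32 and 32/16 defaults above), and placeholder classes stand in for the real DCT layers:

```python
import torch.nn as nn

# Placeholder stand-ins for the real layers in FrequencySpatialFPN.py;
# only the (height, width) constructor signature is assumed here.
class DCT2DSpatialTransformLayer(nn.Module):
    def __init__(self, h, w):
        super().__init__()
        self.h, self.w = h, w
    def forward(self, x):
        return x

class IDCT2DSpatialTransformLayer(DCT2DSpatialTransformLayer):
    pass

def build_dct_deno_attention(img_size: int, level: int) -> nn.Sequential:
    # Derive the feature-map size from the input image size instead of
    # hard-coding it: assuming a stride-8 first FPN level, a 512x512
    # image gives 64x64 features and a 256x256 MSAR image gives 32x32,
    # halving at each further level.
    size = img_size // (8 * 2 ** level)
    return nn.Sequential(
        DCT2DSpatialTransformLayer(size, size),
        IDCT2DSpatialTransformLayer(size, size),
    )

# SAR-AIRcraft / AIR-SARShip (assumed 512x512) vs MSAR (256x256)
attn0_air = build_dct_deno_attention(512, level=0)   # 64x64
attn0_msar = build_dct_deno_attention(256, level=0)  # 32x32
attn1_msar = build_dct_deno_attention(256, level=1)  # 16x16
```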