drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"
MIT License

parallel_cut_pursuit repo doesn't have "improve_merge" branch #64

Closed · seppestaes closed this issue 6 months ago

seppestaes commented 7 months ago

install.sh [line 119]

https://gitlab.com/1a7r0ch3/parallel-cut-pursuit.git doesn't have a branch named "improve_merge", only "master"

Switching to the "master" branch makes "setup_dependencies.py" fail at line 113: "cp_kmpp_d0_dist_cpy" not found.

better1593 commented 7 months ago

Hi! I have the same problem. I encountered an issue while running install.sh: "fatal: Remote branch improve_merge not found in upstream origin." During troubleshooting, I noticed that the last command in install.sh is 'git clone -b improve_merge https://gitlab.com/1a7r0ch3/parallel-cut-pursuit.git src/dependencies/parallel_cut_pursuit'. It seems that the original author has removed the 'improve_merge' branch. Therefore, I chose to download the historical version committed by the original author a year ago, with the SHA 130f54fcec0e5bbdaf133c7fde3452d9160d21a4. I am not sure if this is the correct approach, but the installation seemed to succeed in the end.

However, when I tried to run 'python src/train.py experiment=s3dis datamodule.fold=5' to verify that the environment was installed correctly, the process got stuck at "processing data" and the progress remained at 0. So I suspect the issue may still be related to an incorrect installation of 'parallel_cut_pursuit'. Additionally, I noticed that the original author recently made updates to 'grid_graph'; I wonder if this has any impact on the compatibility of the dependencies.

drprojects commented 7 months ago

Hi @seppestaes @better1593 !

Thanks for using our project 😊

As you noticed, it seems the maintainers of parallel-cut-pursuit recently updated their official repo and removed the improve_merge branch we were relying on here. We are looking into this with them to see if the branch could be restored.

Truth be told, this would simply be a temporary fix for a problem that should disappear in the near future. Indeed, I will soon (early February) release an updated version of this repo which will also support our new SuperCluster method for panoptic segmentation (arxiv paper accepted at 3DV 2024 for an Oral presentation). Among other things, this new version will have an updated install.sh that points to the master branch of parallel-cut-pursuit, so you should not see this problem then 😉

In the meantime, we will try to see if restoring the improve_merge branch of parallel-cut-pursuit is an option.

PS: if you are using this project, don't forget to give it a ⭐, it matters to us!

better1593 commented 7 months ago

Thanks a lot. Looking forward to the update!

1a7r0ch3 commented 7 months ago

Hi, working on the fix right now. Sorry I did not remember that this project was still using a specific branch of parallel-cut-pursuit.

loicland commented 7 months ago

This should be fixed now thanks to @1a7r0ch3's work! @better1593 and @seppestaes, can you confirm that this fixes things for you?

1a7r0ch3 commented 7 months ago

Yes, some testing is in order; I cannot do it directly on the computer I am working with right now. Follow up here if something is missing.

better1593 commented 7 months ago

@1a7r0ch3 @loicland Thank you so much! The dependencies seem to be installed successfully. However, when I tried to run the command python src/train.py experiment=dales_11g, an error occurred:

Error executing job with overrides: ['experiment=dales_11g']
Traceback (most recent call last):
  File "src/train.py", line 139, in main
    metric_dict, _ = train(cfg)
  File "/root/autodl-tmp/superpoint_transformer/src/utils/utils.py", line 48, in wrap
    raise ex
  File "/root/autodl-tmp/superpoint_transformer/src/utils/utils.py", line 45, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "src/train.py", line 114, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 941, in _run
    self._data_connector.prepare_data()
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 94, in prepare_data
    call._call_lightning_datamodule_hook(trainer, "prepare_data")
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/root/autodl-tmp/superpoint_transformer/src/datamodules/base.py", line 144, in prepare_data
    self.dataset_class(
  File "/root/autodl-tmp/superpoint_transformer/src/datasets/base.py", line 193, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/torch_geometric/data/in_memory_dataset.py", line 57, in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log)
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 97, in __init__
    self._process()
  File "/root/autodl-tmp/superpoint_transformer/src/datasets/base.py", line 493, in _process
    self.process()
  File "/root/autodl-tmp/superpoint_transformer/src/datasets/base.py", line 528, in process
    self._process_single_cloud(p)
  File "/root/autodl-tmp/superpoint_transformer/src/datasets/base.py", line 559, in _process_single_cloud
    nag = self.pre_transform(data)
  File "/root/miniconda3/envs/spt/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 24, in __call__
    data = transform(data)
  File "/root/autodl-tmp/superpoint_transformer/src/transforms/transforms.py", line 23, in __call__
    return self._process(x)
  File "/root/autodl-tmp/superpoint_transformer/src/transforms/partition.py", line 144, in _process
    super_index, x_c, cluster, edges, times = cp_d0_dist(
  File "/root/autodl-tmp/superpoint_transformer/src/dependencies/parallel_cut_pursuit/python/wrappers/cp_d0_dist.py", line 261, in cp_d0_dist
    raise ValueError("Cut-pursuit d0 distance: for weighting quadratic and"
ValueError: Cut-pursuit d0 distance: for weighting quadratic and Kullback-Leibler parts, 'coor_weights' should be of size loss + 1, where loss indicates the number of coordinates involved in the quadratic part and the last weights is for KL

I'm still figuring out the problem.

1a7r0ch3 commented 7 months ago

Hi, thanks for the follow-up. I fixed it, but I do not have the rights to push to this repo, so I am waiting for my pull request to be accepted. The change is so light that I suggest you try it on your side in the meantime: in the call to cp_d0_dist (line 144 and following), replace the lone 1 (the first argument) with (n_dim + n_feat). (And if more work turns out to be needed, we could temporarily work on my fork, but I doubt that will be necessary now.)
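For illustration, here is a minimal, self-contained sketch (not the actual wrapper code) of the size check behind the ValueError above. It assumes, following the error message, that the first argument of cp_d0_dist counts the coordinates handled by the quadratic part and that one extra weight is reserved for the Kullback-Leibler part; n_dim and n_feat are placeholder names for the number of spatial coordinates and features.

```python
import numpy as np

def check_coor_weights(loss, coor_weights):
    # Toy re-creation of the consistency check: when coordinate weights are
    # given, there must be one weight per quadratic coordinate (`loss` of
    # them) plus one final weight for the Kullback-Leibler part.
    if coor_weights is not None and coor_weights.size != loss + 1:
        raise ValueError(
            "Cut-pursuit d0 distance: 'coor_weights' should be of size loss + 1")

n_dim, n_feat = 3, 6                        # e.g. xyz plus 6 point features
coor_weights = np.ones(n_dim + n_feat + 1)  # quadratic weights + 1 KL weight

try:
    check_coor_weights(1, coor_weights)     # old call with a lone 1: rejected
except ValueError as err:
    print(err)

check_coor_weights(n_dim + n_feat, coor_weights)  # suggested fix: accepted
```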

loicland commented 7 months ago

I just merged the fix! Thanks @1a7r0ch3 for the quick reaction, and @better1593 for being willing to help us test it out!

seppestaes commented 7 months ago

Hi @drprojects et al.,

Thanks for making the project open source, and for your responsiveness! Installation works like a charm with the latest changes.

Tried training S3DIS (32GB and 11GB configs) on the default Stanford dataset, but the process is stuck at 0%. Didn't find any errors in the logs.

Thanks again for sharing the project!

better1593 commented 7 months ago

It seems to work fine for me now. The preprocessing procedure is just slow here; maybe you need to wait a bit longer.

loicland commented 7 months ago

@better1593 how slow are we talking here? Preprocessing S3DIS should only take about 15 to 30 min on a standard workstation

1a7r0ch3 commented 7 months ago

Can you both be more specific about "stuck" and "slow"? Slow compared to previous experiments you ran, or slow compared to what is reported in our paper? Could you make the cut-pursuit call verbose? I am not sure what the best place is (I guess it is configs/datamodule/s3dis.yaml), but you can also put verbose=True directly in the call to cp_d0_dist in src/transforms/partition.py. In that case, and if some calls are indeed successful but slower than before, could you pay attention to two things:

drprojects commented 7 months ago

Hi, just following this thread remotely but don't have much time to get involved right now.

@1a7r0ch3 all this would be much simpler if we simply restored "improve_merge" for a couple of weeks. My code for SuperCluster solves pretty much all the issues Loïc and you have been patching today; I just need to wrap it up with a bit of documentation before publishing it.

@seppestaes are you using a Docker environment? If so, several closed issues report trouble with Docker + parallel preprocessing for S3DIS; please check them to see if they solve your problem.

seppestaes commented 7 months ago

Hi @drprojects ,

Yes, the project's containerized. Taking a look at the related issues.

Apologies for not having checked beforehand.

Thanks a zillion

1a7r0ch3 commented 7 months ago

@drprojects I am not even sure we can restore improve_merge. I (and other users) was starting to get confused by the various versions, so I put all the improvements from all branches into master and simply deleted all the other branches. Since you were already using the cp_d0_dist interface for SuperCluster, I was convinced you had made the switch for SPT as well. I know I have some backups, so if it is absolutely necessary I can dig them up and set up a separate repo with exactly the same interface and behavior. But it seems it will be faster (and cleaner, and maybe with minor improvements in the end) to make the adaptation than to revert to an ugly temporary folder.

drprojects commented 7 months ago

That's up to you, if you feel like spending some time adapting SPT and testing it. On my end, I won't be able to help much until Feb 1st at the earliest.

1a7r0ch3 commented 7 months ago

I did some testing on a server.

Running python src/train.py experiment=s3dis_11g datamodule.fold=5 just to check preprocessing. It indeed takes more time than expected: about one hour for the S3DIS data. Making the process more verbose shows that the time spent on the partition with cut-pursuit is less than 10% of the total preprocessing time (not sure which of the other preprocessing tasks takes the most time), and cut-pursuit definitely does not get stuck in any preprocessing I have observed. I then dug up backups of the old improve_merge branch (pretty sure it's the right one...), launched the same preprocessing, and observed the same behavior.

I just re-launched the actual training to check that the prediction performance of SPT with the updated master branch of cut-pursuit is the same as with the old improve_merge branch. I might find some time tomorrow to check the results.

1a7r0ch3 commented 7 months ago

OK, I took some time to look at the results, testing S3DIS on a 48-core machine with a Tesla P100-PCIE-16GB.

As reported above, preprocessing time is the same with cp_d0_dist (master) and with cp_kmpp_d0_dist (improve_merge). As a side note, it is longer than expected, ~1h; at first glance, most of the preprocessing time is spent outside cut-pursuit, so the difference from what is reported in the paper is attributed to the less powerful GPU. One test run on S3DIS Area 5 gives 67.05 mIoU and 88.99 OA for cp_d0_dist (master) and 67.23 mIoU and 88.76 OA for cp_kmpp_d0_dist (improve_merge); I think these variations are not significant (and consistent with what is reported in the article: slightly lower than with the 32GB GPU, in somewhat longer time).

@drprojects Damien, tell me what you prefer. Either:

seppestaes commented 7 months ago

Haven't been able to test yet; I'm looking into an issue I encountered (multi-core sharing OS - Docker). Can this issue, related to the parallel-cut-pursuit repo, be closed? Thx again, Seppe

1a7r0ch3 commented 6 months ago

So, some tests ran successfully, and there are no further complaints. I think we can close this issue.