Dingry opened this issue 1 year ago
Hi Dingry, thanks for the interest. That dataset looks great to play with. Most likely some parameters would need to be adapted (this is one of the things we are hoping to improve; there is sensitivity to the parameters). However, since Fantastic Breaks is similar to Breaking Bad, it should not require too much of a change. Can you tell us what parameters you used?
One quick way to check what has gone wrong is to take a look at the breaking curves and the segmented regions. The results are written to the output folders (a `borders.ply` file should contain the border points and a `regions.ply` file the colored regions). These can give us a good hint about which part of the pipeline failed!
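For example, a quick way to inspect them is an Open3D sketch like the following (not part of the repository; the folder and file names are placeholders):

```python
# Minimal sketch: load and view the exported breaking curves and colored regions.
# The paths below are placeholders for whatever your output folder contains.
import open3d as o3d

borders = o3d.io.read_point_cloud("output/fragment_borders.ply")
regions = o3d.io.read_point_cloud("output/fragment_regions.ply")

o3d.visualization.draw_geometries([borders])   # should show the breaking curves
o3d.visualization.draw_geometries([regions])   # should show the colored regions
```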
I ran the assemble_fragments.py script with the default parameters. The results show the breaking curves and the segmented regions as below. I think the main issue is with the segmentation part, because it did not detect any segments.
Yes, you are right. The segmentation relies heavily on the breaking curves, which in this case are not completely connected (meaning they do not form a closed curve). This is the issue, and is most likely due to a different density or point cloud size. Do you have any clue about the difference in terms of density / point cloud size / quality between meshes from Fantastic Breaks and the ones from Breaking Bad? Otherwise I will check with the data you provided above and take a look.
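If it helps, a rough way to compare the density of the two datasets is the average nearest-neighbour spacing of the sampled point clouds, e.g. with an Open3D sketch like this (file names are just placeholders):

```python
# Minimal sketch: compare the mean nearest-neighbour distance of two point clouds
# as a rough proxy for sampling density.
import numpy as np
import open3d as o3d

def mean_nn_distance(path: str) -> float:
    pcd = o3d.io.read_point_cloud(path)
    return float(np.mean(pcd.compute_nearest_neighbor_distance()))

print("Fantastic Breaks fragment:", mean_nn_distance("fantastic_breaks_fragment.ply"))
print("Breaking Bad fragment:    ", mean_nn_distance("breaking_bad_fragment.ply"))
```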
Thanks for your reply. Fantastic Breaks provides objects in mesh format. I sampled 5,000 points for each fracture. How does this affect the performance? Should I make the density of the two fractures equal and sample them more densely?
Ah, okay. Breaking Bad is similar. The procedure is fine, but it would probably be better to sample a higher number of points (10-30k).
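For example, something along these lines with Open3D (just a sketch; the file name and the 30k target are illustrative):

```python
# Minimal sketch: densify the sampling of a fracture mesh before running the pipeline.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("fracture.obj")               # placeholder path
pcd = mesh.sample_points_poisson_disk(number_of_points=30000)  # denser, more even sampling
o3d.io.write_point_cloud("fracture_30k.ply", pcd)
```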
The density affects the results in the sense that we build a graph with many connections between points. This graph is used to estimate which edges belong to the breaking curves, i.e. the curves that separate the regions (the regions are then segmented, at the moment using a queue, but we already saw that there are quicker ways, for example (h)dbscan, if you need to speed up the process). The parameters controlling how the graph is built and how the curves are pruned usually depend on the density (with a lower density you need to allow longer connections, and vice versa). For these parameters, a slightly larger sample size should work better. The larger you go, the slower it becomes (mostly because of the segmentation, which, if the lines are correct, is relatively straightforward; we hope to push some improvements there soon), so a trade-off has to be found. Maybe try with 15k and then 30k if it does not work? I see in the config file we had set it to 30k. Otherwise you could try relaxing the thresholds: `thre = 0.93` controls what is considered "on the edge", so lowering this value will result in more points classified as border, and `dil = 0.01` controls the dilation, so increasing this value could help close some of the lines.
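Just to illustrate the quicker alternative mentioned above, a minimal sketch of clustering the non-border points with DBSCAN instead of the queue could look like this (the array names and the `eps` / `min_samples` values are only assumptions, not the repository's actual interface):

```python
# Minimal sketch: segment regions by clustering all points NOT classified as border.
import numpy as np
from sklearn.cluster import DBSCAN

def segment_regions(points: np.ndarray, is_border: np.ndarray,
                    eps: float = 0.01, min_samples: int = 10) -> np.ndarray:
    """Return a region label per point; border and noise points get label -1."""
    labels = np.full(len(points), -1, dtype=int)
    inner = ~is_border
    labels[inner] = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points[inner])
    return labels
```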
I don't think I will manage today, but I can try to take a look in the next few days. This is time-consuming, but I think and hope that once we find the correct set of parameters, it should work for all or most of the meshes in the dataset. And it would be great to push another config file for the Fantastic Breaks dataset! Thanks for the help!
PS: Do they have inner structures? (I ask this because some data from Breaking Bad had inner structures which sometimes created trouble --> see this issue, which had a workaround with visualization). If yes, it would be better to preprocess the data to make it manifold and watertight (we used ManifoldPlus when this happened).
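A quick way to check this is to test whether the fragment meshes are already manifold and watertight before deciding to run ManifoldPlus, e.g. with Open3D (sketch only; the path is a placeholder):

```python
# Minimal sketch: check whether a fragment mesh is manifold and watertight.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("fragment.obj")
print("edge manifold:  ", mesh.is_edge_manifold())
print("vertex manifold:", mesh.is_vertex_manifold())
print("watertight:     ", mesh.is_watertight())
```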
Thanks so much for your kind help. I will experiment with the parameters to see if I can improve the performance. PS: The objects in Fantastic Breaks look smooth and free of unwanted inner structures.
Hi @Dingry, just for the sake of curiosity, did you manage to get good results? Do you have some insights to share? I would be interested to know your point of view and your results on the Fantastic Breaks dataset! Thanks in advance.
Hi, I still cannot obtain good results after tuning a lot of hyper-parameters. I have no idea how to improve the results on Fantastic Breaks.
Ok, thanks for the feedback. Hopefully, as soon as I have some spare time, I will also try myself to see if it is possible, because I think it should work. Talking to some colleagues, someone pointed out to me that the meshes from the newer datasets have inner structures and different densities as well, so there may be a need for some pre-processing (hopefully ManifoldPlus should be enough). Unfortunately this is hard to bypass. I will update if I manage to find some time to run more tests on the Fantastic Breaks dataset.
Hi, I managed to take a look. Although it is nice to have realistic data (real broken scanned objects), these meshes look very challenging due to several conditions: variable point density, very thin surfaces, and scanning noise.
So I think this dataset (Fantastic Breaks) may be suitable for learning shape and restoration, but it is not the best for geometry-based assembly. Some objects are in fact better and some worse, so the results vary with the mesh you choose. Very thin surfaces may cause problems (or may need different parameters); "filled" objects work better.
I show some pictures from the experiments:

| Mug (variable density, thin parts) | Statue/Toy (less variable, filled mesh) | Head (second part of the statue/toy) |
| --- | --- | --- |
You can see the difference between the two meshes. If we try to reassemble the mug, we get bad results (the segmentation fails). If we try with the statue/toy, we get better results: the segmentation is (at least) plausible and the registration picks the correct pair of surfaces. However, due to the noise (I guess) it does not find the correct alignment (it is hard even by hand).
| Segmentation of the two parts of the statue/toy | Proposed registration |
| --- | --- |
As you can see, the segmentation and registration are better (with respect to the mug), but still unsatisfactory. I agree that, in its current state, the algorithm is not a good match for this dataset and is likely to fail or obtain poor performance. We will think about what we can improve (we are also working on some extensions), because we did not expect the noisiness in the data (and it seems from this issue that we are not the only ones), and we will see what can be done!
Thanks for your insight!
Hi, thanks for your exciting work. I have applied your method to the Fantastic Breaks dataset (https://terascale-all-sensing-research-studio.github.io/FantasticBreaks/) for fracture assembly, but I encountered some difficulties in achieving satisfactory results like the picture I showed. Could you please share some insights on how to improve the performance? I have attached the ply files of the raw fractures for your reference. Thank you very much.
objects.zip