DanielCoelho112 closed this issue 2 years ago
Hi Daniel,
The second is clearly better, although the 1cm is a sound bite, and if it turns out to be 10cm it will not sound as impressive.
The 1cm and 2º are arbitrary values, but what I mean is that we aren't exploring in-depth the impact of the number of frames. We are always trying to be "fair" with the other papers, and in doing so, we are neglecting one thing that only Synfeal can offer.
Speak for yourself, Lucas, Paulo and Tiago. I am always on the opposite side: don't be fair to the point where you begin helping the other approach.
For instance, in https://github.com/DanielCoelho112/localization_end_to_end/issues/77, we saw that to maintain the number of frames/space ratio, we should use 18000 frames, but why should we? To us, collecting 18000 or 32000 is the same, and I believe that with 32000 the results would be much better.
There are costs: the size of the dataset and the time of training. But you are right. We may be on the doorstep of something much better in terms of accuracy and are deciding not to explore the doorway. That does not make a lot of sense.
I would say we could do an extra subsection on the results to explore this. Actually, perhaps it's something similar to what we discussed the other day: an experiment where we assess the impact on localization accuracy varying the number of images used for training.
To me the only constraint is time: we should aim to publish this as soon as possible. Apart from that, it all makes sense.
Yes, I'll add an option to the data collector to only capture RGB images and the pose to speed up the process. Then I will collect a big dataset in the 024, maybe 50000 frames, and train the model. I'll leave it running for the weekend and then we can see if the 1cm appears or not :)
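A minimal sketch of how the frame-count experiment could be organised, assuming one large collection that is then subsampled. The names `nested_subsets`, `sweep` and `train_and_eval` are hypothetical and not part of Synfeal; `train_and_eval` stands in for whatever training and evaluation script is actually used. The sizes 18000, 32000 and 50000 are just the numbers mentioned in this thread.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def nested_subsets(frame_ids: Sequence[int],
                   sizes: Sequence[int],
                   seed: int = 0) -> Dict[int, List[int]]:
    """Return one subset of frame ids per requested size.

    The ids are shuffled once, and each subset is a prefix of that shuffle,
    so every smaller subset is contained in every larger one.
    """
    if max(sizes) > len(frame_ids):
        raise ValueError("requested more frames than were collected")
    shuffled = list(frame_ids)
    random.Random(seed).shuffle(shuffled)
    return {n: shuffled[:n] for n in sorted(sizes)}


def sweep(frame_ids: Sequence[int],
          sizes: Sequence[int],
          train_and_eval: Callable[[List[int]], Tuple[float, float]],
          seed: int = 0) -> List[Tuple[int, float, float]]:
    """Train once per subset size and collect (n_frames, pos_err, rot_err).

    train_and_eval is a hypothetical callable: it receives the training frame
    ids and returns (position error, rotation error) on a fixed test set.
    """
    results = []
    for n, subset in nested_subsets(frame_ids, sizes, seed).items():
        pos_err, rot_err = train_and_eval(subset)
        results.append((n, pos_err, rot_err))
    return results
```

Using nested subsets means the only thing that changes between runs is the number of frames, so any change in position or rotation error can be attributed to dataset size rather than to a different random sample.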
Synfeal, as a data-driven simulator, has two main advantages: the accuracy and the availability of data. We are proving accuracy by keeping the number of frames fixed and changing the data collection system; however, we are neglecting the impact of the number of frames on the localization performance.
What do you think is better as a selling idea:
"Synfeal is a data-driven simulator that produces accurate datasets, where we demonstrate that we can decrease the position error by 10 cm and the rotation error by 10º when compared with current state-of-the-art data collection systems."
or
"Synfeal is a data-driven simulator that produces accurate and bigger datasets, where we demonstrate that, even with simple deep learning models, we can achieve a localization position error of 1cm and rotation error of 2º in challenging scenes, which is something no one has ever done."
The 1cm and 2º are arbitrary values, but what I mean is that we aren't exploring in-depth the impact of the number of frames. We are always trying to be "fair" with the other papers, and in doing so, we are neglecting one thing that only Synfeal can offer.
For instance, in #77, we saw that to maintain the number of frames/space ratio, we should use 18000 frames, but why should we? To us, collecting 18000 or 32000 is the same, and I believe that with 32000 the results would be much better.