Closed Andrewzh112 closed 3 years ago
Hi @Andrewzh112 ,
We are working on making this code ready for release. Before this occurs, if you are interested in generating ground-truth semantic maps please see this discussion.
Thanks!
I'm going to reopen this issue in case anyone else is interested in this; I'll close it again once the implementation is public.
Hi @Andrewzh112 , these experiments have been merged in, see PR #14 . I haven't uploaded the pretrained model weights just yet but those will be coming shortly. Note that you'll have to update your version of AllenAct to the newest version (0.2.3) as that's where I've distributed the ActiveNeuralSLAM model (relevant PR).
Note that I've put a decent amount of work into making the mapping sensors efficient (GPU accelerated) but they are still noticeably slower than running without them. I get around 100-150 FPS during training when running on several GPUs. Let me know if you have any questions.
@Lucaweihs thanks for the implementation.
As far as I can tell from the example script and `active_neural_slam.py`, you provide a semantic mapping capability by introducing 70 extra channels (i.e., 210x210x72). And I assume that no object segmentation/detection (with something like MaskRCNN, etc.) is implemented yet? Could you correct my understanding on this point?
Hi @ugurbolat, if I'm understanding you correctly, yes that's right. We don't do any explicit pixel-to-pixel semantic segmentation currently. That said, AI2-THOR can provide ground truth instance/semantic segmentation frames. While you shouldn't give these frames directly to the agent at inference time (we only allow agents access to RGB+depth for the challenge), you could use these to fine-tune a MaskRCNN.
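Since AI2-THOR's ground-truth instance segmentation comes as a color-coded image, a small conversion step is needed before the frames can serve as MaskRCNN fine-tuning targets. Here's a minimal sketch (pure NumPy); it assumes one unique RGB color per instance, and `frame_to_instance_masks` is an illustrative helper, not part of the codebase:

```python
import numpy as np

def frame_to_instance_masks(seg_frame: np.ndarray) -> list:
    """Split an H x W x 3 color-coded instance segmentation frame
    (each unique RGB color marks one instance) into per-instance
    boolean masks, the format MaskRCNN training targets expect."""
    # Collapse RGB triples to single integers so np.unique is cheap.
    flat = (
        seg_frame[..., 0].astype(np.int64) * 256 * 256
        + seg_frame[..., 1].astype(np.int64) * 256
        + seg_frame[..., 2].astype(np.int64)
    )
    return [flat == color for color in np.unique(flat)]

# Toy 2x2 "frame" with two instances (red and blue).
toy = np.array(
    [[[255, 0, 0], [255, 0, 0]],
     [[0, 0, 255], [255, 0, 0]]],
    dtype=np.uint8,
)
masks = frame_to_instance_masks(toy)
print(len(masks))  # 2 instances
```

In a real setup the frame would come from the simulator's instance segmentation output rather than a hand-built array, and you'd also map each color back to an object category for the class labels.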
@Lucaweihs thanks for the quick reply.
To be more precise, I plan to use only the semantic mapping capability of Active Neural SLAM (AN-SLAM) to evaluate the quality of the predicted map, without getting into the action part such as navigation and planning.
For example, the agent should build a semantic map for both the walkthrough and unshuffle stages, compare those two maps, and compare them with the ground truth for evaluation. One downside is that, for my experiments, the navigation actions for exploration should be given. The complete ReArrangement task is too complex at the moment for me.
I've also noticed that AllenAct and AI2-THOR are well-written frameworks; you've already implemented AN-SLAM's semantic map as a baseline along with nice utility functions for top-down mapping. In the future, I would like to integrate benchbot with AllenAct as an extra environment since I find your framework quite modular.
@ugurbolat it would be fantastic to have benchbot integrated into AllenAct. I was optimistically thinking I might do this myself at some point (especially given the RVSU challenge) but I just don't have the time given other projects/commitments. I'm happy to provide any support you might need in this regard though.
> One downside is that for my experiments, the navigation actions for exploration should be given.
Gotcha, in case you haven't seen it, we have a test in AllenAct (see here) that might be useful if you'd like to see one way to generate the AN-SLAM map outside of AllenAct's train/test functions.
Also, if it's relevant to you, we do support having the agent follow expert actions during training. See, for instance, the `projects/objectnav_baselines/experiments/objectnav_mixin_dagger.py` file; in particular, the line

```python
teacher_forcing=LinearDecay(startp=1.0, endp=1.0, steps=tf_steps,),
```

results in the agent following the expert's actions for `tf_steps` training steps. In theory you could design your task so that the expert actions were just your pre-determined navigation actions.
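For intuition about what that schedule does, here is a minimal sketch of a linear-decay teacher-forcing probability; `linear_decay` below is an illustrative stand-in, not AllenAct's actual `LinearDecay` implementation:

```python
def linear_decay(step: int, startp: float, endp: float, steps: int) -> float:
    """Interpolate linearly from `startp` to `endp` over `steps` training
    steps, then stay at `endp`. The returned value is the probability of
    following the expert action at the given step."""
    frac = min(max(step / steps, 0.0), 1.0)
    return startp + (endp - startp) * frac

# With startp == endp == 1.0 the probability never decays, i.e. the agent
# always follows the expert (full teacher forcing):
print(linear_decay(0, 1.0, 1.0, 100_000))        # 1.0
print(linear_decay(100_000, 1.0, 1.0, 100_000))  # 1.0

# A DAgger-style decaying schedule would instead use endp=0.0:
print(linear_decay(50_000, 1.0, 0.0, 100_000))   # 0.5
```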
Let me know if I can be of any more help.
@Lucaweihs I initially got interested in benchbot especially because of the Scene Change Detection challenge, but AI2-THOR seems like a better choice for my initial experiment since it is more lightweight. And I want to see if I can simplify the ReArrangement task into the Scene Change Detection task, as I consider the latter a subset problem of the former.
Thanks for the leading points. That seems like exactly what I was looking for.
Let me dig more into those.
Gotcha @ugurbolat , yes that change should be pretty straightforward (in theory!).
> we do support having the agent follow expert actions during training.
@Lucaweihs I've experimented a bit with the expert actions provided by `GreedyUnshuffleExpert`. Since the actions are for the ReArrangement task, I am not sure whether they would give an optimal path for exploring the environment to build a semantic map or fine-tune the MaskRCNN. How should I approach this if I want to manually record/build a trajectory that explores all scenes, so that I can create a training dataset?
Hi @ugurbolat, that's an interesting question. I suspect you'd like your trajectory to be such that you exhaustively explore the environment and see all of the objects, correct? If I were going to do this I think I would create a new heuristic "expert" which followed a simple greedy strategy that ensured the agent saw every object. Namely I would create a `seen_objects` set to store all of the objects my agent has seen so far and then, in a loop:

1. Find the object not in my `seen_objects` set that is closest to my agent's current position. Call this object `obj`.
2. As in `GreedyUnshuffleExpert`, use lines similar to

   ```python
   interactable_positions = env._interactable_positions_cache.get(
       scene_name=env.scene, obj=obj, controller=env.controller,
   )
   ```

   to figure out which positions `obj` is visible from.
3. Use a `ShortestPathNavigatorTHOR` object (again as in `GreedyUnshuffleExpert`) to find the next action that would take me to the closest interactable position.
4. Once `obj` has been seen, add it to the `seen_objects` set and repeat.

Once you've built this expert you have a few different options for training your mapping / the MaskRCNN: e.g. set `teacher_forcing` in the `TrainingPipeline` to equal something like `LinearDecay(steps=training_steps, startp=1, endp=1)` (i.e. always follow the expert action) and then just train your auxiliary models using whichever losses you like as usual.

Let me know if that helps or if anything is unclear :)!
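The greedy loop above can be sketched with a toy example. Straight-line distance stands in for the interactable-positions cache and `ShortestPathNavigatorTHOR`, and all names here are illustrative rather than taken from the codebase:

```python
import math

def greedy_exploration_order(agent_pos, object_positions):
    """Toy version of the greedy exploration expert: repeatedly pick the
    unseen object closest to the agent's current position, 'navigate' to
    it, mark it seen, and repeat until every object has been seen."""
    seen_objects = set()
    order = []
    pos = agent_pos
    while len(seen_objects) < len(object_positions):
        # 1. Closest object not yet in seen_objects.
        obj = min(
            (o for o in object_positions if o not in seen_objects),
            key=lambda o: math.dist(pos, object_positions[o]),
        )
        # 2./3. The real expert would query interactable positions for
        # `obj` and step toward the closest one; here we just jump there.
        pos = object_positions[obj]
        # 4. Mark the object seen and repeat.
        seen_objects.add(obj)
        order.append(obj)
    return order

objects = {"mug": (1.0, 0.0), "sofa": (5.0, 0.0), "tv": (5.5, 1.0)}
print(greedy_exploration_order((0.0, 0.0), objects))  # ['mug', 'sofa', 'tv']
```

Note that this greedy ordering minimizes each hop, not the total path length, which is usually an acceptable trade-off for an exploration heuristic.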
Could you guys also include the ANS implementation that was mentioned in the paper?
Thanks!