RLuke22 / FJMP

[CVPR 2023] Official repository for FJMP: Factorized Joint Multi-Agent Motion Prediction

Problems with horovodrun #6

Closed: ChengkaiYang closed this issue 1 month ago

ChengkaiYang commented 6 months ago

Hello Luke, I think your excellent work deserves careful study. But when I run your code with the horovod command horovodrun -np 4 -H localhost, it says:

[mpiexec@esc8000-g4] match_arg (utils/args/args.c:163): unrecognized argument allow-run-as-root
[mpiexec@esc8000-g4] HYDU_parse_array (utils/args/args.c:178): argument matching returned error
[mpiexec@esc8000-g4] parse_args (ui/mpich/utils.c:1642): error parsing input array
[mpiexec@esc8000-g4] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1694): unable to parse user arguments
[mpiexec@esc8000-g4] main (ui/mpich/mpiexec.c:148): error parsing parameters

It seems something went wrong with my mpi4py installation when following the instructions in your README file (pip install mpi4py==3.1.4), but I can't fix it. I'd appreciate it if anyone can solve my problem!
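Not part of the original report, but a possible first check (generic commands, not from the FJMP README): the mpiexec messages above come from MPICH's Hydra launcher, while the allow-run-as-root flag that horovodrun passes is an Open MPI option, so it may help to confirm which MPI implementation mpi4py and Horovod are actually using:

```python
# Hypothetical diagnostic, not from the thread: check which MPI implementation
# is in play. mpi4py.get_config() reports the MPI build configuration mpi4py
# was compiled against; "mpiexec --version" prints an MPICH or Open MPI banner.
import subprocess
import mpi4py

print(mpi4py.get_config())
subprocess.run(["mpiexec", "--version"], check=False)
```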

ChengkaiYang commented 6 months ago

Another question is about the Johnson's algorithm step: when I am training stage 2 on the INTERACTION dataset, a serious problem occurs where the program finds so many cycles in the graph that it cannot continue running; maybe it costs too much in some cases. Have you noticed this in your training on the INTERACTION dataset? Thanks!

HenryDykhne commented 6 months ago

The part with too many cycles should not be happening unless something went wrong with stage 1 training, or unless you have modified the code to try to repredict off of initial predictions in an autoregressive manner.
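As an illustrative aside (not from the repository): if the relation header is effectively random, the predicted interaction graph can approach a complete digraph, and the number of simple cycles that Johnson's algorithm must enumerate then grows combinatorially, which is consistent with the stall described above. A small networkx sketch of the blow-up:

```python
# Illustrative only: count simple cycles of complete digraphs, the worst case
# for Johnson-style cycle enumeration on a dense relation graph.
import networkx as nx

for n in range(3, 8):
    g = nx.complete_graph(n, create_using=nx.DiGraph)  # edges in both directions
    print(n, sum(1 for _ in nx.simple_cycles(g)))
# prints: 3 5, 4 20, 5 84, 6 409, 7 2365 -- roughly factorial growth
```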

ChengkaiYang commented 6 months ago

The part with too many cycles should not be happening unless something went wrong with stage 1 training, or unless you have modified the code to try to repredict off of initial predictions in an autoregressive manner.

Thank you for the immediate reply! I have fixed that problem just now. It happened because, when running stage 2 of the two-stage training, I didn't load the pretrained stage-1 parameters, so the DAG was constructed with far too many edges from the randomly initialized relation header and had too many cycles to remove. So I added some extra code to fjmp.py, in front of the call model._train(train_loader, val_loader, optimizer, starting_epoch, val_best, ade_best, fde_best, val_edge_acc_best):

if (model.two_stage_training) and (model.training_stage == 2): model.load_relation_header()

Without this addition, the code uses randomly initialized parameters for the relation header, which causes the endless-cycles problem.
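For reference, a minimal sketch of the workaround described above, assuming the surrounding structure of fjmp.py; two_stage_training, training_stage, load_relation_header() and _train() are the attribute and method names quoted in the comment:

```python
# Sketch of the described fix: load the pretrained stage-1 relation header
# before stage-2 training starts, so DAG edges are not generated from a
# randomly initialized header.
if model.two_stage_training and model.training_stage == 2:
    model.load_relation_header()

model._train(train_loader, val_loader, optimizer, starting_epoch,
             val_best, ade_best, fde_best, val_edge_acc_best)
```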

HenryDykhne commented 6 months ago

Stage 2 training should automatically load stage 1 weights if you give it the right config name in the command. I recommend reviewing the command-line params at the top of fjmp.py, as well as the examples in the README.

ChengkaiYang commented 6 months ago

Stage 2 training should automatically load stage 1 weights if you give it the right config name in the command. I recommend reviewing the command-line params at the top of fjmp.py, as well as the examples in the README.

Thanks, I have just found that I set the hyperparameter 'learned_relation_header' to default=True in fjmp.py. That setting forced the DAG edges to be generated with a randomly initialized relation header rather than the pretrained one, which is what caused my problem.

ChengkaiYang commented 6 months ago

Stage 2 training should automatically load stage 1 weights if you give it the right config name in the command. I recommend reviewing the command-line params at the top of fjmp.py, as well as the examples in the README.

But I still have a point of confusion to confirm. In stage one of the two-stage training code, FJMP also trains the proposal decoder. But in your supplementary material, Feature Encoder 1 and Feature Encoder 2 / Proposal Decoder 1 and Proposal Decoder 2 use separate weights. Why not use Feature Encoder 1 / Proposal Decoder 1 as pretrained weights for Feature Encoder 2 / Proposal Decoder 2? Since the proposal decoder loss is added to the total loss as a regularization term, I think that might be useful?

Another question is about how to understand "factorized". Does it mean that FJMP first encodes the source agent (influencer) and then encodes the reactor conditioned on the source agent's encoding? Can this be described as factorized?

HenryDykhne commented 6 months ago

I will leave the first parts of your question for Luke to answer since he will do it more accurately.

For the second part about the factorization, I will say that the factorization only happens in the decoder part of the network.

ChengkaiYang commented 6 months ago

I will leave the first parts of your question for Luke to answer since he will do it more accurately.

For the second part about the factorization, I will say that the factorization only happens in the decoder part of the network.

Sorry, my earlier statements were inaccurate because I misunderstood and conflated the encoder and the decoder. In fact, in trajectory prediction, the encoder only operates on the past information. I used to think that everything except the final MLP of the trajectory decoder was called the encoder, but all the networks other than the LaneGCN encoder, i.e., the ones that model future interaction, are called the decoder. So the factorization happens in the decoder part.

[image: model figure from the paper]

In the model figure of the paper, does the factorization mean that the aggregation in the DAGNN is factorized, i.e., first compute the green and yellow agents' trajectory representations, and then compute the blue agent's representation conditioned on the green and yellow agents?

I have seen other trajectory-prediction papers that also mention "factorized", but there it means factorizing the attention model into three parts: temporal attention, map-lane-to-agents attention, and agents-to-agents attention. I think your novel idea of factorization is about influencers and reactors based on the topology of the DAG, not about factorizing the attention model into three parts. Does FJMP use this influencer/reactor factorization to avoid modeling the distribution of all agents simultaneously?

RLuke22 commented 6 months ago

Hi Chengkai,

Sorry for the late reply. Yes, by "factorized", we mean that the distribution of multi-agent future trajectories is factorized as in Equation (1) in the paper. What you described above is correct regarding how we calculate the trajectory representations of the green, yellow, and blue agents.
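As a sketch of the factorization being described (reconstructed from this discussion, not quoted verbatim from the paper, so the notation may differ):

$$
p(Y_1, \dots, Y_N \mid X) \;=\; \prod_{i=1}^{N} p\big(Y_i \mid X, \{Y_j : j \in \mathrm{pa}(i)\}\big)
$$

where X is the encoded scene context, Y_i is agent i's future trajectory, and pa(i) denotes agent i's parents (influencers) in the predicted interaction DAG.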

Thanks for making the other point! We define "factorized" here as factorization of a joint probability distribution, which is different from the "factorized" socio-temporal attention used in papers such as SceneTransformer, Wayformer, and AutoBots.

RLuke22 commented 6 months ago

Stage 2 training should automatically load stage 1 weights if you give it the right config name in the command. I recommend reviewing the command-line params at the top of fjmp.py, as well as the examples in the README.

But I still have a point of confusion to confirm. In stage one of the two-stage training code, FJMP also trains the proposal decoder. But in your supplementary material, Feature Encoder 1 and Feature Encoder 2 / Proposal Decoder 1 and Proposal Decoder 2 use separate weights. Why not use Feature Encoder 1 / Proposal Decoder 1 as pretrained weights for Feature Encoder 2 / Proposal Decoder 2? Since the proposal decoder loss is added to the total loss as a regularization term, I think that might be useful?

Another question is about how to understand "factorized". Does it mean that FJMP first encodes the source agent (influencer) and then encodes the reactor conditioned on the source agent's encoding? Can this be described as factorized?

Regarding using stage 1 weights as a pre-trained checkpoint for training stage 2, we did not try this ourselves but I agree it makes sense and could be an interesting experiment!

Yes, we first predict the futures of the influencer(s) and then encode their futures with an MLP. The reactor then conditions on the encoded predicted futures of the influencer(s) (via graph attention). Therefore, decoding becomes a sequential process, where the length of the sequence is the longest path length in the DAG. I hope this clarifies things!
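To make the sequential decoding concrete, here is a minimal illustrative sketch (not the repository's actual decoder; decode_agent and encode_future are hypothetical stand-ins for the per-agent decoder and the future-encoding MLP described above):

```python
# Illustrative sketch only: decode agents level by level over the interaction
# DAG, so each reactor conditions on the encoded predicted futures of its
# influencers (its DAG parents).
from collections import defaultdict

def dag_decode(agents, edges, decode_agent, encode_future, context):
    parents, children = defaultdict(list), defaultdict(list)
    indegree = {a: 0 for a in agents}
    for u, v in edges:                     # edge (u, v): u influences v
        parents[v].append(u)
        children[u].append(v)
        indegree[v] += 1

    # Topological "generations": every agent whose influencers are already
    # decoded is handled in the same step, so the number of sequential steps
    # equals the longest path length in the DAG.
    ready = [a for a in agents if indegree[a] == 0]
    predictions, future_feats = {}, {}
    while ready:
        next_ready = []
        for a in ready:
            parent_feats = [future_feats[p] for p in parents[a]]
            predictions[a] = decode_agent(a, context, parent_feats)
            future_feats[a] = encode_future(predictions[a])
            for c in children[a]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    next_ready.append(c)
        ready = next_ready
    return predictions
```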