cylondata / twister2

A composable framework for fast and scalable data analytics
https://twister2.org
Apache License 2.0

Issue in fetching worker ip #924

Closed DImuthuUpe closed 4 years ago

DImuthuUpe commented 4 years ago

In [1], the code tries to fetch the IP of the worker node's local interface. Ideally, the value should be the IP given for that node in the nodes config file passed to mpirun. Right now, Twister2Environment.peers() [2] returns {0: '127.0.1.1', 1: '127.0.1.1'} for JMController, which is not very helpful because every worker reports the same loopback address. I suggest the following options:

  1. Figure out a way to map the IP provided in the nodes file to the particular worker
  2. Provide an external configuration to override the default worker IP resolution
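
To make option 2 concrete, here is a minimal sketch of interface-based resolution using only the standard java.net.NetworkInterface API. It is not Twister2 code: the class WorkerIpResolver, its methods, and the idea of passing preferred interface names as arguments are all hypothetical, chosen only to illustrate how a configured interface list could take priority over a loopback address such as 127.0.1.1.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Twister2.
public class WorkerIpResolver {

  /** Collects the IPv4 addresses of every interface that is up, keyed by interface name. */
  static Map<String, List<String>> localAddresses() throws SocketException {
    Map<String, List<String>> result = new LinkedHashMap<>();
    Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
    if (nics == null) {
      return result; // some JVMs return null when no interface is available
    }
    for (NetworkInterface nic : Collections.list(nics)) {
      if (!nic.isUp()) {
        continue;
      }
      List<String> ips = new ArrayList<>();
      for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
        if (addr instanceof Inet4Address) {
          ips.add(addr.getHostAddress());
        }
      }
      if (!ips.isEmpty()) {
        result.put(nic.getName(), ips);
      }
    }
    return result;
  }

  /**
   * Returns the first address of the first preferred interface that has one.
   * If none of the preferred interfaces match, falls back to the first
   * address that is not in the 127.0.0.0/8 loopback range.
   */
  static Optional<String> pickAddress(List<String> preferred,
                                      Map<String, List<String>> byInterface) {
    for (String name : preferred) {
      List<String> ips = byInterface.get(name);
      if (ips != null && !ips.isEmpty()) {
        return Optional.of(ips.get(0));
      }
    }
    for (List<String> ips : byInterface.values()) {
      for (String ip : ips) {
        if (!ip.startsWith("127.")) {
          return Optional.of(ip);
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) throws SocketException {
    // Preferred interface names come from the command line in this sketch,
    // e.g. `java WorkerIpResolver eth0 ib0`.
    List<String> preferred = List.of(args);
    System.out.println(
        pickAddress(preferred, localAddresses()).orElse("no usable address found"));
  }
}
```

Keeping pickAddress free of any system calls means the selection rule can be checked against a hand-built map of interfaces, independent of the machine it runs on.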

[1] https://github.com/DSC-SPIDAL/twister2/blob/master/twister2/resource-scheduler/src/java/edu/iu/dsc/tws/rsched/schedulers/standalone/MPIWorkerStarter.java#L593

[2] https://github.com/DSC-SPIDAL/twister2/blob/master/twister2/python-support/src/main/java/edu/iu/dsc/tws/python/Twister2Environment.java#L37

DImuthuUpe commented 4 years ago

Apparently there is already a way to specify the target interface: the twister2.network.interfaces.for.workers property in network.yml.