dmlc / dmlc-core

A common bricks library for building scalable and portable distributed machine learning.
Apache License 2.0
864 stars 520 forks source link

Make yarn.py and launcher.py compatible with Python 3 #564

Closed hcho3 closed 5 years ago

hcho3 commented 5 years ago

Closes dmlc/dmlc-core#538

@ericangelokim @thvasilo @chenqin @trivialfis Can you review?

For now, I added a compatibility shim to support both Python 2 and 3. See #565

ericangelokim commented 5 years ago

Does this need to be done in any other files?

What would be the repurcussions of just moving this to be only python3 compaible? Py2 is deprecated; it feels backwards to have to support this. Current users can continue to use an older git commit if necessary.

hcho3 commented 5 years ago

@ericangelokim So you are okay with dropping Python 2.x?

ericangelokim commented 5 years ago

The versioning of this code is dependent on xgb version?

hcho3 commented 5 years ago

@ericangelokim It's other way around. Each XGBoost version has reference to exact commit hash of dmlc-core.

ericangelokim commented 5 years ago

Right so existing customers of xgboost (locked to a previous version) will not be affected by this.

In that case I think it makes sense to deprecate python 2 support.

ericangelokim commented 5 years ago

You may want to run an example training with python3 when changes are done to ensure that this is the only place that is affected.

thvasilo commented 5 years ago

I encountered the same error in the MPI tracker today, assume it's the same cause.

hcho3 commented 5 years ago

@thvasilo I'm not sure why that line would cause issue. b'mpich' in out seems quite valid.

thvasilo commented 5 years ago

I'll need to check again, I'm running 0.72 and got this error, didn't notice the new code on master.

On Thu, Sep 26, 2019, 18:31 Philip Hyunsu Cho notifications@github.com wrote:

@thvasilo https://github.com/thvasilo I'm not sure why that line would cause issue. b'mpich' in out seems quite valid.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dmlc/dmlc-core/pull/564?email_source=notifications&email_token=ACFBHI4HVNOP42NTE32MRR3QLTPULA5CNFSM4I2EWYRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7WFXNA#issuecomment-535583668, or mute the thread https://github.com/notifications/unsubscribe-auth/ACFBHI6Z2JLMOLMXZLIYTZLQLTPULANCNFSM4I2EWYRA .

thvasilo commented 5 years ago

I encountered the same error in the MPI tracker today, assume it's the same cause.

Yup never mind, this was back in 0.72

thvasilo commented 5 years ago

I launched an EMR cluster today to test this out.

I got this error in one of the containers:

Traceback (most recent call last):
  File "./launcher.py", line 81, in <module>
    main()
  File "./launcher.py", line 55, in main
    for f in classpath.split(':'):
TypeError: a bytes-like object is required, not 'str'

Adding the argument encoding='utf-8' to the call to Popen in launcher.py fixed the issue.

I think the reason is explained in the docs for Popen:

If encoding or errors are specified, the file objects stdin, stdout and stderr are opened in text mode with the specified encoding and errors, as described above in Frequently Used Arguments. If universal_newlines is True, they are opened in text mode with default encoding. Otherwise, they are opened as binary streams.

hcho3 commented 5 years ago

@thvasilo Can you try my latest fix? I did not use encoding='utf-8' because it's only supported in Python 3. In this PR, I aim to support both Python 2 and 3, since #565 has not been decided yet. @trivialfis See #568.

jozsi commented 4 years ago

I am getting the following error when trying to run dmlc-submit for yarn via ./xgboost/demo/distributed-training/run-aws:

Traceback (most recent call last):
  File "./launcher.py", line 10, in <module>
    from .util import py_str
ModuleNotFoundError: No module named '__main__.util'; '__main__' is not a package

There's also a bug on line 58, if I'm not mistaken: classpath = py_str(class_path) should be classpath = py_str(classpath)