dmlc / dmlc-core

A common bricks library for building scalable and portable distributed machine learning.
Apache License 2.0

Don't populate labels if label column is not specified in csv parser #679

Closed rongou closed 1 year ago

rongou commented 1 year ago

Right now, the CSV parser sets every label to 0 if the label column is not specified (or is set to -1). This is surprising to the user and leads to cryptic error messages. It's probably better to leave the labels empty when no label column is specified.

For vertical federated learning, we may have workers that don't have access to the labels, so this would enable them to parse CSV shards without erroneously setting labels to 0.
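To illustrate the proposed behavior, here is a minimal Python sketch. It is not the dmlc-core implementation (which is C++); the function name `parse_csv_rows` and its `label_column` argument are hypothetical and only mirror the convention described above, where a negative value means "no label column".

```python
from typing import List, Tuple


def parse_csv_rows(
    lines: List[str], label_column: int = -1
) -> Tuple[List[List[float]], List[float]]:
    """Parse comma-separated numeric rows into (features, labels).

    Hypothetical helper for illustration only. A negative label_column
    means no label column was specified: every field is treated as a
    feature and the label list is left empty instead of being zero-filled.
    """
    features: List[List[float]] = []
    labels: List[float] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = [float(x) for x in line.split(",")]
        if label_column >= 0:
            # Pull the label out of the row; the remaining fields are features.
            labels.append(fields[label_column])
            fields = fields[:label_column] + fields[label_column + 1:]
        features.append(fields)
    return features, labels


shard = ["1.0,2.0,3.0", "4.0,5.0,6.0"]

# Worker without labels: all columns are features, labels stay empty.
features_only, no_labels = parse_csv_rows(shard)

# Worker with labels in column 0.
features, labels = parse_csv_rows(shard, label_column=0)
```

With this convention, a worker that holds only features gets an empty label list, so downstream code can tell "no labels" apart from "all labels are 0".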

rongou commented 1 year ago

@hcho3 @trivialfis

hcho3 commented 1 year ago

Merging for now. I'll try to make time to fix the CI.