dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Trailing comma in header crashes AutoML wizard & VS 2019 #432

Closed: ericjohannsen closed this issue 4 years ago

ericjohannsen commented 4 years ago

System information

Issue

Source code / logs

The issue is caused by a trailing comma in the header. Our data extraction tool created a CSV header like:

C1, C2, C3,..CMany,Label,

Removing the trailing comma (in a text editor, or by opening the CSV in Excel and saving it) corrects the issue.
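The manual fix above can also be scripted. The following is a minimal sketch in Python, not part of Model Builder; the function name `drop_trailing_empty_column` is hypothetical. It assumes the file fits in memory and only removes the last column when it is empty in the header and in every data row, so a genuinely populated final column is left alone.

```python
import csv
import io


def drop_trailing_empty_column(text):
    """Remove the last column if it is empty in the header and in every row.

    Hypothetical helper for illustration; returns the text unchanged when
    the last column holds real data or the rows have uneven widths.
    """
    # Skip completely blank lines, which csv.reader reports as empty lists.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows or len(rows[0]) < 2:
        return text

    width = len(rows[0])
    for row in rows:
        # Bail out if any row is ragged or has a value in the last column.
        if len(row) != width or row[-1].strip() != "":
            return text

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(row[:-1] for row in rows)
    return out.getvalue()


sample = "Name,City,\nTed,San Diego,\nLynda,Fresno,\n"
cleaned = drop_trailing_empty_column(sample)
print(cleaned)
```

Going through the `csv` module rather than stripping the last character of each line keeps quoted fields that contain commas intact.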

LittleLittleCloud commented 4 years ago

Thanks for your feedback. Does the issue still exist in the AutoML wizard (Model Builder)?

It seems to be a bug in Model Builder. @JakeRadMSFT BTW, could we have your dataset for further investigation?

ericjohannsen commented 4 years ago

I haven't retried it with the current version and have a different focus at the moment.

The data is proprietary but I believe any data with a trailing comma exhibits this issue, e.g.:

Name,City,
Ted,San Diego,
Lynda,Fresno,
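To illustrate why a trailing comma is a problem for a CSV consumer, here is a small Python sketch (an assumption for illustration, unrelated to how Model Builder itself parses files) that reads the header of a sample like the one above. The helper name `parse_header` is hypothetical. A standard CSV parser treats the trailing comma as a delimiter followed by an empty field, so the header gains a third column with no name.

```python
import csv
import io


def parse_header(text):
    """Return the column names from the first line of CSV text."""
    return next(csv.reader(io.StringIO(text)), [])


sample = "Name,City,\nTed,San Diego,\nLynda,Fresno,\n"
header = parse_header(sample)

# Zero-based positions of columns whose name is blank.
unnamed = [i for i, name in enumerate(header) if not name.strip()]

print(header)   # ['Name', 'City', '']
print(unnamed)  # [2]
```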

LittleLittleCloud commented 4 years ago

I just tried to reproduce it in the current Model Builder. For the trailing-comma case, it now throws a data error ("Unrecognized data format. Please check the input file to make sure it is a valid comma or tab separated file"), which is expected. So @ericjohannsen, the current release should already fix your problem. Thanks for your feedback. I'm going to move this issue to validate/close.
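A tool that wants to fail early with a data error, rather than crash later, could check the header before loading the file. The sketch below is one possible pre-flight check in Python. It is not Model Builder's implementation; `DataFormatError` and `validate_header` are hypothetical names, and the error wording is invented for this example.

```python
import csv
import io


class DataFormatError(ValueError):
    """Hypothetical error type raised for malformed input files."""


def validate_header(text):
    """Return the header columns, or raise if any column name is blank."""
    header = next(csv.reader(io.StringIO(text)), None)
    if not header:
        raise DataFormatError("file has no header row")

    # One-based positions are easier to match against a spreadsheet view.
    blanks = [i + 1 for i, name in enumerate(header) if not name.strip()]
    if blanks:
        raise DataFormatError(
            f"unnamed column(s) at position(s) {blanks}; "
            "check the header for a trailing comma"
        )
    return header


try:
    validate_header("Name,City,\nTed,San Diego,\n")
except DataFormatError as err:
    message = str(err)
    print("Data error:", message)
```

Reporting the position of the blank column gives the user enough detail to find the stray comma without opening the file in a debugger.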

LittleLittleCloud commented 4 years ago

Validated by Xiaoyun in VS2019

zewditu commented 4 years ago

Validated: it shows the expected data error (screenshot: image.png).

zewditu commented 4 years ago

I think this one should be moved to closed instead of ready to ship.