dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International
265 stars 56 forks source link

Remove check for acceptable file extensions #749

Closed justinormont closed 3 years ago

justinormont commented 4 years ago

The CLI & Model Builder have a check for acceptable file extensions:

private readonly string[] supportExtensions = new string[] { ".csv", ".tsv", ".txt" };

We should remove the check in favor of improving the existing error message. See: https://github.com/dotnet/machinelearning-modelbuilder/issues/748

There is no reason for this check and it reduces the functionality of the product.

Why it's not needed: AutoML already sniffs the file to figure out its format, and from that determines if it's readable. And presents an error message if not.

Why it reduces functionality: Many datasets don't have these extensions. For instance datasets named as dataset.train & dataset.test. Or simply have no extension. When I'm running datasets from a remote fileshare, I generally do not have the ability to rename them to conform to this rule.

vzhuqin commented 3 years ago

Validate this issue on:

Data Set: https://testpass.blob.core.windows.net/test-pass-data/taxi-fare.csv

Steps: If not indicate the file extension in cmd commend, will have error: File does not exist image.png

So this issue still can be repro, and please let me know if the above steps is incorrect.

beccamc commented 3 years ago

@JakeRadMSFT This hasn't been fixed. Should we close and see if any customers ask for this?

LittleLittleCloud commented 3 years ago

For the model builder, Isn't the check for file extension deliberately added, even it's not necessary, so that user knows they should pass a fix-width txt-format file?

And @vzhuqin The test fails because the file does not exist on disk, which is expected. What @justinormont means is that mlnet shouldn't throw an exception when the file extension is not .csv, .tsv or .txt, which mlnet.cli no longer check any more.