justinormont closed this 3 years ago
Validated this issue on:
dotnet tool list -g mlnet 16.5.22
dotnet --version 5.0.202
Data Set: https://testpass.blob.core.windows.net/test-pass-data/taxi-fare.csv
Steps: If the file extension is not included in the command, the CLI reports an error: File does not exist
So this issue can still be reproduced. Please let me know if the above steps are incorrect.
@JakeRadMSFT This hasn't been fixed. Should we close and see if any customers ask for this?
For Model Builder, isn't the file extension check there deliberately, even though it's not strictly necessary, so that users know they should pass a fixed-width, text-format file?
And @vzhuqin, the test fails because the file does not exist on disk, which is expected. What @justinormont means is that mlnet shouldn't throw an exception when the file extension is not .csv, .tsv, or .txt, which mlnet.cli no longer checks.
The CLI & Model Builder have a check for acceptable file extensions:
We should remove the check in favor of improving the existing error message. See: https://github.com/dotnet/machinelearning-modelbuilder/issues/748
There is no reason for this check and it reduces the functionality of the product.
Why it's not needed: AutoML already sniffs the file to figure out its format, determines from that whether it's readable, and presents an error message if not.
Why it reduces functionality: Many datasets don't have these extensions. For instance, datasets may be named dataset.train & dataset.test, or simply have no extension. When I'm running datasets from a remote fileshare, I generally do not have the ability to rename them to conform to this rule.
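To illustrate the idea of validating by content rather than by extension, here is a minimal sketch. It is not AutoML's actual sniffing logic, which isn't shown in this thread. It uses Python's standard csv.Sniffer as a stand-in, and the function name sniff_delimiter and the candidate delimiter set are assumptions made for illustration.

```python
import csv


def sniff_delimiter(path, sample_chars=4096):
    """Guess the column delimiter from the file's content.

    Hypothetical helper: the file name and extension are never inspected,
    so dataset.train, dataset.test, or an extensionless file all work.
    """
    with open(path, newline="", encoding="utf-8") as f:
        sample = f.read(sample_chars)

    if not sample.strip():
        raise ValueError(f"{path}: file is empty")

    try:
        # Candidate delimiters are an assumption for this sketch.
        dialect = csv.Sniffer().sniff(sample, delimiters=",\t;|")
    except csv.Error as exc:
        # A content-based error message replaces the extension check.
        raise ValueError(
            f"{path}: could not detect a delimited-text format"
        ) from exc

    return dialect.delimiter
```

With a check like this, a file is rejected only when its content can't be parsed, and the error message can say so directly instead of pointing at the extension.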