dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Enable ML.NET to support open access healthcare models #457

Status: Closed (ghost closed 5 years ago)

ghost commented 6 years ago

Please update ML.NET to support existing open access healthcare datasets, and add tools that make it easy to compare genomic data and find overlapping data (which could be used to detect cancer patterns). Python's bedtools is a good example of this.
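For readers unfamiliar with the kind of comparison being requested, here is a minimal sketch of interval overlap detection, roughly what an intersect operation in a bedtools-style tool computes. It is an illustration only and not part of ML.NET or bedtools; the `Interval` alias, the `overlaps` function, and the sample coordinates are hypothetical, and intervals are assumed to be half-open `[start, end)` as in the BED format.

```python
from typing import List, Tuple

# Hypothetical record type: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping region for every pair of intervals from a and b."""
    result = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # an empty intersection means no overlap
                result.append((chrom_a, start, end))
    return result


# Made-up sample data for illustration only.
sample_a = [("chr1", 100, 200), ("chr2", 50, 80)]
sample_b = [("chr1", 150, 300), ("chr2", 80, 120)]
print(overlaps(sample_a, sample_b))  # [('chr1', 150, 200)]
```

The nested loop is quadratic and only suitable for small inputs; real tools sort the intervals or use an interval tree.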

Allow us to train our models using open datasets, e.g. this example with Azure ML: https://blogs.msdn.microsoft.com/cdndevs/2016/05/31/getting-started-with-machine-learningwisconsin-breast-cancer-dataset/

justinormont commented 6 years ago

Greetings @hybridware,

Do you have a list of the parts the project needs to add to support your open access healthcare datasets and genomic data? Additional details about the missing components would be useful.

The linked example uses the Wisconsin Breast Cancer dataset, which is redistributed in the ML.NET repo and used in many of our tests. The linked example on Azure Machine Learning also uses ML.NET code in the background. Or are you referring to an inability to read in the dataset? I think the originally distributed format is TSV, which ML.NET reads rather well (the caveats are datasets with newlines within quoted strings, and some forms of escaping).
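One possible way to sidestep the quoted-newline caveat is to flatten such fields before loading the file. The following is a minimal sketch using only Python's standard `csv` module, assuming the input is tab-separated with double-quoted fields; the `flatten_tsv` name is hypothetical and this is a preprocessing step outside ML.NET, not something its reader does.

```python
import csv
import io


def flatten_tsv(tsv_text: str) -> str:
    """Rewrite a TSV so that no field contains an embedded newline."""
    # newline="" lets the csv module see the original line endings,
    # so newlines inside quoted fields are parsed as part of the field.
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Collapse every run of whitespace (including newlines) to one space.
        writer.writerow([" ".join(field.split()) for field in row])
    return out.getvalue()


# Made-up sample: the second row has a newline inside a quoted field.
raw = 'id\tnote\n1\t"line one\nline two"\n'
print(flatten_tsv(raw))
```

Note that this also collapses tabs and repeated spaces inside a field, which may or may not be acceptable for a given dataset.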

In general, you can bring in any dataset, though the components you need for featurization or scoring could be missing.

Ivanidzo4ka commented 6 years ago

DRI RESPONSE: Currently we are focused on the CSV format and in-memory data through IEnumerable. I doubt we have plans to release additional readers for different formats prior to v1.0, so I'm moving this issue to the backlog.

codemzs commented 5 years ago

We don't plan on supporting these file formats anytime soon. Closing this, as it is not even on our long-term roadmap (i.e., within the next year). In the meantime, feel free to convert the dataset to CSV or IEnumerable. Also, please consider using ML.NET's Python bindings, called NimbusML.
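As a rough illustration of the suggested conversion, here is a minimal sketch that turns tab-delimited BED-style genomic intervals into CSV using only the Python standard library. It assumes the first three columns are chromosome, start, and end, and that lines starting with `#`, `track`, or `browser` are headers to skip; the `bed_to_csv` name and the output column names are hypothetical, and any extra BED columns are dropped.

```python
import csv
import io


def bed_to_csv(bed_text: str) -> str:
    """Convert BED-style text to CSV with chrom,start,end columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["chrom", "start", "end"])
    for line in bed_text.splitlines():
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], fields[1], fields[2]
        # int() validates that the coordinates are numeric.
        writer.writerow([chrom, int(start), int(end)])
    return out.getvalue()


# Made-up sample data for illustration only.
bed = "track name=example\nchr1\t100\t200\tgeneA\n\nchr2\t5\t10\n"
print(bed_to_csv(bed))
```

The resulting text can be written to a `.csv` file and loaded like any other comma-separated dataset.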