Closed ghost closed 5 years ago
Greetings @hybridware,
Do you have a list of the parts the project would need to add in order to support your open access healthcare datasets and genomic data? Additional details on the missing components would be useful.
The linked example uses the Wisconsin Breast Cancer dataset, which is redistributed in the ML.NET repo and used in many of our tests. The linked example on Azure Machine Learning also uses ML.NET code in the background. Or are you referring to a lack of ability to read in the dataset? I think the originally distributed format is TSV, which ML.NET reads rather well (the caveats are datasets with newlines inside quoted strings, and some forms of escaping).
In general, you can bring in any dataset, though the components you need for featurization or scoring could be missing.
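To check whether a TSV file hits the quoted-newline caveat mentioned above before loading it, a small pre-flight scan can help. This is only a sketch using Python's standard `csv` module; the function name `find_multiline_fields` and the sample data are hypothetical, and it assumes fields are quoted with double quotes.

```python
import csv
import io


def find_multiline_fields(text, delimiter="\t"):
    """Return (row_index, column_index) pairs for fields that contain a line break.

    Assumes double-quote quoting; rows and columns are counted from 0.
    """
    # newline="" keeps line breaks inside quoted fields untouched.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, quotechar='"')
    hits = []
    for row_idx, row in enumerate(reader):
        for col_idx, field in enumerate(row):
            if "\n" in field or "\r" in field:
                hits.append((row_idx, col_idx))
    return hits


# Hypothetical sample: the last row has a quoted field spanning two lines.
sample = 'id\tnote\n1\t"benign"\n2\t"line one\nline two"\n'
print(find_multiline_fields(sample))
```

If the scan reports any hits, those fields would need to be cleaned (for example by replacing the embedded line breaks with spaces) before relying on a line-oriented loader.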
DRI RESPONSE: Currently we focus on the CSV format and on in-memory data through IEnumerable.
We don't plan on supporting these file formats anytime soon. Closing this, as it is not on our roadmap for the foreseeable future (i.e. within the next year). In the meantime, feel free to convert the dataset to CSV or to an IEnumerable. Also, please consider using NimbusML, ML.NET's Python binding.
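As a rough illustration of the suggested conversion to CSV, the sketch below rewrites a tab-delimited interval file as CSV with Python's standard `csv` module. The function name `tsv_to_csv`, the column names, and the sample rows are hypothetical, and skipping lines that start with `#` is an assumption about the input, not something the thread specifies.

```python
import csv
import io


def tsv_to_csv(src, dst, header=None):
    """Copy tab-delimited rows from src to dst as CSV and return the row count.

    Assumption: blank lines and lines starting with '#' are comments and are skipped.
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, lineterminator="\n")
    if header:
        writer.writerow(header)
    count = 0
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        writer.writerow(row)
        count += 1
    return count


# Hypothetical sample resembling a small interval file.
source = io.StringIO("chr1\t100\t200\tgeneA\n#comment\nchr2\t300\t400\tgeneB\n", newline="")
target = io.StringIO()
rows_written = tsv_to_csv(source, target, header=["chrom", "start", "end", "name"])
print(rows_written)
print(target.getvalue())
```

For real files, the same function can be called with handles opened via `open(path, newline="")`, after which the resulting CSV can be loaded like any other delimited dataset.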
Please update ML.NET so that it supports existing open access healthcare datasets, and add tools that make it easy to compare genomic data and find overlapping data (which could be used to detect cancer patterns). Python's bedtools is a good example of this.
Allow us to train our models using open datasets, e.g. this example with Azure ML: https://blogs.msdn.microsoft.com/cdndevs/2016/05/31/getting-started-with-machine-learningwisconsin-breast-cancer-dataset/