rufuspollock closed 6 years ago
If an OpenML dataset has its source on UCI, I am using the UCI dataset.
Datasets that will be put into datahub are:
The following datasets are ordered by "most runs" on OpenML...
Github repos:
Datahub user:
@svetozarstojkovic could you give a brief reason so that people know why? (Esp. as we suggested going with OpenML by default :wink: - I'm very happy you went with this, but saying why helps others who might work on this.)
Most of the datasets I found on OpenML had their source on UCI, so I went to UCI and used their datasets. For those which didn't have a UCI source, I am using OpenML.
@rufuspollock anything remaining on this except blog post, can we close? @Mikanebu can you take on blog post?
@zelima ok
@Mikanebu any progress here?
@zelima I have not yet started writing the blog post. I will add it in my next24
Is this now a DUPLICATE of https://github.com/datahq/datahub-qa/issues/33?
FIXED/DUPLICATE. I think that, as Part I, this is done. The blog post will come with Part II if one is needed. As part of this issue, we've got the post about ARFF here: https://datahub.io/blog/attribute-relation-file-format-arff
One large potential user group for DataHub is people working in data science and machine learning.
Question: is there a difference between machine learning and data science? Is ML only about neural net stuff, or does it include classic predictive analytics ranging from regression to random forests? My sense is that we can go with ML even though what we are talking about is a bit broader.
As someone starting to learn data science (and machine learning) I want good ready-to-use sample datasets I can use for practice so that I can focus on practising analytics rather than data wrangling.
As a more advanced student of machine learning I want to get a wide range of well-prepared datasets (including well-known ones) that I can practise on so that I can focus my efforts on learning, not data acquisition.
As a machine learning practitioner I want to find up-to-date datasets which I can use for implementing the newest classifiers so that I can contribute to the machine learning community or create projects for the company I work in.
Please add to these
Acceptance criteria
Tasks
Analysis