georgymh / decentralized-ml

Interoperable and decentralized machine learning.
Apache License 2.0
9 stars, 5 forks

cleaned iterator, made integer labeler #6

Closed. neeleshdodda44 closed this 6 years ago

neeleshdodda44 commented 6 years ago

Two major changes:

  1. The labeler is now an integer, so the `_create_dataset_iterator` method in `iterators.py` only accepts an integer for the labeler argument. I updated the runner tests so that `make_train_job` and `make_validate_job` create jobs whose labeler is 0.

  2. Cleaned up `_create_dataset_iterator` so that it uses dataframes to process the data. Hopefully it's a bit easier to read now.
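To illustrate the first change, here is a minimal sketch of an iterator that takes an integer labeler and processes the data with a pandas dataframe. It is not the code in `iterators.py`: the function name `create_dataset_iterator`, the `csv_source` and `batch_size` parameters, the headerless CSV layout, and the sample rows are all assumptions made for the example.

```python
import io

import pandas as pd


def create_dataset_iterator(csv_source, labeler, batch_size):
    """Yield (features, labels) batches from a headerless CSV.

    `labeler` is the integer index of the label column (hypothetical API).
    """
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(labeler, int) or isinstance(labeler, bool):
        raise TypeError("labeler must be an integer column index")

    df = pd.read_csv(csv_source, header=None)
    if not 0 <= labeler < df.shape[1]:
        raise IndexError("labeler is out of range for this dataset")

    # Split the label column from the remaining feature columns.
    labels = df.iloc[:, labeler]
    features = df.drop(columns=df.columns[labeler])

    for start in range(0, len(df), batch_size):
        stop = start + batch_size
        yield (features.iloc[start:stop].to_numpy(),
               labels.iloc[start:stop].to_numpy())


# Made-up rows: the label is in column 0, followed by two feature columns.
sample = io.StringIO("5,0,255\n3,128,64\n7,1,2\n")
batches = list(create_dataset_iterator(sample, labeler=0, batch_size=2))
```

With `labeler=0`, as in the updated runner tests, the first column is treated as the label and every other column as a feature.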

Suggestions:

  1. Would it be easier if the labeler were a string naming the label column instead? Not trying to nitpick, but wouldn't you need to know the column name before labeling the data anyway? A string would make the code more readable and debugging smoother, and the change on my end would be minimal.
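As a rough sketch of the string-labeler idea, assuming the CSV has a header row: the helper name `split_by_label_column` and the sample columns below are hypothetical and not part of this PR.

```python
import io

import pandas as pd


def split_by_label_column(csv_source, labeler):
    """Split a CSV with a header row into (features, labels).

    `labeler` is the name of the label column (hypothetical API).
    """
    if not isinstance(labeler, str):
        raise TypeError("labeler must be a column name")

    df = pd.read_csv(csv_source)
    if labeler not in df.columns:
        # Naming the available columns makes a wrong labeler easy to debug.
        raise KeyError(
            f"column {labeler!r} not found; available: {list(df.columns)}"
        )

    return df.drop(columns=[labeler]), df[labeler]


# Made-up data with an explicit header row.
sample = io.StringIO("label,pixel0,pixel1\n5,0,255\n3,128,64\n")
features, labels = split_by_label_column(sample, "label")
```

The trade-off is that a string labeler requires every dataset to ship with a header row, while an integer labeler also works on headerless files.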

Hopefully this is pretty straightforward and this can get merged quickly.

neeleshdodda44 commented 6 years ago

@georgymh I incorporated the changes you asked for. A couple of things:

Could you modify the 2 MNIST CSVs accordingly?

Since you didn't give me more details, I assumed we were talking about the MNIST dataset and followed the schema here: https://www.kaggle.com/c/digit-recognizer/data. Let me know if that's not the case.
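For reference, the training CSV on that Kaggle page has a `label` column followed by `pixel0` through `pixel783`. A quick header check along these lines could confirm that a CSV follows that layout. This is only a sketch: the function name `follows_kaggle_mnist_layout` is hypothetical, and it inspects the header row only, not the data.

```python
import csv
import io

# 28 x 28 images, one column per pixel.
EXPECTED_PIXELS = 28 * 28


def follows_kaggle_mnist_layout(csv_file):
    """Return True if the header is `label, pixel0, ..., pixel783`."""
    header = next(csv.reader(csv_file), [])
    expected = ["label"] + [f"pixel{i}" for i in range(EXPECTED_PIXELS)]
    return header == expected


# In-memory headers standing in for real files.
good = io.StringIO(
    ",".join(["label"] + [f"pixel{i}" for i in range(EXPECTED_PIXELS)]) + "\n"
)
bad = io.StringIO("pixel0,pixel1,label\n")

good_ok = follows_kaggle_mnist_layout(good)
bad_ok = follows_kaggle_mnist_layout(bad)
```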

Also, where in the wiki would you like me to make the note about the CSVs?

georgymh commented 6 years ago

Yes, that's perfect. Feel free to "squash and merge" whenever you're ready.

For the wiki, you can put it under the "Dataset" section in Software Engineering > Products > Data Provider Unix Service.