dsp-uga / andromeda

This repository contains a Naive Bayes classifier for document classification, completed for CSCI 8360, Data Science Practicum, at the University of Georgia, Spring 2018.
MIT License

data structure to form #15

Closed. WeiwenXu21 closed this issue 6 years ago

WeiwenXu21 commented 6 years ago

RDD([ ('label1', [('w1',c1), ('w2',c2), ('w3',c3), ...]), ('label2', [('w1',c1), ('w2',c2), ('w3',c3), ...]), ...])
----> This one is good for Naive Bayes but hard to get into [('w1',c1), ('w2',c2), ('w3',c3), ...] efficiently

or

RDD([ ('w1', [('label1',c1), ('label2',c2), ('label3',c3), ...]), ('w2', [('label1',c1), ('label2',c2), ('label3',c3), ...]), ...])

or

RDD([ ('label1', [c1, c2, c3, ...]), ('label2', [c1, c2, c3, ...]), ...])

or

RDD([ ('w1', [c1, c2, c3, ...]), ('w2', [c1, c2, c3, ...]), ...])
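
For illustration, here is a minimal sketch of how the first and second layouts relate, with plain Python lists standing in for RDDs. The names `by_label` and `to_by_word` and the sample counts are made up for this example. In PySpark the same regrouping could probably be done with `flatMap` followed by `groupByKey`, but that is an assumption and not necessarily what this repository does.

```python
from collections import defaultdict

# Hypothetical sample data in the first layout: (label, [(word, count), ...]).
by_label = [
    ("label1", [("w1", 3), ("w2", 1)]),
    ("label2", [("w1", 2), ("w3", 5)]),
]

def to_by_word(by_label):
    """Regroup (label, [(word, count), ...]) into (word, [(label, count), ...])."""
    grouped = defaultdict(list)
    for label, word_counts in by_label:
        for word, count in word_counts:
            grouped[word].append((label, count))
    # Sort by word so the result order is deterministic.
    return sorted(grouped.items())

by_word = to_by_word(by_label)
print(by_word)
```

Either layout can be derived from the other this way, so the choice mostly comes down to which key (label or word) the later steps look up most often.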

melanieihuei commented 6 years ago

The first one will be best for detecting the words in testing! Please!
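
A rough sketch of how the first layout might be used at test time, assuming each label's inner list is converted to a dict so that word lookup is cheap. Everything here (`counts`, `log_likelihood`, `predict`, the add-one smoothing, the uniform class priors, and the sample data) is hypothetical and only meant to show the idea, not the project's actual code.

```python
import math

# Hypothetical training counts in the first layout: (label, [(word, count), ...]).
by_label = [
    ("label1", [("w1", 3), ("w2", 1)]),
    ("label2", [("w1", 2), ("w3", 5)]),
]

# Turn each inner list into a dict so a word lookup is O(1) on average.
counts = {label: dict(word_counts) for label, word_counts in by_label}
vocab = {word for wc in counts.values() for word in wc}

def log_likelihood(words, label, alpha=1.0):
    """Sum of log P(word | label) with add-alpha (Laplace) smoothing."""
    wc = counts[label]
    total = sum(wc.values()) + alpha * len(vocab)
    return sum(math.log((wc.get(w, 0) + alpha) / total) for w in words)

def predict(words):
    # Class priors are treated as uniform here to keep the sketch short.
    return max(counts, key=lambda label: log_likelihood(words, label))

print(predict(["w3", "w3"]))
```

With the inner lists stored as dicts, the "hard to get into" concern from the first comment largely goes away, since each test word is a single dictionary lookup per label.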

WeiwenXu21 commented 6 years ago

sorted!