HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0

Replace print statements with logging #25

Closed (richardwu closed this 5 years ago)

richardwu commented 5 years ago

Waiting on #23 to be merged; review the last 2 commits.

Closes #19

Example output

Launching test...
DEBUG:root:Time to create index: 0.00 secs
(... 19 identical lines elided ...)
INFO:root:DONE Loading hospital.csv
DEBUG:root:Time to load dataset: 0.29 secs
DEBUG:root:OPENED constraints file successfully
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.Condition,t2.Condition)&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.HospitalType,t2.HospitalType)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.Condition,t2.Condition)&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.HospitalType,t2.HospitalType)
INFO:root:DONE parsing predicate: EQ(t1.Condition,t2.Condition)
INFO:root:DONE parsing predicate: EQ(t1.MeasureName,t2.MeasureName)
INFO:root:DONE parsing predicate: IQ(t1.HospitalType,t2.HospitalType)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.ZipCode,t2.ZipCode)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.ZipCode,t2.ZipCode)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.ZipCode,t2.ZipCode)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.Sample,t2.Sample)&IQ(t1.Score,t2.Score)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.Sample,t2.Sample)&IQ(t1.Score,t2.Score)
INFO:root:DONE parsing predicate: EQ(t1.Sample,t2.Sample)
INFO:root:DONE parsing predicate: IQ(t1.Score,t2.Score)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.PhoneNumber,t2.PhoneNumber)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.PhoneNumber,t2.PhoneNumber)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.PhoneNumber,t2.PhoneNumber)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.MeasureName,t2.MeasureName)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.MeasureName,t2.MeasureName)
INFO:root:DONE parsing predicate: EQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE parsing predicate: IQ(t1.MeasureName,t2.MeasureName)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.Stateavg,t2.Stateavg)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.Stateavg,t2.Stateavg)
INFO:root:DONE parsing predicate: EQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE parsing predicate: IQ(t1.Stateavg,t2.Stateavg)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.ProviderNumber,t2.ProviderNumber)&IQ(t1.HospitalName,t2.HospitalName)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.ProviderNumber,t2.ProviderNumber)&IQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: EQ(t1.ProviderNumber,t2.ProviderNumber)
INFO:root:DONE parsing predicate: IQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.Condition,t2.Condition)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.Condition,t2.Condition)
INFO:root:DONE parsing predicate: EQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE parsing predicate: IQ(t1.Condition,t2.Condition)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.Address1,t2.Address1)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.Address1,t2.Address1)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.Address1,t2.Address1)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.HospitalOwner,t2.HospitalOwner)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.HospitalOwner,t2.HospitalOwner)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.HospitalOwner,t2.HospitalOwner)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.ProviderNumber,t2.ProviderNumber)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.ProviderNumber,t2.ProviderNumber)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.ProviderNumber,t2.ProviderNumber)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&EQ(t1.PhoneNumber,t2.PhoneNumber)&EQ(t1.HospitalOwner,t2.HospitalOwner)&IQ(t1.State,t2.State)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&EQ(t1.PhoneNumber,t2.PhoneNumber)&EQ(t1.HospitalOwner,t2.HospitalOwner)&IQ(t1.State,t2.State)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: EQ(t1.PhoneNumber,t2.PhoneNumber)
INFO:root:DONE parsing predicate: EQ(t1.HospitalOwner,t2.HospitalOwner)
INFO:root:DONE parsing predicate: IQ(t1.State,t2.State)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.City,t2.City)&IQ(t1.CountyName,t2.CountyName)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.City,t2.City)&IQ(t1.CountyName,t2.CountyName)
INFO:root:DONE parsing predicate: EQ(t1.City,t2.City)
INFO:root:DONE parsing predicate: IQ(t1.CountyName,t2.CountyName)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.ZipCode,t2.ZipCode)&IQ(t1.EmergencyService,t2.EmergencyService)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.ZipCode,t2.ZipCode)&IQ(t1.EmergencyService,t2.EmergencyService)
INFO:root:DONE parsing predicate: EQ(t1.ZipCode,t2.ZipCode)
INFO:root:DONE parsing predicate: IQ(t1.EmergencyService,t2.EmergencyService)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.City,t2.City)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.City,t2.City)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.City,t2.City)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.MeasureCode,t2.MeasureCode)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE parsing predicate: EQ(t1.MeasureName,t2.MeasureName)
INFO:root:DONE parsing predicate: IQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE Loading DCs from hospital_constraints_att.txt
DEBUG:root:Time to load dirty data: 0.02 secs
DEBUG:root:DONE with Error Detector: NullDetector in 0.08 secs
DEBUG:root:Preparing to execute 16 queries.
DEBUG:root:Time to execute 16 queries: 0.01 secs
DEBUG:root:DONE with Error Detector: ViolationDetector in 0.12 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE with error detection.
DEBUG:root:Time to detect errors: 1.58 secs
DEBUG:root:Time to execute query: 0.00 secs
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 21.54it/s]
DEBUG:root:DONE with pair stats preparation in 0.81 secs
 51%|██████████████████████████████████████████████████████████████▋                                                             | 506/1000 [00:07<00:07, 67.72it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:23<00:00, 42.62it/s]
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create table: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE with domain preparation.
DEBUG:root:Time to setup the domain: 30.77 secs
DEBUG:root:Time to execute query: 0.00 secs
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 55.26it/s]
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 72 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 72 unique words (100% of original 72, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 72 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 36 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 257 word corpus (25.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 72 words and 10 dimensions: 41760 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 1409
INFO:gensim.models.word2vec:training model with 3 workers on 72 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1218 effective words) took 0.0s, 50030 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 74 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 74 unique words (100% of original 74, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 74 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 28 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 233 word corpus (23.4% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 74 words and 10 dimensions: 42920 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 685
INFO:gensim.models.word2vec:training model with 3 workers on 74 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1135 effective words) took 0.0s, 52050 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 76 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 76 unique words (100% of original 76, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 76 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 42 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 277 word corpus (27.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 76 words and 10 dimensions: 44080 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 3176
INFO:gensim.models.word2vec:training model with 3 workers on 76 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1323 effective words) took 0.0s, 30311 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 1 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 1 unique words (100% of original 1, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 1 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 1 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 32 word corpus (3.3% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 1 words and 10 dimensions: 580 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 14
INFO:gensim.models.word2vec:training model with 3 workers on 1 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (136 effective words) took 0.0s, 6881 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 1 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 1 unique words (100% of original 1, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 1 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 1 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 32 word corpus (3.3% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 1 words and 10 dimensions: 580 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 14
INFO:gensim.models.word2vec:training model with 3 workers on 1 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (136 effective words) took 0.0s, 14180 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 71 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 71 unique words (100% of original 71, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 71 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 41 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 273 word corpus (27.3% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 71 words and 10 dimensions: 41180 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 747
INFO:gensim.models.word2vec:training model with 3 workers on 71 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1309 effective words) took 0.0s, 42736 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 27 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 27 unique words (100% of original 27, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 27 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 10 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 116 word corpus (11.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 27 words and 10 dimensions: 15660 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 1097
INFO:gensim.models.word2vec:training model with 3 workers on 27 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (556 effective words) took 0.0s, 24812 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 334 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 334 unique words (100% of original 334, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 334 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 103 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 635 word corpus (63.6% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 334 words and 10 dimensions: 193720 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 2543
INFO:gensim.models.word2vec:training model with 3 workers on 334 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (3145 effective words) took 0.0s, 72404 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 4 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 4 unique words (100% of original 4, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 4 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 4 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 46 word corpus (4.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 4 words and 10 dimensions: 2320 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 12
INFO:gensim.models.word2vec:training model with 3 workers on 4 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (208 effective words) took 0.0s, 12238 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 69 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 69 unique words (100% of original 69, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 69 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 39 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 235 word corpus (23.6% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 69 words and 10 dimensions: 40020 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 367
INFO:gensim.models.word2vec:training model with 3 workers on 69 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1120 effective words) took 0.0s, 71521 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 74 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 74 unique words (100% of original 74, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 74 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 43 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 280 word corpus (28.0% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 74 words and 10 dimensions: 42920 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 1758
INFO:gensim.models.word2vec:training model with 3 workers on 74 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1326 effective words) took 0.0s, 96627 effective words/s
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 28 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 28 unique words (100% of original 28, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 28 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 7 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 100 word corpus (10.1% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 28 words and 10 dimensions: 16240 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 806
INFO:gensim.models.word2vec:training model with 3 workers on 28 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:training on 5000 raw words (469 effective words) took 0.0s, 23521 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 72 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 72 unique words (100% of original 72, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 72 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 42 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 275 word corpus (27.5% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 72 words and 10 dimensions: 41760 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 599
INFO:gensim.models.word2vec:training model with 3 workers on 72 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1313 effective words) took 0.0s, 67638 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 65 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 65 unique words (100% of original 65, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 65 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 31 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 241 word corpus (24.2% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 65 words and 10 dimensions: 37700 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 1131
INFO:gensim.models.word2vec:training model with 3 workers on 65 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1160 effective words) took 0.0s, 55561 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 6 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 6 unique words (100% of original 6, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 6 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 5 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 56 word corpus (5.6% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 6 words and 10 dimensions: 3480 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 28
INFO:gensim.models.word2vec:training model with 3 workers on 6 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (251 effective words) took 0.0s, 12346 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 64 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 64 unique words (100% of original 64, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 64 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 27 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 223 word corpus (22.3% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 64 words and 10 dimensions: 37120 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 7296
INFO:gensim.models.word2vec:training model with 3 workers on 64 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1081 effective words) took 0.1s, 10746 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 69 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 69 unique words (100% of original 69, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 69 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 42 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 271 word corpus (27.2% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 69 words and 10 dimensions: 40020 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 2627
INFO:gensim.models.word2vec:training model with 3 workers on 69 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1293 effective words) took 0.0s, 32600 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 54 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 54 unique words (100% of original 54, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 54 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 27 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 216 word corpus (21.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 54 words and 10 dimensions: 31320 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 555
INFO:gensim.models.word2vec:training model with 3 workers on 54 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1045 effective words) took 0.0s, 53292 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 13 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 13 unique words (100% of original 13, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 13 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 4 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 54 word corpus (5.5% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 13 words and 10 dimensions: 7540 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 347
INFO:gensim.models.word2vec:training model with 3 workers on 13 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:training on 5000 raw words (242 effective words) took 0.0s, 11416 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:root:Time to execute query: 0.13 secs
DEBUG:root:Time to execute query: 0.03 secs
DEBUG:root:Time to execute query: 0.02 secs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14163/14163 [00:14<00:00, 992.35it/s]
DEBUG:root:Time to execute query: 0.07 secs
/Users/rwu1997/anaconda3/envs/holo_dev27/lib/python2.7/site-packages/gensim/models/wrappers/fasttext.py:104: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  ngrams = [ng for ng in ngrams if ng in self.ngrams]
DEBUG:root:Preparing to execute 35 queries.
DEBUG:root:Time to execute 35 queries: 0.18 secs
DEBUG:root:Generating weak labels.
DEBUG:root:Time to execute query: 0.00 secs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3955/3955 [00:00<00:00, 90156.70it/s]
DEBUG:root:DONE generating weak labels.
DEBUG:root:Generating mask.
DEBUG:root:Time to execute query: 0.01 secs
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14163/14163 [00:00<00:00, 48946.30it/s]
DEBUG:root:DONE generating mask.
INFO:root:DONE setting up featurized dataset.
DEBUG:root:Time to featurize data: 26.76 secs
INFO:root:DONE setting up repair model.
DEBUG:root:Time to setup repair model: 26.76 secs
  0%|                                                                                                                                        | 0/30 [00:00<?, ?it/s]DEBUG:root:Epoch 1, cost = 0.231135, acc = 99.52%
  3%|████▎                                                                                                                           | 1/30 [00:01<00:29,  1.02s/it]DEBUG:root:Epoch 2, cost = 0.053873, acc = 99.65%
  7%|████████▌                                                                                                                       | 2/30 [00:02<00:28,  1.01s/it]DEBUG:root:Epoch 3, cost = 0.044740, acc = 99.67%
 10%|████████████▊                                                                                                                   | 3/30 [00:03<00:27,  1.03s/it]DEBUG:root:Epoch 4, cost = 0.043239, acc = 99.67%
 13%|█████████████████                                                                                                               | 4/30 [00:04<00:26,  1.04s/it]DEBUG:root:Epoch 5, cost = 0.042934, acc = 99.67%
 17%|█████████████████████▎                                                                                                          | 5/30 [00:05<00:26,  1.04s/it]DEBUG:root:Epoch 6, cost = 0.042874, acc = 99.67%
 20%|█████████████████████████▌                                                                                                      | 6/30 [00:06<00:24,  1.04s/it]DEBUG:root:Epoch 7, cost = 0.042866, acc = 99.67%
 23%|█████████████████████████████▊                                                                                                  | 7/30 [00:07<00:23,  1.03s/it]DEBUG:root:Epoch 8, cost = 0.042868, acc = 99.67%
 27%|██████████████████████████████████▏                                                                                             | 8/30 [00:08<00:22,  1.03s/it]DEBUG:root:Epoch 9, cost = 0.042871, acc = 99.67%
 30%|██████████████████████████████████████▍                                                                                         | 9/30 [00:09<00:21,  1.03s/it]DEBUG:root:Epoch 10, cost = 0.042873, acc = 99.67%
 33%|██████████████████████████████████████████▎                                                                                    | 10/30 [00:10<00:20,  1.02s/it]DEBUG:root:Epoch 11, cost = 0.042874, acc = 99.67%
 37%|██████████████████████████████████████████████▌                                                                                | 11/30 [00:11<00:19,  1.04s/it]DEBUG:root:Epoch 12, cost = 0.042875, acc = 99.67%
 40%|██████████████████████████████████████████████████▊                                                                            | 12/30 [00:12<00:18,  1.04s/it]DEBUG:root:Epoch 13, cost = 0.042876, acc = 99.67%
 43%|███████████████████████████████████████████████████████                                                                        | 13/30 [00:13<00:17,  1.03s/it]DEBUG:root:Epoch 14, cost = 0.042876, acc = 99.67%
 47%|███████████████████████████████████████████████████████████▎                                                                   | 14/30 [00:14<00:16,  1.03s/it]DEBUG:root:Epoch 15, cost = 0.042876, acc = 99.67%
 50%|███████████████████████████████████████████████████████████████▌                                                               | 15/30 [00:15<00:15,  1.02s/it]DEBUG:root:Epoch 16, cost = 0.042876, acc = 99.67%
 53%|███████████████████████████████████████████████████████████████████▋                                                           | 16/30 [00:16<00:14,  1.02s/it]DEBUG:root:Epoch 17, cost = 0.042876, acc = 99.67%
 57%|███████████████████████████████████████████████████████████████████████▉                                                       | 17/30 [00:17<00:13,  1.02s/it]DEBUG:root:Epoch 18, cost = 0.042876, acc = 99.67%
 60%|████████████████████████████████████████████████████████████████████████████▏                                                  | 18/30 [00:18<00:12,  1.01s/it]DEBUG:root:Epoch 19, cost = 0.042876, acc = 99.67%
 63%|████████████████████████████████████████████████████████████████████████████████▍                                              | 19/30 [00:19<00:11,  1.01s/it]DEBUG:root:Epoch 20, cost = 0.042876, acc = 99.67%
 67%|████████████████████████████████████████████████████████████████████████████████████▋                                          | 20/30 [00:20<00:10,  1.01s/it]DEBUG:root:Epoch 21, cost = 0.042876, acc = 99.67%
 70%|████████████████████████████████████████████████████████████████████████████████████████▉                                      | 21/30 [00:21<00:09,  1.01s/it]DEBUG:root:Epoch 22, cost = 0.042876, acc = 99.67%
 73%|█████████████████████████████████████████████████████████████████████████████████████████████▏                                 | 22/30 [00:22<00:08,  1.01s/it]DEBUG:root:Epoch 23, cost = 0.042876, acc = 99.67%
 77%|█████████████████████████████████████████████████████████████████████████████████████████████████▎                             | 23/30 [00:23<00:07,  1.01s/it]DEBUG:root:Epoch 24, cost = 0.042876, acc = 99.67%
 80%|█████████████████████████████████████████████████████████████████████████████████████████████████████▌                         | 24/30 [00:24<00:06,  1.00s/it]DEBUG:root:Epoch 25, cost = 0.042876, acc = 99.67%
 83%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▊                     | 25/30 [00:25<00:05,  1.00s/it]DEBUG:root:Epoch 26, cost = 0.042876, acc = 99.67%
 87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████                 | 26/30 [00:26<00:04,  1.00s/it]DEBUG:root:Epoch 27, cost = 0.042876, acc = 99.67%
 90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 27/30 [00:27<00:03,  1.00s/it]DEBUG:root:Epoch 28, cost = 0.042876, acc = 99.67%
 93%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌        | 28/30 [00:27<00:01,  1.00it/s]DEBUG:root:Epoch 29, cost = 0.042876, acc = 99.67%
 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊    | 29/30 [00:29<00:01,  1.00s/it]DEBUG:root:Epoch 30, cost = 0.042876, acc = 99.67%
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:30<00:00,  1.00s/it]
INFO:root:DONE training repair model.
DEBUG:root:Time to fit repair model: 30.65 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE inferring repairs.
DEBUG:root:Time to infer correct cell values: 2.41 secs
DEBUG:root:Time to create table: 0.01 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE collecting the inferred values.
DEBUG:root:Time to collect inferred values: 0.10 secs
INFO:root:DONE generating repaired dataset
DEBUG:root:Time to store repaired dataset: 0.32 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE Loading hospital_clean.csv
DEBUG:root:Time to evaluate repairs: 3.48 secs
DEBUG:root:Time to execute query: 0.00 secs
DEBUG:root:Time to execute query: 0.00 secs
DEBUG:root:Preparing to execute 19 queries.
DEBUG:root:Time to execute 19 queries: 0.01 secs
DEBUG:root:Time to execute query: 0.00 secs
DEBUG:root:Preparing to execute 19 queries.
DEBUG:root:Time to execute 19 queries: 0.01 secs
INFO:root:Precision = 0.92, Recall = 0.69, Repairing Recall = 0.80, F1 = 0.79, Repairing F1 = 0.86, Detected Errors = 437, Total Errors = 509, Correct Repairs = 351, Total Repairs = 380, Total Repairs (Grdth present) = 380
DEBUG:root:Time to generate report: 0.02 secs
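For reference, the `LEVEL:root:message` lines above are the stock Python `logging` format. A minimal sketch of the kind of setup that produces them (assumed for illustration; the actual configuration in this PR may differ, and `log_timing` is a hypothetical helper, not a HoloClean function):

```python
import logging
import time

# Root-logger setup approximating the "LEVEL:name:message" lines above.
logging.basicConfig(format='%(levelname)s:%(name)s:%(message)s',
                    level=logging.DEBUG)

def log_timing(label, start):
    """Replace a `print 'Time to ...'` statement with a DEBUG-level log."""
    logging.debug('%s: %.2f secs', label, time.time() - start)

start = time.time()
# ... do some work, e.g. create an index ...
log_timing('Time to create index', start)
logging.info('DONE Loading hospital.csv')
```

Because the level is set on the root logger, the noisier DEBUG lines (including gensim's) can be silenced later by switching `level=logging.INFO` without touching call sites.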
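The metric definitions implied by the summary line can be sketched as follows (assumed definitions that reproduce the reported numbers, not necessarily HoloClean's exact evaluation code):

```python
def repair_metrics(correct_repairs, total_repairs, total_errors, detected_errors):
    """Assumed metric definitions matching the reported summary line."""
    precision = correct_repairs / float(total_repairs)           # 351/380
    recall = correct_repairs / float(total_errors)               # 351/509
    repairing_recall = correct_repairs / float(detected_errors)  # 351/437
    f1 = 2 * precision * recall / (precision + recall)
    repairing_f1 = (2 * precision * repairing_recall
                    / (precision + repairing_recall))
    return precision, recall, repairing_recall, f1, repairing_f1

# Rounded to two decimals, repair_metrics(351, 380, 509, 437) gives
# P=0.92, R=0.69, Repairing Recall=0.80, F1=0.79, Repairing F1=0.86.
```

"Repairing recall" restricts the denominator to cells the error detector flagged, which is why it exceeds plain recall here.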
thodrek commented 5 years ago

Merge this?

richardwu commented 5 years ago

Rebased, ready for merge.