Waiting on #23 to be merged first; please review only the last two commits.

Closes #19.

Example output:
Launching test...
DEBUG:root:Time to create index: 0.00 secs
[... the line above repeats 20 times in total ...]
INFO:root:DONE Loading hospital.csv
DEBUG:root:Time to load dataset: 0.29 secs
DEBUG:root:OPENED constraints file successfully
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.Condition,t2.Condition)&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.HospitalType,t2.HospitalType)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.Condition,t2.Condition)&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.HospitalType,t2.HospitalType)
INFO:root:DONE parsing predicate: EQ(t1.Condition,t2.Condition)
INFO:root:DONE parsing predicate: EQ(t1.MeasureName,t2.MeasureName)
INFO:root:DONE parsing predicate: IQ(t1.HospitalType,t2.HospitalType)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.ZipCode,t2.ZipCode)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.ZipCode,t2.ZipCode)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.ZipCode,t2.ZipCode)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.Sample,t2.Sample)&IQ(t1.Score,t2.Score)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.Sample,t2.Sample)&IQ(t1.Score,t2.Score)
INFO:root:DONE parsing predicate: EQ(t1.Sample,t2.Sample)
INFO:root:DONE parsing predicate: IQ(t1.Score,t2.Score)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.PhoneNumber,t2.PhoneNumber)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.PhoneNumber,t2.PhoneNumber)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.PhoneNumber,t2.PhoneNumber)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.MeasureName,t2.MeasureName)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.MeasureName,t2.MeasureName)
INFO:root:DONE parsing predicate: EQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE parsing predicate: IQ(t1.MeasureName,t2.MeasureName)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.Stateavg,t2.Stateavg)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.Stateavg,t2.Stateavg)
INFO:root:DONE parsing predicate: EQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE parsing predicate: IQ(t1.Stateavg,t2.Stateavg)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.ProviderNumber,t2.ProviderNumber)&IQ(t1.HospitalName,t2.HospitalName)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.ProviderNumber,t2.ProviderNumber)&IQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: EQ(t1.ProviderNumber,t2.ProviderNumber)
INFO:root:DONE parsing predicate: IQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.Condition,t2.Condition)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.MeasureCode,t2.MeasureCode)&IQ(t1.Condition,t2.Condition)
INFO:root:DONE parsing predicate: EQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE parsing predicate: IQ(t1.Condition,t2.Condition)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.Address1,t2.Address1)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.Address1,t2.Address1)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.Address1,t2.Address1)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.HospitalOwner,t2.HospitalOwner)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.HospitalOwner,t2.HospitalOwner)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.HospitalOwner,t2.HospitalOwner)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.ProviderNumber,t2.ProviderNumber)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.ProviderNumber,t2.ProviderNumber)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.ProviderNumber,t2.ProviderNumber)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&EQ(t1.PhoneNumber,t2.PhoneNumber)&EQ(t1.HospitalOwner,t2.HospitalOwner)&IQ(t1.State,t2.State)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&EQ(t1.PhoneNumber,t2.PhoneNumber)&EQ(t1.HospitalOwner,t2.HospitalOwner)&IQ(t1.State,t2.State)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: EQ(t1.PhoneNumber,t2.PhoneNumber)
INFO:root:DONE parsing predicate: EQ(t1.HospitalOwner,t2.HospitalOwner)
INFO:root:DONE parsing predicate: IQ(t1.State,t2.State)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.City,t2.City)&IQ(t1.CountyName,t2.CountyName)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.City,t2.City)&IQ(t1.CountyName,t2.CountyName)
INFO:root:DONE parsing predicate: EQ(t1.City,t2.City)
INFO:root:DONE parsing predicate: IQ(t1.CountyName,t2.CountyName)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.ZipCode,t2.ZipCode)&IQ(t1.EmergencyService,t2.EmergencyService)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.ZipCode,t2.ZipCode)&IQ(t1.EmergencyService,t2.EmergencyService)
INFO:root:DONE parsing predicate: EQ(t1.ZipCode,t2.ZipCode)
INFO:root:DONE parsing predicate: IQ(t1.EmergencyService,t2.EmergencyService)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.City,t2.City)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.HospitalName,t2.HospitalName)&IQ(t1.City,t2.City)
INFO:root:DONE parsing predicate: EQ(t1.HospitalName,t2.HospitalName)
INFO:root:DONE parsing predicate: IQ(t1.City,t2.City)
INFO:root:DONE pre-processing constraint: t1&t2&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.MeasureCode,t2.MeasureCode)
DEBUG:root:DONE extracting tuples from constraint: t1&t2&EQ(t1.MeasureName,t2.MeasureName)&IQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE parsing predicate: EQ(t1.MeasureName,t2.MeasureName)
INFO:root:DONE parsing predicate: IQ(t1.MeasureCode,t2.MeasureCode)
INFO:root:DONE Loading DCs from hospital_constraints_att.txt
DEBUG:root:Time to load dirty data: 0.02 secs
DEBUG:root:DONE with Error Detector: NullDetector in 0.08 secs
DEBUG:root:Preparing to execute 16 queries.
DEBUG:root:Time to execute 16 queries: 0.01 secs
DEBUG:root:DONE with Error Detector: ViolationDetector in 0.12 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE with error detection.
DEBUG:root:Time to detect errors: 1.58 secs
DEBUG:root:Time to execute query: 0.00 secs
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 21.54it/s]
DEBUG:root:DONE with pair stats preparation in 0.81 secs
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:23<00:00, 42.62it/s]
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create table: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE with domain preparation.
DEBUG:root:Time to setup the domain: 30.77 secs
DEBUG:root:Time to execute query: 0.00 secs
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 55.26it/s]
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 72 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 72 unique words (100% of original 72, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 72 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 36 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 257 word corpus (25.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 72 words and 10 dimensions: 41760 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 1409
INFO:gensim.models.word2vec:training model with 3 workers on 72 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1218 effective words) took 0.0s, 50030 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 74 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 74 unique words (100% of original 74, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 74 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 28 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 233 word corpus (23.4% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 74 words and 10 dimensions: 42920 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 685
INFO:gensim.models.word2vec:training model with 3 workers on 74 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1135 effective words) took 0.0s, 52050 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 76 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 76 unique words (100% of original 76, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 76 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 42 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 277 word corpus (27.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 76 words and 10 dimensions: 44080 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 3176
INFO:gensim.models.word2vec:training model with 3 workers on 76 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1323 effective words) took 0.0s, 30311 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 1 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 1 unique words (100% of original 1, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 1 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 1 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 32 word corpus (3.3% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 1 words and 10 dimensions: 580 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 14
INFO:gensim.models.word2vec:training model with 3 workers on 1 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (136 effective words) took 0.0s, 6881 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 1 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 1 unique words (100% of original 1, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 1 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 1 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 32 word corpus (3.3% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 1 words and 10 dimensions: 580 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 14
INFO:gensim.models.word2vec:training model with 3 workers on 1 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (136 effective words) took 0.0s, 14180 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 71 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 71 unique words (100% of original 71, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 71 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 41 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 273 word corpus (27.3% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 71 words and 10 dimensions: 41180 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 747
INFO:gensim.models.word2vec:training model with 3 workers on 71 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1309 effective words) took 0.0s, 42736 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 27 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 27 unique words (100% of original 27, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 27 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 10 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 116 word corpus (11.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 27 words and 10 dimensions: 15660 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 1097
INFO:gensim.models.word2vec:training model with 3 workers on 27 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (556 effective words) took 0.0s, 24812 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 334 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 334 unique words (100% of original 334, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 334 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 103 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 635 word corpus (63.6% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 334 words and 10 dimensions: 193720 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 2543
INFO:gensim.models.word2vec:training model with 3 workers on 334 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (3145 effective words) took 0.0s, 72404 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 4 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 4 unique words (100% of original 4, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 4 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 4 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 46 word corpus (4.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 4 words and 10 dimensions: 2320 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 12
INFO:gensim.models.word2vec:training model with 3 workers on 4 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (208 effective words) took 0.0s, 12238 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 69 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 69 unique words (100% of original 69, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 69 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 39 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 235 word corpus (23.6% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 69 words and 10 dimensions: 40020 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 367
INFO:gensim.models.word2vec:training model with 3 workers on 69 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1120 effective words) took 0.0s, 71521 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 74 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 74 unique words (100% of original 74, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 74 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 43 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 280 word corpus (28.0% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 74 words and 10 dimensions: 42920 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 1758
INFO:gensim.models.word2vec:training model with 3 workers on 74 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1326 effective words) took 0.0s, 96627 effective words/s
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 28 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 28 unique words (100% of original 28, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 28 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 7 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 100 word corpus (10.1% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 28 words and 10 dimensions: 16240 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 806
INFO:gensim.models.word2vec:training model with 3 workers on 28 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:training on 5000 raw words (469 effective words) took 0.0s, 23521 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 72 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 72 unique words (100% of original 72, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 72 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 42 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 275 word corpus (27.5% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 72 words and 10 dimensions: 41760 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 599
INFO:gensim.models.word2vec:training model with 3 workers on 72 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1313 effective words) took 0.0s, 67638 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 65 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 65 unique words (100% of original 65, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 65 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 31 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 241 word corpus (24.2% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 65 words and 10 dimensions: 37700 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 1131
INFO:gensim.models.word2vec:training model with 3 workers on 65 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1160 effective words) took 0.0s, 55561 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 6 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 6 unique words (100% of original 6, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 6 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 5 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 56 word corpus (5.6% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 6 words and 10 dimensions: 3480 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 28
INFO:gensim.models.word2vec:training model with 3 workers on 6 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (251 effective words) took 0.0s, 12346 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 64 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 64 unique words (100% of original 64, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 64 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 27 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 223 word corpus (22.3% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 64 words and 10 dimensions: 37120 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 7296
INFO:gensim.models.word2vec:training model with 3 workers on 64 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1081 effective words) took 0.1s, 10746 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 69 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 69 unique words (100% of original 69, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 69 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 42 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 271 word corpus (27.2% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 69 words and 10 dimensions: 40020 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 2627
INFO:gensim.models.word2vec:training model with 3 workers on 69 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1293 effective words) took 0.0s, 32600 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 54 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 54 unique words (100% of original 54, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 54 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 27 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 216 word corpus (21.7% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 54 words and 10 dimensions: 31320 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 555
INFO:gensim.models.word2vec:training model with 3 workers on 54 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
INFO:gensim.models.word2vec:training on 5000 raw words (1045 effective words) took 0.0s, 53292 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:gensim.models.word2vec:Fast version of gensim.models.word2vec is being used
WARNING:gensim.models.word2vec:consider setting layer size to a multiple of 4 for greater performance
INFO:gensim.models.word2vec:collecting all words and their counts
INFO:gensim.models.word2vec:PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO:gensim.models.word2vec:collected 13 word types from a corpus of 1000 raw words and 1000 sentences
INFO:gensim.models.word2vec:Loading a fresh vocabulary
INFO:gensim.models.word2vec:min_count=1 retains 13 unique words (100% of original 13, drops 0)
INFO:gensim.models.word2vec:min_count=1 leaves 1000 word corpus (100% of original 1000, drops 0)
INFO:gensim.models.word2vec:deleting the raw counts dictionary of 13 items
INFO:gensim.models.word2vec:sample=0.001 downsamples 4 most-common words
INFO:gensim.models.word2vec:downsampling leaves estimated 54 word corpus (5.5% of prior 1000)
INFO:gensim.models.word2vec:estimated required memory for 13 words and 10 dimensions: 7540 bytes
INFO:gensim.models.word2vec:resetting layer weights
INFO:gensim.models.fasttext:Total number of ngrams is 347
INFO:gensim.models.word2vec:training model with 3 workers on 13 vocabulary and 10 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
DEBUG:gensim.models.word2vec:queueing job #0 (5000 words, 5000 sentences) at alpha 0.02500
DEBUG:gensim.models.word2vec:job loop exiting, total 1 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
DEBUG:gensim.models.word2vec:worker exiting, processed 0 jobs
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 2 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 1 more threads
INFO:gensim.models.word2vec:worker thread finished; awaiting finish of 0 more threads
DEBUG:gensim.models.word2vec:worker exiting, processed 1 jobs
INFO:gensim.models.word2vec:training on 5000 raw words (242 effective words) took 0.0s, 11416 effective words/s
WARNING:gensim.models.word2vec:under 10 jobs per worker: consider setting a smaller `batch_words' for smoother alpha decay
DEBUG:root:Time to execute query: 0.13 secs
DEBUG:root:Time to execute query: 0.03 secs
DEBUG:root:Time to execute query: 0.02 secs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14163/14163 [00:14<00:00, 992.35it/s]
DEBUG:root:Time to execute query: 0.07 secs
/Users/rwu1997/anaconda3/envs/holo_dev27/lib/python2.7/site-packages/gensim/models/wrappers/fasttext.py:104: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
ngrams = [ng for ng in ngrams if ng in self.ngrams]
DEBUG:root:Preparing to execute 35 queries.
DEBUG:root:Time to execute 35 queries: 0.18 secs
DEBUG:root:Generating weak labels.
DEBUG:root:Time to execute query: 0.00 secs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3955/3955 [00:00<00:00, 90156.70it/s]
DEBUG:root:DONE generating weak labels.
DEBUG:root:Generating mask.
DEBUG:root:Time to execute query: 0.01 secs
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14163/14163 [00:00<00:00, 48946.30it/s]
DEBUG:root:DONE generating mask.
INFO:root:DONE setting up featurized dataset.
DEBUG:root:Time to featurize data: 26.76 secs
INFO:root:DONE setting up repair model.
DEBUG:root:Time to setup repair model: 26.76 secs
0%| | 0/30 [00:00<?, ?it/s]DEBUG:root:Epoch 1, cost = 0.231135, acc = 99.52%
3%|████▎ | 1/30 [00:01<00:29, 1.02s/it]DEBUG:root:Epoch 2, cost = 0.053873, acc = 99.65%
7%|████████▌ | 2/30 [00:02<00:28, 1.01s/it]DEBUG:root:Epoch 3, cost = 0.044740, acc = 99.67%
10%|████████████▊ | 3/30 [00:03<00:27, 1.03s/it]DEBUG:root:Epoch 4, cost = 0.043239, acc = 99.67%
13%|█████████████████ | 4/30 [00:04<00:26, 1.04s/it]DEBUG:root:Epoch 5, cost = 0.042934, acc = 99.67%
17%|█████████████████████▎ | 5/30 [00:05<00:26, 1.04s/it]DEBUG:root:Epoch 6, cost = 0.042874, acc = 99.67%
20%|█████████████████████████▌ | 6/30 [00:06<00:24, 1.04s/it]DEBUG:root:Epoch 7, cost = 0.042866, acc = 99.67%
23%|█████████████████████████████▊ | 7/30 [00:07<00:23, 1.03s/it]DEBUG:root:Epoch 8, cost = 0.042868, acc = 99.67%
27%|██████████████████████████████████▏ | 8/30 [00:08<00:22, 1.03s/it]DEBUG:root:Epoch 9, cost = 0.042871, acc = 99.67%
30%|██████████████████████████████████████▍ | 9/30 [00:09<00:21, 1.03s/it]DEBUG:root:Epoch 10, cost = 0.042873, acc = 99.67%
33%|██████████████████████████████████████████▎ | 10/30 [00:10<00:20, 1.02s/it]DEBUG:root:Epoch 11, cost = 0.042874, acc = 99.67%
37%|██████████████████████████████████████████████▌ | 11/30 [00:11<00:19, 1.04s/it]DEBUG:root:Epoch 12, cost = 0.042875, acc = 99.67%
40%|██████████████████████████████████████████████████▊ | 12/30 [00:12<00:18, 1.04s/it]DEBUG:root:Epoch 13, cost = 0.042876, acc = 99.67%
43%|███████████████████████████████████████████████████████ | 13/30 [00:13<00:17, 1.03s/it]DEBUG:root:Epoch 14, cost = 0.042876, acc = 99.67%
47%|███████████████████████████████████████████████████████████▎ | 14/30 [00:14<00:16, 1.03s/it]DEBUG:root:Epoch 15, cost = 0.042876, acc = 99.67%
50%|███████████████████████████████████████████████████████████████▌ | 15/30 [00:15<00:15, 1.02s/it]DEBUG:root:Epoch 16, cost = 0.042876, acc = 99.67%
53%|███████████████████████████████████████████████████████████████████▋ | 16/30 [00:16<00:14, 1.02s/it]DEBUG:root:Epoch 17, cost = 0.042876, acc = 99.67%
57%|███████████████████████████████████████████████████████████████████████▉ | 17/30 [00:17<00:13, 1.02s/it]DEBUG:root:Epoch 18, cost = 0.042876, acc = 99.67%
60%|████████████████████████████████████████████████████████████████████████████▏ | 18/30 [00:18<00:12, 1.01s/it]DEBUG:root:Epoch 19, cost = 0.042876, acc = 99.67%
63%|████████████████████████████████████████████████████████████████████████████████▍ | 19/30 [00:19<00:11, 1.01s/it]DEBUG:root:Epoch 20, cost = 0.042876, acc = 99.67%
67%|████████████████████████████████████████████████████████████████████████████████████▋ | 20/30 [00:20<00:10, 1.01s/it]DEBUG:root:Epoch 21, cost = 0.042876, acc = 99.67%
70%|████████████████████████████████████████████████████████████████████████████████████████▉ | 21/30 [00:21<00:09, 1.01s/it]DEBUG:root:Epoch 22, cost = 0.042876, acc = 99.67%
73%|█████████████████████████████████████████████████████████████████████████████████████████████▏ | 22/30 [00:22<00:08, 1.01s/it]DEBUG:root:Epoch 23, cost = 0.042876, acc = 99.67%
77%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 23/30 [00:23<00:07, 1.01s/it]DEBUG:root:Epoch 24, cost = 0.042876, acc = 99.67%
80%|█████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 24/30 [00:24<00:06, 1.00s/it]DEBUG:root:Epoch 25, cost = 0.042876, acc = 99.67%
83%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 25/30 [00:25<00:05, 1.00s/it]DEBUG:root:Epoch 26, cost = 0.042876, acc = 99.67%
87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 26/30 [00:26<00:04, 1.00s/it]DEBUG:root:Epoch 27, cost = 0.042876, acc = 99.67%
90%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 27/30 [00:27<00:03, 1.00s/it]DEBUG:root:Epoch 28, cost = 0.042876, acc = 99.67%
93%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 28/30 [00:27<00:01, 1.00it/s]DEBUG:root:Epoch 29, cost = 0.042876, acc = 99.67%
97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 29/30 [00:29<00:01, 1.00s/it]DEBUG:root:Epoch 30, cost = 0.042876, acc = 99.67%
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:30<00:00, 1.00s/it]
INFO:root:DONE training repair model.
DEBUG:root:Time to fit repair model: 30.65 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE inferring repairs.
DEBUG:root:Time to infer correct cell values: 2.41 secs
DEBUG:root:Time to create table: 0.01 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE colleting the inferred values.
DEBUG:root:Time to collect inferred values: 0.10 secs
INFO:root:DONE generating repaired dataset
DEBUG:root:Time to store repaired dataset: 0.32 secs
DEBUG:root:Time to create index: 0.00 secs
DEBUG:root:Time to create index: 0.00 secs
INFO:root:DONE Loading hospital_clean.csv
DEBUG:root:Time to evaluate repairs: 3.48 secs
DEBUG:root:Time to execute query: 0.00 secs
DEBUG:root:Time to execute query: 0.00 secs
DEBUG:root:Preparing to execute 19 queries.
DEBUG:root:Time to execute 19 queries: 0.01 secs
DEBUG:root:Time to execute query: 0.00 secs
DEBUG:root:Preparing to execute 19 queries.
DEBUG:root:Time to execute 19 queries: 0.01 secs
INFO:root:Precision = 0.92, Recall = 0.69, Repairing Recall = 0.80, F1 = 0.79, Repairing F1 = 0.86, Detected Errors = 437, Total Errors = 509, Correct Repairs = 351, Total Repairs = 380, Total Repairs (Grdth present) = 380
DEBUG:root:Time to generate report: 0.02 secs
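The reported metrics are consistent with the raw counts in the report line. A quick sanity check, assuming (not confirmed by the log) that Precision = Correct Repairs / Total Repairs, Recall = Correct Repairs / Total Errors, and Repairing Recall = Correct Repairs / Detected Errors:

```python
# Recompute the reported metrics from the counts in the final report line.
# The formulas below are assumptions about how the tool derives them.
correct_repairs = 351
total_repairs = 380
total_errors = 509
detected_errors = 437

precision = correct_repairs / total_repairs            # 351/380 -> 0.92
recall = correct_repairs / total_errors                # 351/509 -> 0.69
repairing_recall = correct_repairs / detected_errors   # 351/437 -> 0.80

# Standard harmonic-mean F1 on each recall variant.
f1 = 2 * precision * recall / (precision + recall)                                # -> 0.79
repairing_f1 = 2 * precision * repairing_recall / (precision + repairing_recall)  # -> 0.86

print(round(precision, 2), round(recall, 2), round(repairing_recall, 2),
      round(f1, 2), round(repairing_f1, 2))
```

All five values round to the figures in the log, so the report line is internally consistent.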