datalab-dev/quintessence_analysis
All the scripts we use for analysis
0 stars, 0 forks
issues
add topics.termstopicsdist collection
#67
avkoehl
closed
3 years ago
1
new data model for topics over time
#66
avkoehl
closed
3 years ago
2
add relevance metric from ldavis
#65
avkoehl
closed
3 years ago
5
update nearest neighbors to 30 in terms.neighbors
#64
avkoehl
closed
3 years ago
1
rework parse_lda to not need corpus object
#63
avkoehl
closed
3 years ago
0
add verbose argument to Parallel calls
#62
avkoehl
opened
3 years ago
0
test out modin library
#61
avkoehl
opened
3 years ago
0
normalize Id usage across the tables
#60
avkoehl
closed
3 years ago
0
add terms.kwic collection
#59
avkoehl
opened
3 years ago
0
add terms.frequencies collection
#58
avkoehl
closed
3 years ago
0
Add wordcounts and decades column to metadata collection
#57
avkoehl
closed
3 years ago
1
add topTerms array to topics collection
#56
avkoehl
closed
3 years ago
0
restrict vocab
#55
avkoehl
closed
3 years ago
1
order docs based on word count before preprocessing
#54
avkoehl
opened
3 years ago
0
debug bow creation to use multiple cores
#53
avkoehl
opened
3 years ago
1
In compute subsets of embeddings use groups instead of indices
#52
avkoehl
closed
3 years ago
2
move text data out of the dataframes
#51
avkoehl
closed
3 years ago
0
save all dataframes as parquet files instead of csv
#50
avkoehl
opened
3 years ago
0
simplify outputs of topic model to pandas dataframes - doctopics, topicterms, termdocmatrix
#49
avkoehl
closed
3 years ago
0
simplify inputs to embeddings class and topic model class to just be a single dataframe
#48
avkoehl
closed
3 years ago
0
move embedding alignment out of nlp module into its own module
#47
avkoehl
closed
3 years ago
0
mv preprocessing for models directly into topicmodel and embeddings
#46
avkoehl
closed
3 years ago
0
add manually selected stop words for embeddings and topic model
#45
avkoehl
opened
3 years ago
0
mv parse_lda to class
#44
avkoehl
closed
3 years ago
0
move n_workers to global parameter (gets used in preprocessing and models)
#43
avkoehl
closed
3 years ago
0
run on full 60K documents corpus to find performance pain points
#42
avkoehl
closed
3 years ago
3
mv parse_embed to a class
#41
avkoehl
closed
3 years ago
1
Fix this
#40
avkoehl
closed
3 years ago
3
rework subsets to not include the decades. instead have subsets and decades
#39
avkoehl
closed
3 years ago
0
add code for creating all embeddings collections
#38
avkoehl
closed
3 years ago
0
add arg to tm_mallet training for silencing output
#37
avkoehl
closed
3 years ago
1
add method for creating output file name based on subset. Make sure spaces are replaced with '_'
#36
avkoehl
closed
3 years ago
0
move model parameters out of init and into the train function.
#35
avkoehl
closed
3 years ago
0
fix embeddings path handling to properly handle relative paths
#34
avkoehl
closed
3 years ago
0
write pipeline function for embedding
#33
avkoehl
closed
3 years ago
0
add code for loading in a set of embeddings models from files
#32
avkoehl
closed
3 years ago
0
write code for running all the embeddings subsets and saving to appropriate output
#31
avkoehl
closed
3 years ago
0
write pipeline function for topic model
#30
avkoehl
closed
3 years ago
0
fix topic model path handling to properly convert relative odir path to absolute odir path
#29
avkoehl
closed
3 years ago
0
move all configurable elements to json config file
#28
avkoehl
closed
3 years ago
0
convert all instances of ids,docs or filenames, docs to a pandas series
#27
avkoehl
closed
3 years ago
0
parallelize the preprocessing and tokenizing in corpus class
#26
avkoehl
closed
3 years ago
0
create corpus class to store corpus in its various states, and house all the subsetting/filtering methods
#25
avkoehl
closed
3 years ago
0
add code for running and saving word2vec model
#24
avkoehl
closed
3 years ago
0
add code for splitting text into sentences
#23
avkoehl
closed
3 years ago
0
refactor nlp module: split into nlp and parse_lda (and eventually parse_embeddings)
#22
avkoehl
closed
3 years ago
0
write method in Embedding class to load a trained gensim word2vec model
#21
avkoehl
closed
3 years ago
0
write function in nlp module to compute mean (of nonzero values) topic proportion for each metadata group
#20
avkoehl
closed
3 years ago
1
write function in nlp module to compute top documents for a given topic
#19
avkoehl
closed
3 years ago
0
Save all necessary data when running topic model
#18
avkoehl
closed
4 years ago
2