datalab-dev / quintessence_analysis

All the scripts we use for analysis

issues

add topics.termstopicsdist collection

#67 avkoehl closed 3 years ago
1
new data model for topics over time

#66 avkoehl closed 3 years ago
2
add relevance metric from ldavis

#65 avkoehl closed 3 years ago
5
update nearest neighbors to 30 in terms.neighbors

#64 avkoehl closed 3 years ago
1
rework parse_lda to not need corpus object

#63 avkoehl closed 3 years ago
0
add verbose argument to Parallel calls

#62 avkoehl opened 3 years ago
0
test out modin library

#61 avkoehl opened 3 years ago
0
normalize Id usage across the tables

#60 avkoehl closed 3 years ago
0
add terms.kwic collection

#59 avkoehl opened 3 years ago
0
add terms.frequencies collection

#58 avkoehl closed 3 years ago
0
Add wordcounts and decades column to metadata collection

#57 avkoehl closed 3 years ago
1
add topTerms array to topics collection

#56 avkoehl closed 3 years ago
0
restrict vocab

#55 avkoehl closed 3 years ago
1
order docs based on word count before preprocessing

#54 avkoehl opened 3 years ago
0
debug bow creation to use multiple cores

#53 avkoehl opened 3 years ago
1
In compute subsets of embeddings use groups instead of indices

#52 avkoehl closed 3 years ago
2
move text data out of the dataframes

#51 avkoehl closed 3 years ago
0
save all dataframes as parquet files instead of csv

#50 avkoehl opened 3 years ago
0
simplify outputs of topic model to pandas dataframes - doctopics, topicterms, termdocmatrix

#49 avkoehl closed 3 years ago
0
simplify inputs to embeddings class and topic model class to just be a single dataframe

#48 avkoehl closed 3 years ago
0
move embedding alignment out of nlp module into its own module

#47 avkoehl closed 3 years ago
0
mv preprocessing for models directly into topicmodel and embeddings

#46 avkoehl closed 3 years ago
0
add manually selected stop words for embeddings and topic model

#45 avkoehl opened 3 years ago
0
mv parse_lda to class

#44 avkoehl closed 3 years ago
0
move n_workers to global parameter (gets used in preprocessing and models)

#43 avkoehl closed 3 years ago
0
run on full 60K documents corpus to find performance pain points

#42 avkoehl closed 3 years ago
3
mv parse_embed to a class

#41 avkoehl closed 3 years ago
1
Fix this

#40 avkoehl closed 3 years ago
3
rework subsets to not include the decades; instead have subsets and decades

#39 avkoehl closed 3 years ago
0
add code for creating all embeddings collections

#38 avkoehl closed 3 years ago
0
add arg to tm_mallet training for silencing output

#37 avkoehl closed 3 years ago
1
add method for creating output file name based on subset. Make sure spaces are replaced with '_'

#36 avkoehl closed 3 years ago
0
move model parameters out of init and into the train function.

#35 avkoehl closed 3 years ago
0
fix embeddings path handling to properly handle relative paths

#34 avkoehl closed 3 years ago
0
write pipeline function for embedding

#33 avkoehl closed 3 years ago
0
add code for loading in a set of embeddings models from files

#32 avkoehl closed 3 years ago
0
write code for running all the embeddings subsets and saving to appropriate output

#31 avkoehl closed 3 years ago
0
write pipeline function for topic model

#30 avkoehl closed 3 years ago
0
fix topic model path handling to properly convert relative odir path to absolute odir path

#29 avkoehl closed 3 years ago
0
move all configurable elements to json config file

#28 avkoehl closed 3 years ago
0
convert all instances of ids,docs or filenames, docs to a pandas series

#27 avkoehl closed 3 years ago
0
parallelize the preprocessing and tokenizing in corpus class

#26 avkoehl closed 3 years ago
0
create corpus class to store corpus in its various states, and house all the subsetting/filtering methods

#25 avkoehl closed 3 years ago
0
add code for running and saving word2vec model

#24 avkoehl closed 3 years ago
0
add code for splitting text into sentences

#23 avkoehl closed 3 years ago
0
refactor nlp module: split into nlp and parse_lda (and eventually parse_embeddings)

#22 avkoehl closed 3 years ago
0
write method in Embedding class to load a trained gensim word2vec model

#21 avkoehl closed 3 years ago
0
write function in nlp module to compute mean (of nonzero values) topic proportion for each metadata group

#20 avkoehl closed 3 years ago
1
write function in nlp module to compute top documents for a given topic

#19 avkoehl closed 3 years ago
0
Save all necessary data when running topic model

#18 avkoehl closed 4 years ago
2