Compute "similar" companies and discover "unique" company characteristics

elainespak commented 3 years ago

There are a bunch of document/review/sentence embedding techniques: https://towardsdatascience.com/document-embedding-techniques-fed3e7a6a25d
1. Method: Average & Cosine Similarity
  - For a given aspect and industry sector (GICS): 1) take the company-specific sentence embeddings z_s from the ABAE step, and 2) average them to obtain a "company embedding"
  - Calculate pairwise cosine similarities among all company embeddings
  - To understand which words contributed most to each company embedding: 1) build a vocabulary distribution from the attention weights, 2) build a vocabulary distribution from word frequencies, 3) normalize both and add them up (need to validate)
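
A minimal sketch of the steps above, assuming the ABAE sentence embeddings z_s are already available as one NumPy array per company. The function names, the toy 2-d vectors, and the token/attention inputs are hypothetical, not taken from the repo, and the word-contribution part is just one reading of "normalize and add them up" that still needs validating.

```python
import numpy as np
from collections import Counter


def company_embeddings(sentence_embeddings):
    """Average each company's sentence embeddings (z_s) into one vector."""
    names = sorted(sentence_embeddings)
    matrix = np.vstack([np.mean(sentence_embeddings[n], axis=0) for n in names])
    return names, matrix


def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def word_contributions(tokens, attention):
    """Sum of two normalized per-word distributions: attention mass and frequency."""
    att, freq = Counter(), Counter()
    for tok, weight in zip(tokens, attention):
        att[tok] += weight
        freq[tok] += 1
    att_total, freq_total = sum(att.values()), sum(freq.values())
    return {w: att[w] / att_total + freq[w] / freq_total for w in freq}


# Toy data (assumed): 2-d sentence embeddings for three companies in one sector.
toy = {
    "company_a": np.array([[1.0, 0.0], [1.0, 0.0]]),
    "company_b": np.array([[0.0, 1.0], [0.0, 3.0]]),
    "company_c": np.array([[2.0, 0.0], [4.0, 0.0]]),
}
names, emb = company_embeddings(toy)
sim = cosine_similarity_matrix(emb)

# Toy tokens with per-token attention weights (assumed).
scores = word_contributions(["pay", "pay", "culture"], [0.2, 0.2, 0.6])
```

Here `sim[i, j]` is the similarity between `names[i]` and `names[j]`, so the most similar company to a given one is the largest off-diagonal entry in its row.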

elainespak commented 3 years ago

Method: Average & Cosine Similarity & tf-idf
- Same as the method above except for the last step: apply tf-idf to extract the words that characterize a given company
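
A rough sketch of the tf-idf step, assuming each company's reviews are concatenated into one tokenized "document" and using the plain tf * log(N / df) weighting. The function name `tfidf_top_words` and the toy reviews are hypothetical; a library implementation such as scikit-learn's applies smoothing and normalization, so its scores would differ.

```python
import math
from collections import Counter


def tfidf_top_words(company_docs, top_k=3):
    """Rank each company's words by tf-idf, treating each company as one document."""
    n_docs = len(company_docs)
    counts = {company: Counter(tokens) for company, tokens in company_docs.items()}

    # Document frequency: in how many companies' reviews each word appears.
    df = Counter()
    for word_counts in counts.values():
        df.update(word_counts.keys())

    top = {}
    for company, word_counts in counts.items():
        total = sum(word_counts.values())
        scores = {
            word: (n / total) * math.log(n_docs / df[word])
            for word, n in word_counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        top[company] = ranked[:top_k]
    return top


# Toy reviews (assumed), already tokenized.
docs = {
    "company_a": "great pay great benefits".split(),
    "company_b": "great culture long hours".split(),
    "company_c": "great pay slow promotion".split(),
}
top = tfidf_top_words(docs, top_k=2)
```

Words that appear in every company's reviews (here "great") get a score of zero, which is the point of using tf-idf instead of raw frequency to find what is unique to a company.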

elainespak commented 3 years ago

Clustered companies according to GICS.

elainespak / glassdoor_aspect_based_sentiment_analysis