dichen001 / Paper-Reading


Cooperation on the Literature Review #1

Closed: dichen001 closed this issue 8 years ago

dichen001 commented 8 years ago

Hi @imaginationsuper,

I just summarized all the papers I have read in a table and a document.

Two things for the moment:

  1. Let's finish reading the papers I mentioned in the document. You can pick the ones I haven't read yet and add them to our Table and Doc.
  2. Add related papers to the Doc, and rename the files to a consistent format: [year]-[citation]-[Name] (see the sketch after this list).
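
A minimal sketch of the renaming step, in case we script it in Python; the function name and the example values below are hypothetical placeholders, not metadata we have extracted:

```python
def paper_filename(year, citations, title):
    """Build a [year]-[citation]-[Name] filename for the Doc."""
    name = "-".join(title.split())  # spaces -> hyphens, matching our existing names
    return f"{year}-{citations}-{name}"

# Hypothetical example values, to be replaced with each paper's real metadata:
print(paper_filename(2004, 85, "Learning to Extract Signature and Reply Lines from Email"))
# -> 2004-85-Learning-to-Extract-Signature-and-Reply-Lines-from-Email
```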

BTW, we can find related papers cited in these papers, especially where the authors compare their work with others'.

Right now I have read the first 5 and am about to start the 6th.

jerry-shijieli commented 8 years ago

Great! Then I will read the 7th paper. I also suggest we make the 4th paper (2004----85-----Learning-to-Extract-Signature-and-Reply-Lines-from-Email) one of the three seed papers, since it is cited a lot. We can then trace and read the 16 references in that paper.

jerry-shijieli commented 8 years ago

Summary Table for Literature Review

| Year | Cite | Name | Data | Method | Results |
| --- | --- | --- | --- | --- | --- |
| 2008 | 295 | An analysis of active learning strategies for sequence labeling tasks | CoNLL-03 (Sang and De Meulder, 2003), a collection of newswire articles; NLPBA (Kim et al., 2004), a large collection of biomedical abstracts annotated with five entities of interest; BioCreative (Yeh et al., 2005); FlySlip (Vlachos, 2007); CORA. | Sequence labeling with CRFs; active learning with sequence models. | The large-scale empirical evaluation demonstrates that some of the newly proposed methods advance the state of the art in active learning with sequence models, including information density (recommended in practice), sequence vote entropy, and sometimes Fisher information. |
| 2006 | 205 | Visualizing email content: portraying relationships from conversational histories | Participants' email archives ranged from 90 MB to more than 1 GB (average 456 MB), spanning from less than one year to over nine years of email activity. | Monthly and yearly word summaries; adjustable time scale; topic-word scoring with TF-IDF (sketched after the table). | Two modes of personalized email visualization: exploration of "big picture" trends and themes ("haystack") and more detail-oriented exploration ("needle"). |
| 2012 | 1 | Interpreting Contact Details out of E-mail Signature Blocks | The service is available for Gmail users and Google Apps IMAP servers. Only French and English are fully covered, but any ISO-Latin signature can be analyzed. | Context; extraction of the HTML part from the MIME format; elimination of specific configurations; language detection; formatting details in vCard; standardizing phone numbers; updating the address books. | Millions of emails were analyzed by the servers; specific rules were adopted for robustness, e.g. emails with non-ISO-Latin encodings or larger than 200 kB are not analyzed. |
| 2007 | 68 | Author profiling for English emails | Emails in several varieties of English, from native and non-native speakers coming from different geographical areas. | Analysis: document parsing, text processing, and linguistic analysis. Classification using the WEKA toolkit with several algorithms: decision trees (J48 (Quinlan, 1993), RandomForest (Breiman, 2001)), lazy learners (IBk (Aha et al., 1991)), rule-based learners (JRip (Cohen, 1995)), Support Vector Machines (SMO (Keerthi et al., 2001), LibSVM (Chang and Lin, 2001)), and ensemble/meta-learners (Bagging (Breiman, 1996), AdaBoostM1 (Freund and Schapire, 1996)). | The chosen approach works well for author profiling; using different classifiers in combination with a subset of the available features can be beneficial for predicting single traits. |
| 2005 | 144 | Extracting personal names from email: applying named entity recognition to informal text | CSpace email corpus (Kraut et al., 2004); Enron corpus (Klimt and Yang, 2004). | NER (named entity recognition) with CRFs (conditional random fields). | Entity-level F1 (improvement / score): Mgmt-Teams +3.9% / 91.3; Mgmt-Game +3.8% / 95.4; Enron-Meetings +1.2% / 77.9; Enron-Random +0.7% / 76.7. |
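
The topic-word scoring in the 2006 visualization paper is TF-IDF. Below is a minimal sketch of the standard weighting; the paper's exact variant and tokenization are not given in the table, so the formula choice and the example data here are only illustrative assumptions:

```python
import math
from collections import Counter

def tfidf(docs):
    """Score each term in each document by TF-IDF.

    docs: list of token lists, one per document (e.g. one per month of email).
    Returns: one {term: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            # term frequency * inverse document frequency
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

# Toy example: the word distinctive to each "month" gets the highest score.
months = [
    "budget meeting budget review".split(),
    "paper deadline paper review".split(),
]
for ranked in tfidf(months):
    print(max(ranked, key=ranked.get))  # -> budget, paper
```

Terms that appear in every period (like "review" above) get a zero IDF, which is why TF-IDF surfaces period-specific topic words rather than common vocabulary.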