djokester closed this issue 3 years ago
@djokester I'm interested. Could you help me get started with it?
@Sai-Adarsh Sure. Go to the dataset given in the link above and extract the zipped folder. Inside it you will find a folder titled lematisation_dataset, which holds many text files that each list words with their part of speech and lemma. Extract the word and lemma pairs for words only (no numbers or symbols). For nouns (tagged NNP, NN, or any other noun tag) that have EXCLUDED given as the lemma, store the original word as the lemma. Combine the results from all the files into a single CSV file and share it with me.
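A rough sketch of those steps in Python, in case it helps as a starting point. It rests on several assumptions that are not confirmed by the dataset: that each line of a file holds exactly three whitespace-separated fields (word, POS tag, lemma), that the files are UTF-8 with a .txt extension, and that noun tags all begin with "NN". The function names (is_word, extract_pairs, write_csv) are made up for illustration. Rows where a non-noun has EXCLUDED as its lemma are skipped here, which is also an assumption, since the instructions only cover nouns.

```python
import csv
import unicodedata
from pathlib import Path


def is_word(token):
    """Return True if token contains only letters and combining marks.

    str.isalpha() is not used because Bengali vowel signs are combining
    marks, which isalpha() rejects. Digits and symbols fail this check.
    """
    return bool(token) and all(
        unicodedata.category(ch)[0] in ("L", "M") for ch in token
    )


def extract_pairs(lines):
    """Yield (word, lemma) pairs from lines of the ASSUMED form 'word POS lemma'."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # blank or malformed line
        word, pos, lemma = parts
        if not is_word(word):
            continue  # numbers, punctuation, symbols
        if lemma == "EXCLUDED":
            # ASSUMPTION: noun tags start with "NN" (NN, NNP, ...).
            if not pos.startswith("NN"):
                continue  # non-noun EXCLUDED rows are skipped (assumption)
            lemma = word
        yield word, lemma


def write_csv(dataset_dir, out_path):
    """Collect pairs from every .txt file in dataset_dir into one CSV file."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["word", "lemma"])
        for path in sorted(Path(dataset_dir).glob("*.txt")):
            with open(path, encoding="utf-8") as f:
                writer.writerows(extract_pairs(f))
```

Calling write_csv("lematisation_dataset", "pairs.csv") would then produce the single CSV, with "pairs.csv" being a placeholder name. If the real files use a different delimiter or column order, only extract_pairs should need adjusting.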
Extraction of Word, Lemma pairs from the BenLem dataset. Citation: A. Chakrabarty and U. Garain (2015): BenLem (a Bengali Lemmatizer) and its Role in WSD, in ACM Trans. Asian and Low-Resource Language Information Processing (TALIIP).