Open Technocolabs100 opened 3 years ago
Would like to work on this @Technocolabs100. Please assign it to me if possible. 😊👍🏻
Can you assign me this issue? I'm a GSSOC21 participant.
I need to check your previous one first; then I'll go ahead with this new issue.
Thanks for assigning this. Will try to get this done ASAP. Need to read and figure out certain parts of the data for preprocessing. 😄👍🏻
You have to follow the steps below to proceed:
i. Sample 1M data points, because of computing and memory limitations.
ii. Separate the code snippets from the Body.
iii. Remove special characters from the question title and description (not from the code).
iv. Remove stop words (except ‘C’).
v. Remove HTML tags using regular expressions.
vi. Convert all characters to lowercase.
vii. Use SnowballStemmer to stem the words.
Example questions after preprocessing can be found below.
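For anyone picking this up, steps ii to vii could be sketched roughly as below. This is only an illustration, not the project's actual code: the `preprocess` function, the `<code>...</code>` markup for snippets, and the tiny `STOP_WORDS` set are assumptions, and HTML tags are stripped before special characters so that tag names do not leak into the text. If `nltk` is not installed, the sketch falls back to no stemming.

```python
import re

try:
    from nltk.stem.snowball import SnowballStemmer
    _stem = SnowballStemmer("english").stem
except ImportError:
    # Assumption: nltk may be missing; fall back to leaving words unstemmed.
    def _stem(word):
        return word

# Tiny illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "and", "how", "do", "i", "this"}

# Assumption: code snippets are wrapped in <code>...</code> inside the Body.
CODE_RE = re.compile(r"<code>(.*?)</code>", flags=re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def preprocess(title, body):
    """Return (cleaned_text, code_snippets) for one question."""
    # ii. Separate code snippets from the body.
    code = CODE_RE.findall(body)
    body_no_code = CODE_RE.sub(" ", body)
    # v. Remove the remaining HTML tags with a regular expression.
    text = TAG_RE.sub(" ", title + " " + body_no_code)
    # vi. Convert everything to lowercase.
    text = text.lower()
    # iii. Replace special characters with spaces (code was already removed).
    text = re.sub(r"[^a-z0-9]+", " ", text)
    # iv. Drop stop words but always keep 'c'; vii. stem what is left.
    words = [_stem(w) for w in text.split() if w == "c" or w not in STOP_WORDS]
    return " ".join(words), code


if __name__ == "__main__":
    cleaned, snippets = preprocess(
        "How to read a file in C?",
        '<p>I tried <code>fopen("a.txt", "r")</code> but it fails.</p>',
    )
    print(cleaned)
    print(snippets)
```

Keeping the code snippets in a separate list means they can be stored next to the cleaned text without going through stop-word removal or stemming.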
Now you have to create a new database called ‘Processed.db’ and load the preprocessed data into it.
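One possible way to do the loading step, assuming ‘Processed.db’ is a SQLite file, is sketched below with Python's built-in `sqlite3` module. The `save_processed` helper, the `questions_processed` table, and its column names are hypothetical and only here for illustration.

```python
import sqlite3


def save_processed(rows, db_path="Processed.db"):
    """Insert (question, code) pairs and return the total row count.

    The table and column names are assumptions made for this sketch.
    """
    conn = sqlite3.connect(db_path)
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS questions_processed ("
                "id INTEGER PRIMARY KEY, "
                "question TEXT NOT NULL, "
                "code TEXT)"
            )
            conn.executemany(
                "INSERT INTO questions_processed (question, code) VALUES (?, ?)",
                rows,
            )
        count = conn.execute(
            "SELECT COUNT(*) FROM questions_processed"
        ).fetchone()[0]
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("read file c tri but it fail", 'fopen("a.txt", "r")'),
        ("sort list python", "sorted(xs)"),
    ]
    print(save_processed(sample))
```

With 1M rows it is probably worth calling `executemany` in batches rather than building one huge list in memory.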