aimacode / aima-python

Python implementation of algorithms from Russell And Norvig's "Artificial Intelligence - A Modern Approach"
MIT License

Future Work on the NLP Apps Notebook #890

Open antmarakis opened 6 years ago

antmarakis commented 6 years ago

Recently in the nlp_apps notebook I added a section on the Federalist Papers. What I did was write a simple workflow from start to finish. There is a lot of work to be done still and I am opening this to community contributions, since I believe it is a great way to get started with the applications notebooks.

A few ways you can improve the section:


This is a big undertaking, and it doesn't need to happen on this particular problem. If you have a problem in mind, you can instead use the above ideas to tackle your own problems! Sentiment Analysis is trending right now, so maybe this is a place to explore some of the above.

All in all, I think this is a good project to chip in on every once in a while and I hope it will serve as an introduction to the repository. Or maybe it will sound interesting to GSoC students who might choose to tackle this.

In any case, feel free to post here with ideas + if you want to start working on something.

Dimkoim commented 6 years ago

Hello @MrDupin! I have worked on NLP problems and I can give you some help here. :)

antmarakis commented 6 years ago

That's great! You can work on them either on GSoC or on your own. Whenever you want to get started, don't forget to post a comment here explaining in short what you want to do.

codingblazes commented 6 years ago

@MrDupin With reference to point 3, I would like to include support for both N-gram and Bag of Words models. I am mentioning the same in my proposal.
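
For readers unfamiliar with the two models mentioned here, a minimal sketch using only the standard library. The function names and the sample sentence are hypothetical and not taken from the notebook, and the whitespace tokenizer is a deliberate simplification.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Count every run of n consecutive tokens, so local word order is kept."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical sample text, chosen only for illustration.
tokens = tokenize("the people of the state")
unigram_counts = bag_of_words(tokens)
bigram_counts = ngrams(tokens, 2)
print(unigram_counts["the"])
print(bigram_counts[("the", "state")])
```

A bag of words discards order entirely, while n-grams keep a window of it; supporting both behind the same counting interface would let the notebook compare them on the same text.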

antmarakis commented 6 years ago

@ashwinnalwade Awesome! Good luck with your application!

prasadgujar commented 6 years ago

Hello, with reference to the 2nd point: we can do some preprocessing tasks, like ranking the words/sentences in each paper by frequency of occurrence and by the uniqueness of each word, considering ranking factors like the length of the word/sentence.

dsaw commented 6 years ago

Hello, I would like to work on the first point, implementing log addition, if no one is doing it yet. We can retain the decimal-library Naive Bayes implementation and add a logarithmic version alongside it. It might be good to compare both approaches in the notebook.
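
To illustrate the log-addition idea, a minimal sketch assuming a multinomial Naive Bayes with add-one smoothing. The function names and the toy word lists are hypothetical and are not taken from the notebook; the point is only that summing log-probabilities replaces multiplying many small probabilities.

```python
import math
from collections import Counter


def train(docs_by_class):
    """Build per-class word counts from {class_name: [list of token lists]}."""
    counts = {c: Counter(w for doc in docs for w in doc)
              for c, docs in docs_by_class.items()}
    vocab = {w for c in counts for w in counts[c]}
    return counts, vocab


def log_likelihood(tokens, class_counts, vocab_size):
    """Sum of log P(word | class), using add-one smoothing for unseen words."""
    total = sum(class_counts.values())
    return sum(math.log((class_counts[w] + 1) / (total + vocab_size))
               for w in tokens)


def classify(tokens, counts, vocab, priors):
    """Return the class with the highest log prior plus log likelihood."""
    scores = {c: math.log(priors[c]) + log_likelihood(tokens, counts[c], len(vocab))
              for c in counts}
    return max(scores, key=scores.get)


# Hypothetical toy data, not real excerpts from the papers.
docs = {
    "hamilton": [["upon", "the", "union"], ["upon", "the", "whole"]],
    "madison": [["whilst", "the", "states"], ["whilst", "on", "the", "other"]],
}
counts, vocab = train(docs)
priors = {"hamilton": 0.5, "madison": 0.5}
print(classify(["upon", "the", "whole"], counts, vocab, priors))
```

With thousands of words per document, the product of per-word probabilities shrinks below what a float can represent, while the sum of their logarithms stays a finite negative number. That is the behaviour a side-by-side comparison with the decimal version could demonstrate.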

ad71 commented 6 years ago

Sure @dsaw you can get started, nobody is working on this yet.

reachtarunhere commented 6 years ago

@MrDupin I did some work related to language identification during my GSoC (the year before yours). This follows from an example in the book. If anyone is keen on taking this up - here is the notebook: https://github.com/reachtarunhere/aima-python/blob/lang-id/lang-id.ipynb

If not, I would be happy to clean it up and get it merged myself.

antmarakis commented 6 years ago

@reachtarunhere I only skimmed through it, but it looks good. Maybe the current GSoC students can take it up?

ad71 commented 6 years ago

@MrDupin I would have loved to take it up but I have no knowledge of NLP. I'll definitely come back to this once I get acquainted with the topic.

reachtarunhere commented 6 years ago

@ad71 or anyone else taking this up - feel free to get in touch if you need any help/clarifications. In case I miss this here (lots of notifs) you can always drop an email :)

llucifer97 commented 6 years ago

Hi @MrDupin, I have some experience in NLP and would like to work on this issue if it is still open. I would be working on case 2, and would like to use different methods like the bag-of-words model.

antmarakis commented 6 years ago

@llucifer97: You can definitely work on the issue! Simply fork the repository, make your changes to the notebook, and submit a Pull Request.

If you need any help, feel free to ask (although I am swamped with work at the moment so I am not sure I will be able to provide much help).

aditya-hari commented 6 years ago

Hello @MrDupin.

Would I be right in assuming that this isn't an issue that someone completely new to open source and AI-related topics can handle? I am well versed in Python but have never dabbled with this sort of thing.

Apologies if this is a very trivial question.

antmarakis commented 6 years ago

@aditya-hari: Hmmm, I'm not sure... I don't think any past experience with open-source is necessary, but I believe some understanding of the AI concepts covered in this notebook is a must. If you want you can take it on, but it seems to me that it will be difficult to make much progress efficiently, mainly because not only do you have to execute the concepts, but also showcase them in a manner in which others can understand them.

It is up to you, but I first suggest you read up on NLP and some basic AI material in order to be better equipped to tackle this challenge.

aditya-hari commented 6 years ago

It is encouraging to know that past experience isn't necessary.

Could you kindly guide me to some resources which can help me get up to speed? Or point me to some other issues which would be easier to start off with?

Would really appreciate any help.

antmarakis commented 5 years ago

Hi @aditya-hari, sorry for taking so long to get back to you.

The way this repository works no longer involves issues (there are exceptions though). The main work remaining is to simply add to the notebooks.

For NLP, I am afraid there's not much I can say. Personally, I was just looking around the internet for information, I was never taught anything in a class. I suggest you snoop around and try to find a university course or something that is open to the public. Then you can follow its structure, complementing your studying with googling.

vaibhavshukla182 commented 5 years ago

Hi @MrDupin, I have never contributed to open source before, so pardon me if I say anything trivial. Can you please tell me how I could contribute to your projects? I do have some NLP experience and have implemented a research paper on text analysis in code, so can you please tell me what contribution I could make?

antmarakis commented 5 years ago

@vaibhavshukla182: Hi! Since most of the algorithms in this project have been implemented, I suggest you turn to providing examples for the notebooks. Maybe some text analysis, or a tutorial on NLP techniques, or stuff of this nature. Remember that this project is educational, so the more examples we have the better. Just don't forget to include some instructions on how your examples work.

Also, I believe you can implement algorithms outside the AIMA book as well. But only if they have short implementations and easy testing under pytest. If you have any questions, feel free to ask!

vaibhavshukla182 commented 5 years ago

Can I add some data preprocessing and data visualisation techniques? Or something like nlp for fuzzy string matching, or a comparison of various methods for text classification and which one is better in which situation.

antmarakis commented 5 years ago

@vaibhavshukla182: This sounds great! For the nlp stuff though, don't use any external libraries. We want to keep all the code right in front of the student, with no middle-man. Feel free to use visualization libraries (preferably matplotlib) for the viz stuff.

rishabhdash commented 5 years ago

@MrDupin I too want to work on the visualisation technique. I was thinking of implementing t-SNE for dimensionality reduction. I know this will not bring great results for a big dataset. Also, I want to add tf-idf and word2vec in addition to BOW and n-grams. I can also work with different algorithms on the same dataset, like KNN and time-split (temporal) data KNN and so on.

rishabhdash commented 5 years ago

Also, I think I might have to add an additional dataset on which I can do some supervised classification and a polarity-based t-SNE visualisation, as well as KNN. Please see this for dataset recommendation: https://github.com/aimacode/aima-data/issues/10#issuecomment-436382620

antmarakis commented 5 years ago

@cursed4ever I think for now TF.IDF would be a nice addition, if you want to get started with it.

rishabhdash commented 5 years ago

@MrDupin I am working on a toy dataset containing amazon customer reviews and implementing BOW, N-grams and tf-idf and providing visualisation for the same.

antmarakis commented 5 years ago

@cursed4ever This sounds good. Just make sure we actually have the right to publish and use the dataset you are working with, using proper attribution.

rishabhdash commented 5 years ago

I am currently using a dataset published on Kaggle, so I believe it is open source and we can use it. Correct me if I am wrong.

antmarakis commented 5 years ago

I am not familiar with Kaggle's licensing terms, so I can't say for sure. Usually they will share their license in the dataset page, but you may need to ask them if you have permission to redistribute their data.

rishabhdash commented 5 years ago

I asked the organisation, and they said that their dataset is free to use for academic purposes. They only require us to cite their website in the publication.

antmarakis commented 5 years ago

OK, then it's fine if you use it here, thanks!

Insiyaa commented 5 years ago

Hey, I hope this issue is still open. I'd like to add some pre-processing tasks + analysis of the text.

ad71 commented 5 years ago

@Insiyaa sure, go ahead.

dtlight commented 5 years ago

@MrDupin I'm unsure as to where @ashwinnalwade got to on point 3 but I'm happy to start this or continue from where he left off?

dtlight commented 5 years ago

If not are there any other issues you'd suggest I get stuck into? I'm comfortable with python and have experience with AI (incl. NLP). Thanks.

ad71 commented 5 years ago

@dave-light There hasn't been much progress on this issue. Feel free to work on whatever aspect you feel like and submit a pull request. @MrDupin can guide you further for NLP-specific queries.

ashishgit7 commented 5 years ago

@MrDupin Sir, I want to add gensim and lemmatization to it.

Dimkoim commented 5 years ago

@hackerashish25 Although I am not @MrDupin if you scroll up, you can find in one of his replies the following:

@vaibhavshukla182: This sounds great! For the nlp stuff though, don't use any external libraries. We want to keep all the code right in front of the student, with no middle-man. Feel free to use visualization libraries (preferably matplotlib) for the viz stuff.

thesagarsehgal commented 5 years ago

@MrDupin For the preprocessing of text we must remove numerics from the text, since we are only dealing with textual data. So, should I go ahead and add this part to the preprocessing of data? Further, I couldn't find any implementation of TF-IDF as mentioned by others above in the conversation. Is anyone still working on it? If not, can I start working on it?

antmarakis commented 5 years ago

@sagar-sehgal: I am not sure why we must remove numerics from the text. Is there a reason for that? If so, it would be great to do that.

Also, adding the tf-idf algorithm would be a great idea as well, even though we don't explicitly have the pseudocode in the book.

ashishgit7 commented 5 years ago

@MrDupin I have made one PR for text processing. Now I would like to add tf-idf for text processing, but can I use an external library to do so?

antmarakis commented 5 years ago

@hackerashish25 What external library are you planning on using? Quite a while ago I wrote an implementation for tf-idf, and it did not require any libraries, if I remember correctly.
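
As an illustration that tf-idf needs nothing beyond the standard library, here is a minimal sketch. It is not the implementation referred to above; the function name and the sample documents are hypothetical, and the weighting shown (relative term frequency times the log of inverse document frequency) is only one of several common variants.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf is the term count divided by the document length;
    idf is log(N / number of documents containing the term).
    """
    n_docs = len(documents)
    # Count each term once per document it appears in.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return weights


# Hypothetical tokenized documents, chosen only for illustration.
docs = [
    ["the", "union", "of", "states"],
    ["the", "powers", "of", "congress"],
    ["the", "union", "army"],
]
weights = tf_idf(docs)
print(weights[2])
```

A term that occurs in every document, such as "the" here, gets a weight of zero, while a term confined to one document is weighted highest.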

Ask149 commented 5 years ago

Also, I believe you can implement algorithms outside the AIMA book as well. But only if they have short implementations and easy testing under pytest. If you have any questions, feel free to ask!

Hello @MrDupin, I want to contribute to aima-python. It seems most of the algorithms have already been implemented. Can I propose and work on some AI optimization algorithms (if adding new algorithms outside the AIMA book is valid)? I would suggest implementing algorithms like Stochastic Gradient Descent, Particle Swarm Optimization, etc. I would also like to work on completing other notebooks if implementing the above algorithms is not necessary for the project. Thank you!

antmarakis commented 5 years ago

@Ask149: I am not familiar with Particle Swarm, but SGD is something that can be implemented here, sure!
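
For context on what a short, library-free SGD implementation might look like, here is a minimal sketch that fits a line to data. The function name, hyperparameters, and toy data are hypothetical and do not come from the repository.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a fresh random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b,
            # computed from this single example only.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Toy data generated from y = 2x + 1, with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

Updating after every single example, rather than after a full pass over the data, is what makes the descent stochastic. A function this small is also easy to check under pytest by asserting that the recovered slope and intercept are close to the generating values.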

JayantSravan commented 5 years ago

Hey @MrDupin, I want to contribute to aima-python. I am interested and experienced in the fields of NLP, genetic algorithms, particle swarm optimization, reinforcement learning and information retrieval. Could you give me any pointers on which project I should take up? I want to do it as my GSoC project.

ShaswatLenka commented 5 years ago

Hello @MrDupin, can I take on "sentiment analysis" for the apps notebook if no one is working on it? Also, I would like to implement customisable vectorisers as utility functions that can come in handy in many situations.

aasthasood commented 5 years ago

Hello @MrDupin, I am interested in starting with sentiment analysis and would like to contribute to the NLP Apps notebook. Could you give me any pointers on project selection? I want to do this for my GSoC project.

antmarakis commented 5 years ago

Hello all (@JayantSravan, @ShaswatLenka, @aasthasood). Thank you for your interest in this project.

For sentiment analysis, you can pick a project and work on it. A popular one is the movie reviews dataset. I would prefer if you picked something else (you can research "sentiment analysis datasets" on Google). Pick a relatively small dataset and work on it.

Remember though that you cannot use any external libraries for the training, you have to do it yourselves to showcase the algorithm.

0xrushi commented 5 years ago

Hi @MrDupin, can you repost what else is left to be done? I see that log, visualization, and preprocessing are already completed. Can I go with sentiment analysis?

antmarakis commented 5 years ago

@rushic24: Sure, you can solve some sentiment analysis problems. Sounds good!

Kaustav97 commented 5 years ago

Hi @MrDupin, has anyone taken up the sentiment analysis example? If not, I can contribute, since I have prior work on the IMDB dataset which I can probably re-use. Also, can I take up the explanation of the 'Question Answering' section in nlp.ipynb? Thanks.