chaoss / augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/ and learn more about Augur at our website https://augurlabs.io
MIT License

GSoC & Outreachy Machine Learning Microtasks #558

Closed sgoggins closed 4 years ago

sgoggins commented 4 years ago

Here are some ideas:

  1. Microtask 0: Familiarize yourself with Augur by downloading and configuring the dev branch. For more context about what we are trying to accomplish with Augur's prototyping, check out Augur's documentation (https://oss-augur.readthedocs.io/en/dev/). Fork Augur into your own GitHub account and work in the dev branch.
  2. Microtask 1: When you install Augur, choose to load the sample data. Using any machine learning algorithm or library you are comfortable experimenting with, identify data anomalies in the commits, pull_requests, issues, or "messages" tables. The database structure is described in the documentation noted above. Choose "messages" if you have a particular interest in machine learning oriented toward computational linguistics. There is one working example of anomaly detection, focused on the quantitative measurement of issues, pull_requests, or commits, in the "insight worker" in our code base: https://github.com/chaoss/augur/tree/dev/workers/insight_worker (the source code is in the "insight_worker" folder inside this folder). You do not need to build your analysis as a worker.
  3. Microtask 2: Make a pull request to update the dev branch with your work from Microtask 1. If you have questions about getting started with the steps above, open an issue in the Augur repository.
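
As a rough illustration of the kind of analysis Microtask 1 asks for, here is a minimal sketch that flags unusual values in a series of counts using a median/MAD (modified z-score) rule. This is a simple statistical baseline, not a learned model and not the method used by the insight worker. The numbers are synthetic, and the function name and threshold are assumptions chosen for the example; in practice the counts would come from a query against one of the tables named above.

```python
from statistics import median


def mad_anomalies(counts, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    med = median(counts)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, c in enumerate(counts)
        if abs(0.6745 * (c - med) / mad) > threshold
    ]


# Synthetic weekly commit counts (not real Augur data); index 7 is a spike.
weekly_commits = [12, 15, 11, 14, 13, 12, 16, 95, 14, 13]
anomalous_weeks = mad_anomalies(weekly_commits)
print(anomalous_weeks)
```

A median-based rule is used here because a single large spike inflates the mean and standard deviation, which can hide the very point you are trying to detect. Any other algorithm or library you are comfortable with is equally valid for the microtask.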
manangoel99 commented 4 years ago

Hi, I'm Manan Goel, an undergraduate at IIIT Hyderabad. I'm also pursuing research at the Centre for Computational Natural Science and Bioinformatics, where I've been working on using NLP and graph-based approaches in machine learning to solve problems in biochemistry. For that work I've used PyTorch, scikit-learn and Keras. To work on the given microtasks, do I just need to fork the Augur repository and get started, or are there any prerequisites that I need to complete first? Looking forward to hearing from you.

gabe-heim commented 4 years ago

@manangoel99 No prerequisites, you can go ahead and fork the repo and start working on microtasks. Then create a PR of the changes you've made and we will check it out!

aksh555 commented 4 years ago

@gabe-heim, I went through the insight worker you had mentioned earlier. I wanted to get an idea of the "push notification architecture" mentioned in the project idea. Is it like a notification system for the users, in the event of anomalous activities in the repos...

dnabanita7 commented 4 years ago

Hi! @gabe-heim Can I get started working on the micro-task? I am also interested in this project.

germonprez commented 4 years ago

@Naba7 yes, please do get started on the microtasks

akshitt commented 4 years ago

Hey @sgoggins I want to work on the project ML for Anomaly detection as an Outreachy Intern. Should I directly start doing the microtasks? Thank You!

sgoggins commented 4 years ago

@akshitt : YES!! Exactly the right step!

Pogayo commented 4 years ago

Hello,

I would like to work on anomaly detection as an Outreachy intern. Is it already assigned to someone or can I work on it?

Soniyanayak51 commented 4 years ago

Hi, I am an Outreachy 2020 applicant and would like to work on this.

germonprez commented 4 years ago

Hi everyone,

As stated at the top of this issue, there are microtasks that you can do to express your interest in working on this project. If you are a GSoC applicant, post your results here: https://github.com/chaoss/governance/blob/master/GSoC-interest.md

If you are an Outreachy applicant, post your results here: https://github.com/chaoss/governance/blob/master/Outreachy-interest.md

rishilss99 commented 4 years ago

Hey @germonprez, the Outreachy application deadline is April 7, 2020 at 4pm UTC as mentioned in the applicant guide. (You can check it out here: https://www.outreachy.org/outreachy-may-2020-internship-round/communities/chaoss/#machine-learning-for-anomaly-detection-in-open-sou)

The Outreachy-interest.md file mentioned in your previous comment states the deadline as March 31, 2020 13:00 CDT. Kindly clarify which deadline should be followed. Thank you!

germonprez commented 4 years ago

Thanks @rishilss99 I've made the change.

ishagarg06 commented 4 years ago

Hi, I am an Outreachy 2020 applicant and I'm very interested in this project. I've been trying to install Augur but am having difficulty doing so. Any help?

rishilss99 commented 4 years ago

Hey @ishagarg06, I recently set up Augur on my system. If you can mention the specific issues you are running into, I would be glad to help.

ccarterlandis commented 4 years ago

@ishagarg06 @rishilss99 I am also happy to help if you are experiencing any issues! Please feel free to open an issue on this repo if you are having trouble getting things configured.

ishagarg06 commented 4 years ago

@ccarterlandis @rishilss99 I use Windows 10, and the installation commands are written for an Ubuntu system. What should I do? I'm not able to use the sudo, apt, and make commands.

rishilss99 commented 4 years ago

@ishagarg06 Refer to the Getting Started guidelines (https://github.com/chaoss/augur#getting-started). I would recommend setting up an Ubuntu VM rather than attempting the Windows installation. You can use a VMware virtual machine for the Ubuntu VM.

Soniyanayak51 commented 4 years ago

Hi all, I am having difficulty with choosing the options for database setup during the install. What port and host should be entered if I choose an existing database?

rishilss99 commented 4 years ago

@Soniyanayak51 Unless you have changed the defaults, Port: 5432 and Host: localhost should work.
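
To show where those two values end up, here is a small sketch that assembles a PostgreSQL connection URL from a host and port. The user name, password, and database name below are placeholders invented for the example, not values the Augur installer creates, and the helper itself is hypothetical.

```python
from urllib.parse import quote


def postgres_url(user, password, dbname, host="localhost", port=5432):
    """Build a PostgreSQL connection URL, percent-encoding the credentials."""
    # safe="" forces characters such as "@" and "/" to be encoded, since
    # they would otherwise be read as URL delimiters.
    user_q = quote(user, safe="")
    password_q = quote(password, safe="")
    return f"postgresql://{user_q}:{password_q}@{host}:{port}/{dbname}"


# Placeholder credentials for illustration only.
url = postgres_url("augur", "p@ss/word", "augur")
print(url)
```

If your PostgreSQL server listens on a different port or runs on another machine, pass those values as host and port instead of relying on the defaults.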

Soniyanayak51 commented 4 years ago

Thanks @rishilss99 !

chiral-carbon commented 4 years ago

Hi, I am very interested in this project and wish to apply for it for both Outreachy and GSoC. I am a final year B. Tech. (computer science) student from India. Should I go ahead and get started with the micro-tasks? I realize I am late but I am confident I can catch up and be able to complete the prerequisites and come up with a proposal in time. Thanks! :smile: @sgoggins @germonprez

germonprez commented 4 years ago

Thanks for your interest, @chiral-carbon. Yes, go ahead and get started with the microtasks. When you are done, don't forget to post your final results to the Interest Pages. Each page is simply a table that you need to add your information to. However, it is very important that you (and everyone) add this information: if you don't, it is impossible for us to see your work and, hence, to consider your application.

yamini27 commented 4 years ago

Hi everyone, I'm Yamini Sharma, a first-year B.Tech CSE student at IIT Mandi, India, and I've applied for Outreachy 2020. I'm really interested in this great ML Anomaly Detection project from CHAOSS. I know that I'm a bit late to start, but the ML project I was contributing to earlier got closed, so I am starting now. I'll work really hard for this great opportunity and will start working on the microtasks as soon as possible. Everybody here is very helpful. Thanks a lot for this wonderful project @sgoggins @germonprez

Pogayo commented 4 years ago

Hello @gabe-heim @germonprez after our interest pull requests have been merged, how do we get feedback on our work so that we can improve on it or do other tasks?

germonprez commented 4 years ago

Thanks @Pogayo Once you've done this, we have to simply wait for all of the interested people to submit their work (as you have done -- thanks!)

Aparna-Sakshi commented 4 years ago

Hi, I am an Outreachy 2020 applicant and I'm very interested in this project. I tried installing PostgreSQL by following the installation guide (https://oss-augur.readthedocs.io/en/dev/getting-started/installation.html). When I ran the psql command, it gave the following error: psql: FATAL: role "aparna" does not exist. I also tried two more guides (https://www.liquidweb.com/kb/what-is-the-default-password-for-postgresql/ and http://postgresguide.com/utilities/psql.html). When I try su - postgres from the first of these, I get the following error: su: Authentication failure (I have tried many passwords and googled for default passwords). I would be really grateful if someone could help me with this issue.

aksh555 commented 4 years ago

If you have sudo permissions, you could try sudo -u postgres psql and enter your usual sudo password. Once you are in, you can create new users in Postgres.
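
For context, psql connects with a role named after your operating-system user unless told otherwise, which is why the error mentions a role that was never created. The sketch below only builds the command lines for creating a role and a database through the postgres superuser account; it does not run them. The role name, password, and database name are placeholders, and the helper is hypothetical rather than part of Augur.

```python
import shlex


def create_role_commands(role, password, dbname):
    """Build (but do not run) psql commands that create a role and a database."""
    # Keep the naive string formatting safe: plain identifiers only, and no
    # single quote that could end the password literal early.
    if not (role.isidentifier() and dbname.isidentifier()):
        raise ValueError("role and dbname must be plain identifiers")
    if "'" in password:
        raise ValueError("password must not contain a single quote")
    statements = [
        f"CREATE ROLE {role} WITH LOGIN PASSWORD '{password}';",
        f"CREATE DATABASE {dbname} OWNER {role};",
    ]
    return [["sudo", "-u", "postgres", "psql", "-c", sql] for sql in statements]


# Placeholder names for illustration only.
commands = create_role_commands("augur", "change-me", "augur")
for cmd in commands:
    print(shlex.join(cmd))
```

The printed lines can be pasted into a terminal, or you can type the two SQL statements directly at the psql prompt after running sudo -u postgres psql.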

Aparna-Sakshi commented 4 years ago

Thanks a lot for your help @aksh555 , it worked!!

manangoel99 commented 4 years ago
Traceback (most recent call last):
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    cursor, statement, parameters, context
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedTable: relation "worker_history" does not exist
LINE 2: ...SELECT max(history_id) AS history_id, status FROM worker_his...
                                                             ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/manan/.virtualenvs/augur/bin/augur", line 11, in <module>
    load_entry_point('augur==0.11.0', 'console_scripts', 'augur')()
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/augur-0.11.0-py3.7.egg/augur/cli/run.py", line 119, in cli
    dbname=app.read_config('Database', 'name')
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/augur-0.11.0-py3.7.egg/augur/housekeeper/housekeeper.py", line 47, in __init__
    self.__updatable = self.prep_jobs(jobs)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/augur-0.11.0-py3.7.egg/augur/housekeeper/housekeeper.py", line 313, in prep_jobs
    history_df = pd.read_sql(jobHistorySQL, self.helper_db, params={})
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/pandas/io/sql.py", line 438, in read_sql
    chunksize=chunksize,
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/pandas/io/sql.py", line 1218, in read_query
    result = self.execute(*args)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/pandas/io/sql.py", line 1087, in execute
    return self.connectable.execute(*args, **kwargs)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2191, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 984, in execute
    return meth(self, multiparams, params)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement
    distilled_params,
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    cursor, statement, parameters, context
  File "/home/manan/.virtualenvs/augur/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "worker_history" does not exist
LINE 2: ...SELECT max(history_id) AS history_id, status FROM worker_his...
                                                             ^

[SQL: 
                        SELECT max(history_id) AS history_id, status FROM worker_history
                        GROUP BY status
                        LIMIT 1
                    ]
(Background on this error at: http://sqlalche.me/e/f405)

Has anyone faced this issue? I can't quite figure out how to instantiate these tables in the database. While installing I did add the sample data and the basic tables got added but none of the worker tables. I would really appreciate some help.
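
One way to narrow down an error like this is to ask PostgreSQL which schemas, if any, contain the missing table. The sketch below assumes a DB-API cursor that uses the %s parameter style, as psycopg2 does. So that it runs without a database, it uses a small stand-in cursor; the schema name and rows in it are made up for the example.

```python
FIND_TABLE_SQL = """
    SELECT table_schema, table_name
    FROM information_schema.tables
    WHERE table_name = %s
"""


def find_table(cursor, table_name):
    """Return (schema, table) rows for every schema containing table_name."""
    cursor.execute(FIND_TABLE_SQL, (table_name,))
    return cursor.fetchall()


class FakeCursor:
    """Stand-in for a real DB-API cursor so the sketch runs offline."""

    def __init__(self, rows):
        self._rows = rows
        self._result = []

    def execute(self, sql, params):
        # Mimic the WHERE clause: keep rows whose table name matches.
        self._result = [row for row in self._rows if row[1] == params[0]]

    def fetchall(self):
        return list(self._result)


# Made-up catalog contents: data tables exist, worker tables do not.
cursor = FakeCursor([("example_schema", "commits"), ("example_schema", "issues")])
found = find_table(cursor, "worker_history")
print(found or "worker_history not found in any schema")
```

With a real connection you would pass the cursor from psycopg2.connect(...).cursor() instead of FakeCursor. An empty result suggests the schema load did not complete, which is worth checking before debugging anything else.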

Pogayo commented 4 years ago

Hello @manangoel99 I don't know how to help with this issue exactly, but if you JUST need access to the sample data, please follow the steps below. It can help when the configuration/installation is having so many issues and you need to get started with contributions quickly.

  1. Go to this link and get the column names for whichever table you are analyzing. Since it is a large document, you can press Ctrl+F and search for CREATE TABLE [Table Name].
  2. Go to this link and pick the table you want. Download the raw file and convert it to CSV, splitting each row into columns on spaces.
  3. Combine the column names and the rows in your CSV file.
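
The steps above can be sketched in a few lines of Python. The column names and rows here are invented for the example, and the sketch assumes every value is free of spaces; a field such as a commit message that contains spaces would need a different delimiter or a proper parser.

```python
import csv
import io


def rows_to_csv(column_names, raw_text, delimiter=" "):
    """Prefix a header row and convert delimiter-separated lines to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(column_names)
    for line in raw_text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(line.split(delimiter))
    return out.getvalue()


# Hypothetical column names and rows, for illustration only.
columns = ["issue_id", "repo_id", "issue_state"]
raw = "1 25430 open\n2 25430 closed\n"
csv_text = rows_to_csv(columns, raw)
print(csv_text)
```

The resulting text can be written to a file and loaded with any CSV reader for analysis.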
manangoel99 commented 4 years ago

@Pogayo Thanks for the help. I had made a mistake while loading the data.

ccarterlandis commented 4 years ago

Hello everyone! I have just finished setting up our Slack workspace for applicants, and am happy to invite anyone who is applying to Augur for GSoC 2020. If you would like to be added to the channel, please send me an email at c@carterlandis.com with your name and the email address you would like me to send the invite to. If you already got an invite from me, you can use it, or if you like I can send it to a different email address. 😊

tianyichow commented 4 years ago

Hello, everyone in the Augur community. I am Tianyi Zhou (周添一), a postgraduate student at East China Normal University. I am also an applicant for GSoC 2020. My research interests are mainly in data science and software engineering. I have been researching the evaluation and sustainability of open source communities for a few months. Thanks to @ccarterlandis for the timely and attentive help. 😊

tab1tha commented 4 years ago

Greetings. My name is Tabitha and I am a data scientist in training. I am applying for the machine learning for anomaly detection project. While installing Augur and running make install, I came across this prompt:

Would you like to use an existing directory, or create a new one?
1) Create a new directory
2) Use an existing directory

facade_repo_path

Which option should I have chosen? I chose 2, but I do not have a facade repo path to enter. If I should have chosen option 1 instead, is there a way to rerun the installer so I can choose again? When I try rerunning, it continues from where the installation stopped rather than restarting. @germonprez @sgoggins

ccarterlandis commented 4 years ago

GSoC submissions are over and applicants have been selected. Thank you to everyone who submitted!