chaoss / grimoirelab

GrimoireLab: platform for software development analytics and insights
https://chaoss.github.io/grimoirelab/
GNU General Public License v3.0

GSoC Idea: Build an expert system to provide recommendations to users in a user interface #414

Closed vchrombie closed 3 years ago

vchrombie commented 3 years ago

The new version of SortingHat provides a basic recommender system. It suggests which identities could belong to the same person and which identities work for which companies. However, this information might not be useful for the end user, and it isn't available in the UI.

SortingHat is the tool that we use to manage identities data in GrimoireLab. As individuals in a project can have different identities - several usernames or email addresses - this tool allows creating unified profiles of them. Then, the platform will use this information to generate accurate results of the activity of these participants.
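
As an illustration of what unifying identities means, the sketch below groups identities that share an email address or a username. It is a minimal Python sketch under stated assumptions, not SortingHat's actual matching algorithm: the function name unify_identities and the 'id', 'email' and 'username' keys are hypothetical.

```python
from collections import defaultdict


def unify_identities(identities):
    """Group identities that share an email address or a username.

    `identities` is a list of dicts with the hypothetical keys
    'id', 'email' and 'username'. Returns a list of sets of ids,
    one set per unified profile.
    """
    parent = {ident["id"]: ident["id"] for ident in identities}

    def find(x):
        # Follow parent links up to the representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Remember the first identity seen for every non-empty value and
    # link later identities with the same value to it.
    seen = {}
    for ident in identities:
        for field in ("email", "username"):
            value = ident.get(field)
            if not value:
                continue
            key = (field, value.lower())
            if key in seen:
                union(ident["id"], seen[key])
            else:
                seen[key] = ident["id"]

    groups = defaultdict(set)
    for ident in identities:
        groups[find(ident["id"])].add(ident["id"])
    return list(groups.values())
```

Matching on exact, case-insensitive values is a deliberately simple rule; a real tool would likely need stricter or additional criteria to avoid merging unrelated people.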

SortingHat started as a command-line tool, but after some years we saw its potential and decided to create a new version, this time as a service. This new version provides a new GraphQL API to operate the server and a web-based UI app that replaces Hatstall, the old UI for SortingHat.

Although its development is at a late stage and it will soon be ready for the stable version of the platform, there are many good ideas that we would like to incorporate. Some of them were selected for GSoC 2021.

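The aims of the project are as follows:

  • Identify useful recommendations for the end-user.
  • Implement new recommendations.
  • Integrate recommendations on the UI.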

The aims will require generating code in Python for Django and the GraphQL API, and for the web app (generated with Vue.js and Vuetify).

Microtasks

For becoming familiar with GrimoireLab, you can start by reading some documentation. You can find useful information at:

Once you're familiar with GrimoireLab, you can have a look at the following microtasks.

stevenkolawole commented 3 years ago

Hello!

I'm Steven Kolawole, a Computer Science junior at FUNAAB, Nigeria. GrimoireLab has piqued my interest ever since I joined the CHAOSS community back in February. Having followed the GrimoireLab tutorial, I'm fairly familiar with how it works.

I have a strong background in Machine Learning and I've done two prior internships in Data Science and Machine Learning Engineering roles. My ML background means that I'm familiar with recommender systems, having built one myself (albeit on a small scale, and it didn't go into production). I'm also a decent Django developer, as I've interned in a Backend Engineering role before. I'm familiar with JS basics and I'm happy to strengthen my knowledge of it, as well as to learn any other requirements that I don't know yet.

It's really exciting to be here. I plan to learn a lot and hopefully, have lots of fun while at this. Thank you for having me. 😄

Venkatavaradan-R commented 3 years ago

Hi There!

My name is Venkat and I'm a CSE Junior studying in Bangalore, India. I've been interested in GrimoireLab ever since I joined the community back in November 2020.

I've got a good amount of experience with building and integrating/hosting ML models (especially NLP and recommender systems), having done 2 internships, one as a Data Science Intern and one as a Backend Developer (funnily enough, I had to implement a recommender system for a recipe app :laughing: ). I have also worked with JavaScript-based frameworks like React JS. I was just wondering whether this is on a first-come-first-serve basis?

I'm super glad to see this project and hoping to contribute as much as I can in the near future! :smiley:

darmis007 commented 3 years ago

Hello!

I am Darsh Mishra, a sophomore studying at BITS Pilani. I am an experienced Django developer, having written bug-free code for websites serving payments, carrying out registrations, and managing the functioning of a large-scale college fest. I also have experience developing ML models, especially recommendation systems, which I learned during an internship and which have been used effectively in an online trade fair. I have also worked with JS frameworks like ReactJS, so catching up with Vue.js would not be a problem for me. I would love to contribute to this project.

galexad commented 3 years ago

Hi! I am Gabriela Alexandru, a 1st year Computer Science student at Vrije Universiteit (VU) Amsterdam. I have some experience with Python and ML, having done an internship in Natural Language Processing where I worked on a conversational AI system. I also have some JavaScript (Node.js and jQuery) skills from recently completing a course in Web technology as part of my degree.

I'm comfortable with using Linux/bash terminal and commands, as well as github, and I could describe myself as a fast learner, ambitious and self-motivated student.

I would like to participate in this project in order to learn more about how to build a recommender system, especially due to my interest in NLP, but also because I find this a challenging project that would bring me a great learning experience. As a newbie, I am very excited to get to know more about how to contribute to open source and become part of a community whom I could help and who could help me grow.

Find me on linkedin :)

vchrombie commented 3 years ago

Hi everyone, thanks for your interest in applying for this idea. You can start working on the microtasks to get a better idea of the project. Let us know if you have any doubts. :slightly_smiling_face:

I was just wondering whether this is on a first-come-first-serve basis?

@Venkatavaradan-R, this is not on a first-come-first-serve basis. You have to submit a proposal as per the GSoC Guidelines and also attempt at least one microtask.

vchrombie commented 3 years ago

For all students interested in this idea, please comment on this issue to get in touch with the mentors. This is the main communication channel.

rohanreddych commented 3 years ago

Hello @vchrombie I'm trying to understand the problem statement.

The new version of SortingHat includes a basic recommender system. It guesses about what identities could be the same, or what identities work for which companies. This information might not be useful for the end user and it isn't available on the UI, though. This project idea is about improving the recommender system to an expert system that provides useful recommendations to users

The goal of the project is to show "recommendations of profiles which might be the same" to the user through the UI. We have to create the recommendation system using machine learning / AI.

Am I correct?

sduenas commented 3 years ago

No. No machine learning/AI. It's very complicated for a project of one month. It's more important to use the methods that we already know, to improve them and to find a way to visualize that data with the current UI.
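
A rule-based recommendation of the kind meant here could be as simple as matching email domains against known organizations. The following is only a sketch under that assumption, not the code in SortingHat's recommendation engine; recommend_affiliations and both input mappings are hypothetical.

```python
def recommend_affiliations(identities, domains):
    """Suggest an organization for each identity from its email domain.

    `identities` maps a hypothetical identity id to an email address
    (or None); `domains` maps an email domain to an organization name.
    Identities without a matching domain are left out of the result.
    """
    recommendations = {}
    for ident_id, email in identities.items():
        if not email or "@" not in email:
            continue
        # Take everything after the last "@" and compare case-insensitively.
        domain = email.rsplit("@", 1)[1].lower()
        org = domains.get(domain)
        if org is not None:
            recommendations[ident_id] = org
    return recommendations
```

Because the rule is a plain lookup, every suggestion can be explained to the user ("this address ends in a domain registered for that organization"), which is easier to surface in a UI than a model score.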

rohanreddych commented 3 years ago

Hello @sduenas, I was trying the new SortingHat server and UI. Can you give me some data to test the GraphQL console and sortinghat_db.auth_user?

vchrombie commented 3 years ago

Hi @rohanreddych, I understand you are asking about microtask-6. Let me know if I'm wrong.

Hello, @sduenas I was trying the new sortinghat server and ui can you give some data to test the graphql console.

You can use the data that was stored in the database after the microtask-5.

and sortinghat_db.auth_user

You can create an admin user using createsuperuser, which can then be used to log in. https://djangocentral.com/creating-super-user-in-django/
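
For a scripted setup, the same Django command can also run without prompts. This sketch assumes Django 3.0 or later (where createsuperuser --noinput reads the DJANGO_SUPERUSER_PASSWORD environment variable), the default username-based user model, and the config.settings.devel settings module that appears elsewhere in this thread; the helper names are hypothetical.

```python
import os
import sys


def superuser_command(username, email, settings="config.settings.devel"):
    """Build the argv for a non-interactive `createsuperuser` call."""
    return [
        sys.executable, "manage.py", "createsuperuser",
        "--noinput",
        "--username", username,
        "--email", email,
        "--settings=" + settings,
    ]


def superuser_env(password, base_env=None):
    """Return an environment that carries the password for --noinput."""
    env = dict(os.environ if base_env is None else base_env)
    # With --noinput, Django takes the password from this variable.
    env["DJANGO_SUPERUSER_PASSWORD"] = password
    return env


# Usage, from the directory that contains manage.py:
#   import subprocess
#   subprocess.run(superuser_command("admin", "admin@example.com"),
#                  env=superuser_env("change-me"), check=True)
```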

We will try to add some minimal documentation soon.

rohanreddych commented 3 years ago

You can create an admin user using createsuperuser, which can then be used to log in.

Thanks, @vchrombie. I was able to enter data through UI and retrieve it through the graphql console.

The aims of the project are as follows:

  • Identify useful recommendations for the end-user.
  • Implement new recommendations.
  • Integrate recommendations on the UI.

@sduenas, @evamillan, @mafesan Can you please explain a little more about the project? I was looking at recommendations/engine.py. Currently, affiliations and profile matching are implemented, and there are plans to implement gender recommendations (#471). What other recommendations are to be implemented?

WasimAkhtarKhan commented 3 years ago

Hello, I am Wasim Akhtar Khan, studying Information Technology at BIT Durg, a GSoC aspirant and a passionate learner. I have been using Django since last year and would like to contribute to this project. I have seen the microtasks and am looking forward to working with you for a long time. Thank you, @vchrombie

vchrombie commented 3 years ago

Hi, @SteveKola @Venkatavaradan-R @darmis007 @galexad @rohanreddych @WasimAkhtarKhan I hope you have started working on the microtasks. As you might know, you have to submit a proposal before the GSoC deadline. You are also expected to attempt at least one microtask for your application to be considered.

The main reason behind the microtasks is that they give a good basic understanding of the SortingHat tool as well as the GrimoireLab platform as a whole. This will be really helpful for writing your proposal.

If you haven't started working on the microtasks yet, I would suggest you start asap. You can create a github repository for storing the microtasks and you can open issues in that repo for asking doubts or reviewing the tasks.

Thanks.

WasimAkhtarKhan commented 3 years ago

Hello @vchrombie, I have gone through the tutorials of the GrimoireLab platform and SortingHat, set up the dev environment, and started working on the microtasks. It is very interesting. Thank you

WasimAkhtarKhan commented 3 years ago

Hello @vchrombie @sduenas, this is just to make sure that I am doing this right:

Step 1: Set up the dev environment using only Docker, as given in chaoss/grimoirelab-sirmordred - Getting-Started.md.

Step 2: Run grimoirelab/full.

Step 3: Execute micro-mordred following this Tutorial.

The Tutorial says that I will get this message: Couldn't find any Elasticsearch data. You'll need to index some data into Elasticsearch before you can create an index pattern. But I got the required dashboard without running this command:

$ python3 micro.py --raw --enrich --cfg setup.cfg --backends git

The image I got is: Screenshot from 2021-03-23 16-59-19

If there is anything I am doing wrong, please correct me.

rohanreddych commented 3 years ago

@WasimAkhtarKhan

In this Tutorial, It is given that I will get this Couldn't find any Elasticsearch data. You'll need to index some data into Elasticsearch before you can create an index pattern But I got the required dashboard without running this command $ python3 micro.py --raw --enrich --cfg setup.cfg --backends git The image I got is:

That's because you are running the grimoirelab container.

Use docker-compose to run the docker-compose.yml file in https://chaoss.github.io/grimoirelab-tutorial/sirmordred/micro-mordred.html. You will get the expected result.

Correct me if I am mistaken. :smile:

vchrombie commented 3 years ago

Hi @WasimAkhtarKhan

Step 1: Setup dev environment using Only Docker as given in chaoss/grimoirelab-sirmordred - Getting-Started.md.

You have to use the source code and docker method for setting up the dev environment.

The image I got is:

The image looks good to me. It is the expected output.

WasimAkhtarKhan commented 3 years ago

Hi @WasimAkhtarKhan

Step 1: Setup dev environment using Only Docker as given in chaoss/grimoirelab-sirmordred - Getting-Started.md.

You have to use the source code and docker method for setting up the dev environment.

  • Why not docker? You don't get to work with source code if you are using only docker method.

Ok I will set up with source code and docker method

The image looks good to me. It is the expected output.

Ok Nice. Thanks @vchrombie

WasimAkhtarKhan commented 3 years ago

Hello @vchrombie, @rohanreddych I set up PyCharm and the Python interpreter. I am running into two problems.

After executing python3 micro.py --raw --enrich --cfg ./setup.cfg --backends git cocom in the terminal, I get this error: Screenshot from 2021-03-24 11-00-23 and Screenshot from 2021-03-24 11-36-15

When I try to resolve Package requirement "grimoirelab-elk" is not satisfied, both from the terminal and with Install Requirement, I get this: Screenshot from 2021-03-24 11-38-51

rohanreddych commented 3 years ago

First one:

The error mentions port 9200, so the problem is with Elasticsearch. I don't think you have Elasticsearch running. Is there an Elasticsearch instance running on your system?

To solve this, I recommend using docker-compose as I said in the previous comment.

Second one:

I got similar errors related to requirements. Can you create a new virtualenv and run pip install -r requirements.txt? If that does not solve your errors, then vchrombie can help you.
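
To confirm the port 9200 diagnosis above before rerunning micro.py, a quick TCP check can help. This is a minimal sketch with hypothetical helper names; port 9200 for Elasticsearch comes from this thread, while 5601 (Kibiter/Kibana) and 3306 (MySQL/MariaDB) are the usual defaults and are assumed here, so adjust them to your docker-compose file.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed port mapping; only 9200 is confirmed by the thread.
SERVICES = {"elasticsearch": 9200, "kibiter": 5601, "mysql": 3306}


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```

An open port only shows that something is listening; it does not prove that the service behind it is healthy, so the docker-compose logs are still worth checking.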

vchrombie commented 3 years ago

Hi @WasimAkhtarKhan

* First one:

After executing python3 micro.py --raw --enrich --cfg ./setup.cfg --backends git cocom in terminal. I am getting this error

It looks like Elasticsearch is not running on the required port. Can you confirm that there are no errors in the docker-compose logs? You should have Elasticsearch, Kibiter, and MariaDB/MySQL running on their respective ports (see docker-compose with searchguard).

* Second is:

When I try to resolve Package requirement "grimoirelab-elk" is not satisfied by terminal and by Install Requirement

You have to set the Project Structure too, along with the Project Interpreter. The Project Structure should have all the grimoirelab repositories (see Setting up PyCharm).

I got similar errors related to requirements. Can you create a new virtualenv and do pip install -r requirements.txt ? If that does not solve your errors then vchrombie can help you.

@rohanreddych, this works, but it would be better to use the source code of elk instead of the pip package.

The grimoirelab dependencies must be loaded using Project Structure, whereas the rest of the dependencies should be installed using Project Interpreter.

For example, I'm providing an excerpt from the SirMordred requirements.txt file.

colorlog==4.1.0
elasticsearch==6.3.1
elasticsearch-dsl==6.3.1
file-read-backwards==2.0.0
-e git+https://github.com/chaoss/grimoirelab-toolkit/#egg=grimoirelab-toolkit
-e git+https://github.com/chaoss/grimoirelab-sortinghat/#egg=grimoirelab-sortinghat
-e git+https://github.com/chaoss/grimoirelab-kidash/#egg=grimoirelab-kidash
...

In this case, if you install everything using pip install -r requirements.txt, the pip packages are installed and the source code won't be used.
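
To see at a glance which entries of a requirements file point at the GrimoireLab git repositories and which are ordinary pinned packages, the lines can be split as below. This is an illustrative sketch, not pip's own parser; split_requirements is a hypothetical helper and it only understands the simple line forms shown in the excerpt above.

```python
def split_requirements(lines):
    """Separate editable (-e) entries from regular pinned packages.

    Returns (editable, pinned): two lists of requirement strings.
    """
    editable, pinned = [], []
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and the "..." elision used in the excerpt.
        if not line or line.startswith("#") or line == "...":
            continue
        if line.startswith("-e ") or line.startswith("--editable "):
            # Keep only the target after the flag.
            editable.append(line.split(None, 1)[1])
        else:
            pinned.append(line)
    return editable, pinned


if __name__ == "__main__":
    with open("requirements.txt") as handle:
        editable, pinned = split_requirements(handle)
    print("From git repositories:", *editable, sep="\n  ")
    print("Pinned packages:", *pinned, sep="\n  ")
```

The entries reported as coming from git repositories are the ones to provide through the Project Structure; the pinned ones can be installed with the Project Interpreter.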

WasimAkhtarKhan commented 3 years ago

Hello @vchrombie

Looks like the elasticsearch is not running at the required port. Can you confirm if there are no errors in the logs of docker-compose? You should have elasticsearch, kibiter, and MariaDB/MySQL running in the respective ports. docker-compose with searchguard

When I execute sudo docker-compose up -d, it says Starting elasticsearch. The output is: Screenshot from 2021-03-24 15-37-40. After executing sudo docker-compose logs, it displays the error messages Unable to revive connections and No living connections: Screenshot from 2021-03-24 15-42-22

I tried to use docker-compose-without-searchguard.

You have to set the Project Structure too, along with Project Interpreter. The Project Structure should have all the grimoirelab repositories. Setting up PyCharm

Yes, I have set the Project Structure along with the Project Interpreter for all GrimoireLab components, as per Setting up PyCharm

rohanreddych commented 3 years ago

@WasimAkhtarKhan Elasticsearch has not started, that's why Kibana is showing that warning. Stop all the containers, run sysctl -w vm.max_map_count=262144, and then start them again.
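
Whether this setting is the culprit can be checked before restarting anything. The sketch below reads the current value from the Linux procfs entry that the sysctl command writes to and compares it with 262144, the value used above; max_map_count_ok is a hypothetical helper and it assumes a Linux host.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # value from the sysctl command above


def max_map_count_ok(path="/proc/sys/vm/max_map_count",
                     required=REQUIRED_MAX_MAP_COUNT):
    """Return True if the kernel setting is at least `required`.

    Returns False when the file is missing or does not hold an integer.
    """
    try:
        current = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return False
    return current >= required


if __name__ == "__main__":
    if max_map_count_ok():
        print("vm.max_map_count is high enough")
    else:
        print("vm.max_map_count is too low or could not be read")
```

Note that a value set with sysctl -w does not survive a reboot, so the check is worth repeating when Elasticsearch fails to start again later.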

WasimAkhtarKhan commented 3 years ago

@WasimAkhtarKhan Elasticsearch has not started, thats why kibana is showing that warning. Once stop all the containers, run sysctl -w vm.max_map_count=262144 and then start again.

Thank you @rohanreddych. The Starting elasticsearch output confused me. You were right. Now I have Elasticsearch running on port 9200. Thanks @vchrombie

vchrombie commented 3 years ago

Thanks, @rohanreddych for helping.

@WasimAkhtarKhan

I tried to use docker-compose-without-searchguard.

Cool, but I think you might need to change the es endpoints in the setup.cfg file if you are running without searchguard.

[es_collection]
url = http://localhost:9200

[es_enrichment]
url = http://localhost:9200

The setup.cfg has all the configurations related to es, kibiter, sortinghat, etc. So, please make sure the configurations are correctly set.
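
One way to double-check those endpoints is to read them back with Python's standard configparser. This is a minimal sketch: es_endpoints is a hypothetical helper, and it only looks at the two sections shown above.

```python
import configparser


def es_endpoints(cfg_text):
    """Return the Elasticsearch URLs for collection and enrichment.

    A section or option that is missing yields None for that entry.
    """
    # interpolation=None keeps any "%" in a URL from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        section: parser.get(section, "url", fallback=None)
        for section in ("es_collection", "es_enrichment")
    }


if __name__ == "__main__":
    with open("setup.cfg") as handle:
        for section, url in es_endpoints(handle.read()).items():
            print(f"[{section}] url = {url}")
```

When running without searchguard, both values would be expected to start with http:// rather than https://, matching the snippet above.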

WasimAkhtarKhan commented 3 years ago

The setup.cfg has all the configurations related to es, kibiter, sortinghat, etc. So, please make sure the configurations are correctly set.

Ok I'll. Thanks @vchrombie

WasimAkhtarKhan commented 3 years ago

Hi @vchrombie When executing Micro-Mordred, the configuration --raw --enrich --cfg ./setup.cfg works fine, but the --panels configuration gives an error: Screenshot from 2021-03-25 19-20-21

vchrombie commented 3 years ago

Hi @WasimAkhtarKhan, can you confirm the full command which you are trying to execute?

The configuration should be something like:

  • Script path: path of the micro.py
  • Parameters: --panels --cfg ./setup.cfg --backends git (since you need only panels)
  • Working directory: path to sirmordred utils folder

We can have a quick chat in case it is not solved.

WasimAkhtarKhan commented 3 years ago

Yes, that is what I used: Screenshot from 2021-03-25 21-57-40

vchrombie commented 3 years ago

@WasimAkhtarKhan

https://github.com/chaoss/grimoirelab/issues/414#issuecomment-807050075

I see you have mentioned only --panels in the parameters. Please add the full configuration. It should be --panels --cfg ./setup.cfg --backends backend-name.
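
To avoid forgetting part of the parameter list, it can be assembled in one place. The helper below is hypothetical and only uses the flags that appear in this thread (--raw, --enrich, --panels, --cfg, --backends); it does not describe how micro.py itself validates its arguments.

```python
def micro_args(phases, cfg="./setup.cfg", backends=("git",)):
    """Build the parameter list for micro.py.

    `phases` is a sequence drawn from "raw", "enrich" and "panels".
    """
    allowed = ("raw", "enrich", "panels")
    unknown = [phase for phase in phases if phase not in allowed]
    if unknown:
        raise ValueError(f"unknown phase(s): {unknown}")
    if not backends:
        raise ValueError("at least one backend is required")
    args = [f"--{phase}" for phase in phases]
    # Every run carries the config file and the backend names.
    args += ["--cfg", cfg, "--backends", *backends]
    return args


if __name__ == "__main__":
    # Paste the result into the PyCharm "Parameters" field.
    print(" ".join(micro_args(["panels"])))
```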

WasimAkhtarKhan commented 3 years ago

Ok I tried from here. Should it be changed? Sorry and Thanks @vchrombie

rohanreddych commented 3 years ago

Hello @vchrombie I think we need to improve the error message here. https://github.com/rohanreddych/grimoirelab-sirmordred/blob/b9686e874b49270ceefae1e3286eed4dcffa4248/sirmordred/task_panels.py#L277

Should I start working on this?

vchrombie commented 3 years ago

https://github.com/chaoss/grimoirelab/issues/414#issuecomment-807064135

No problem @WasimAkhtarKhan, please let me know if that solved the issue.

Hello @vchrombie I think we need to improve the error message here. https://github.com/rohanreddych/grimoirelab-sirmordred/blob/b9686e874b49270ceefae1e3286eed4dcffa4248/sirmordred/task_panels.py#L277

Interesting. @rohanreddych, would you be interested in submitting a PR for it?

WasimAkhtarKhan commented 3 years ago

No problem @WasimAkhtarKhan, please let me know if that solved the issue.

Screenshot from 2021-03-25 22-25-03 Yes, it solved it. I tried many things to solve that issue myself. Your reply motivates me. Thanks @vchrombie

rohanreddych commented 3 years ago
/home/rohan/2work/sources/venv/bin/python3.8 /home/rohan/2work/sources/grimoirelab-sirmordred/utils/micro.py --raw --enrich --cfg ./setup.cfg --backends git
Collection for git: starting...
  2021-03-26 21:06:34,845 Reading projects data from  ./projects.json 
  2021-03-26 21:06:34,845 [git] collection phase starts
  2021-03-26 21:06:34,845 [git] collection starts for https://github.com/chaoss/grimoirelab-perceval
  2021-03-26 21:06:34,883 [git] Incremental from: 2021-03-15 09:46:45+00:00 for https://github.com/chaoss/grimoirelab-perceval
  2021-03-26 21:06:34,883 Fetching latest commits: 'https://github.com/chaoss/grimoirelab-perceval' git repository
Collection for git: finished after 00:00:01 hours
  2021-03-26 21:06:36,656 Fetch process completed: 0 commits fetched
  2021-03-26 21:06:36,658 [git] Done collection for https://github.com/chaoss/grimoirelab-perceval
  2021-03-26 21:06:36,660 [git] collection finished for https://github.com/chaoss/grimoirelab-perceval
  2021-03-26 21:06:36,660 [git] collection phase finished in 00:00:01
  2021-03-26 21:06:36,724 Loading raw data finished!
  2021-03-26 21:06:36,725 Reading projects data from  ./projects.json 
  2021-03-26 21:06:46,835 [git] enrichment phase starts
  2021-03-26 21:06:46,910 [git] enrichment starts for https://github.com/chaoss/grimoirelab-perceval
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/relationships.py:3435: SAWarning: relationship 'Organization.domains_organizations' will copy column organizations.id to column domains_organizations.organization_id, which conflicts with relationship(s): 'Domain.organizations' (copies organizations.id to domains_organizations.organization_id), 'Organization.domains' (copies organizations.id to domains_organizations.organization_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning.
  util.warn(
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/relationships.py:3435: SAWarning: relationship 'Domain.organization' will copy column organizations.id to column domains_organizations.organization_id, which conflicts with relationship(s): 'Domain.organizations' (copies organizations.id to domains_organizations.organization_id), 'Organization.domains' (copies organizations.id to domains_organizations.organization_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning.
  util.warn(
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/relationships.py:3435: SAWarning: relationship 'UniqueIdentity.uuid_identy' will copy column uidentities.uuid to column identities.uuid, which conflicts with relationship(s): 'Identity.uidentities' (copies uidentities.uuid to identities.uuid), 'UniqueIdentity.identities' (copies uidentities.uuid to identities.uuid). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning.
  util.warn(
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/relationships.py:3435: SAWarning: relationship 'Identity.uidentity' will copy column uidentities.uuid to column identities.uuid, which conflicts with relationship(s): 'Identity.uidentities' (copies uidentities.uuid to identities.uuid), 'UniqueIdentity.identities' (copies uidentities.uuid to identities.uuid). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning.
  util.warn(
  2021-03-26 21:06:47,386 [sortinghat] Unknown exception adding identity. Ignoring it. sduenas@bitergia.com Santiago Dueñas None
  2021-03-26 21:06:47,455 Error enriching raw from git (https://github.com/chaoss/grimoirelab-perceval): SQL expression for ON clause expected, got <class 'sortinghat.db.model.Organization'>.
Traceback (most recent call last):
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/elk.py", line 530, in enrich_backend
    enrich_count = enrich_items(ocean_backend, enrich_backend)
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/elk.py", line 318, in enrich_items
    total = enrich_backend.enrich_items(ocean_backend)
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/enriched/git.py", line 468, in enrich_items
    rich_item = self.get_rich_item(item)
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/enriched/enrich.py", line 97, in decorator
    eitem = func(self, *args, **kwargs)
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/enriched/git.py", line 320, in get_rich_item
    eitem.update(self.get_item_sh(item, self.roles))
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/enriched/enrich.py", line 891, in get_item_sh
    sh_fields = self.get_item_sh_fields(identity, item_date, rol=rol)
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/enriched/enrich.py", line 802, in get_item_sh_fields
    eitem_sh[rol + "_org_name"] = self.get_enrollment(eitem_sh[rol + "_uuid"], item_date)
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/enriched/enrich.py", line 679, in get_enrollment
    enrollments = self.get_enrollments(uuid)
  File "/home/rohan/2work/sources/grimoirelab-elk/grimoire_elk/enriched/enrich.py", line 929, in get_enrollments
    return api.enrollments(self.sh_db, uuid)
  File "/home/rohan/2work/sources/grimoirelab-sortinghat/sortinghat/api.py", line 1221, in enrollments
    query = session.query(Enrollment).\
  File "<string>", line 2, in join
  File "/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/sql/base.py", line 96, in _generative
    x = fn(self, *args, **kw)
  File "<string>", line 2, in join
  File "/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/base.py", line 224, in generate
    fn(self, *args[1:], **kw)
  File "/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2315, in join
    joins_to_add = tuple(
  File "/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 2324, in <genexpr>
    coercions.expect(roles.OnClauseRole, prop[1], legacy=True)
  File "/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py", line 175, in expect
    resolved = impl._literal_coercion(
  File "/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py", line 388, in _literal_coercion
    self._raise_for_expected(element, argname)
  File "/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py", line 270, in _raise_for_expected
    util.raise_(exc.ArgumentError(msg, code=code), replace_context=err)
  File "/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 198, in raise_
    raise exception
sqlalchemy.exc.ArgumentError: SQL expression for ON clause expected, got <class 'sortinghat.db.model.Organization'>.
  2021-03-26 21:06:47,462 [git] Done enrichment for https://github.com/chaoss/grimoirelab-perceval
  2021-03-26 21:06:47,462 [git] enrichment finished for https://github.com/chaoss/grimoirelab-perceval
  2021-03-26 21:06:47,463 [git] enrichment phase finished in 0:00:00
  2021-03-26 21:06:47,463 [git] data retention start
  2021-03-26 21:06:47,502 [git] data retention end
  2021-03-26 21:06:47,502 [git] identities retention end
  2021-03-26 21:06:47,503 [git] autorefresh start
  2021-03-26 21:06:47,591 [git] Refreshing identities
  2021-03-26 21:06:47,610 [git] autorefresh end
  2021-03-26 21:06:47,610 [git] studies phase start
  2021-03-26 21:06:49,809 [git] Executing studies ['enrich_demography:git', 'enrich_areas_of_code:git', 'enrich_onion:git']
  2021-03-26 21:06:49,809 [git] Starting study: enrich_demography:git, params {'date_field': 'utc_commit', 'author_field': 'author_uuid'}
  2021-03-26 21:06:49,809 [git] Demography starting study https://localhost:9200/git_chaoss_enriched
  2021-03-26 21:06:50,706 [git] Demography end https://localhost:9200/git_chaoss_enriched
  2021-03-26 21:06:50,707 [git] Starting study: enrich_areas_of_code:git, params {'in_index': 'git_chaoss', 'out_index': 'git-aoc_chaoss_enriched'}
  2021-03-26 21:06:50,707 [git] study areas_of_code Starting study - Input: git_chaoss Output: git-aoc_chaoss_enriched
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/elasticsearch/connection/http_requests.py:61: UserWarning: Connecting to https://localhost:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
  2021-03-26 21:06:50,735 GET https://localhost:9200/ [status:200 request:0.027s]
  2021-03-26 21:06:50,762 GET https://localhost:9200/ [status:200 request:0.027s]
  2021-03-26 21:06:50,766 HEAD https://localhost:9200/git-aoc_chaoss_enriched [status:200 request:0.004s]
  2021-03-26 21:06:50,767 [git] study areas_of_code Processing repo: https://github.com/chaoss/grimoirelab-perceval
  2021-03-26 21:06:50,773 GET https://localhost:9200/git-aoc_chaoss_enriched/_search [status:200 request:0.006s]
  2021-03-26 21:06:50,782 GET https://localhost:9200/git_chaoss/_search?scroll=300m&size=500 [status:200 request:0.008s]
  2021-03-26 21:06:50,786 GET https://localhost:9200/_search/scroll?scroll=300m [status:200 request:0.004s]
  2021-03-26 21:06:50,790 DELETE https://localhost:9200/_search/scroll [status:200 request:0.003s]
  2021-03-26 21:06:50,797 [git] Problem executing study enrich_areas_of_code:git, SQL expression for ON clause expected, got <class 'sortinghat.db.model.Organization'>.
  2021-03-26 21:06:50,797 SQL expression for ON clause expected, got <class 'sortinghat.db.model.Organization'>.

Process finished with exit code 255

@vchrombie, I am facing this error. Did I miss a step?

WasimAkhtarKhan commented 3 years ago

Maybe this would help you https://stackoverflow.com/questions/66009247/sqlalchemy-warning-for-many-to-many-relation-with-association-table @rohanreddych

WasimAkhtarKhan commented 3 years ago

I am facing this error @vchrombie after running ./manage.py makemigrations --settings=config.settings.devel

File "/home/wasim/.cache/pypoetry/virtualenvs/sortinghat-z07j1bw2-py3.8/lib/python3.8/site-packages/MySQLdb/connections.py", line 204, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
django.db.utils.OperationalError: (1045, "Access denied for user 'root'@'localhost' (using password: NO)")

I also get the following error after changing the password in safe mode, flushing privileges, and then running ./manage.py makemigrations --settings=config.settings.devel:

OperationalError: (2002, “Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)”)

When I execute systemctl status mysql.service, it gives:

● mysql.service - MySQL Community Server
   Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2021-03-27 11:05:54 IST; 2s ago
  Process: 9002 ExecStartPre=/usr/share/mysql/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
 Main PID: 9010 (mysqld)
   Status: "Server is operational"
    Tasks: 38 (limit: 9382)
   Memory: 349.1M
   CGroup: /system.slice/mysql.service
           └─9010 /usr/sbin/mysqld

vchrombie commented 3 years ago

2021-03-26 21:06:47,386 [sortinghat] Unknown exception adding identity. Ignoring it. sduenas@bitergia.com Santiago Dueñas None
2021-03-26 21:06:47,455 Error enriching raw from git (https://github.com/chaoss/grimoirelab-perceval): SQL expression for ON clause expected, got <class 'sortinghat.db.model.Organization'>.

Hi @rohanreddych, looks like there is some issue with sortinghat. The error comes from here, sortinghat_gelk.py#L74. Do you have MySQL/MariaDB installed on your machine? If yes, then there should be something wrong with the configurations. Can we have a quick chat sometime later today? I'm free after 15:00 IST.

rohanreddych commented 3 years ago

Yes, I have MySQL installed. I run sudo service mysql stop and then docker-compose up so that there won't be any port conflicts.

Can we have a quick chat sometime later today? I'm free after 15:00 IST.

How? IRC or matrix?

vchrombie commented 3 years ago

OperationalError: (2002, “Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)”)

Hi @WasimAkhtarKhan, it looks like a db connection issue. You have to update the config/settings/devel.py file with the password of the MySQL service. devel.py#L102.
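
As a rough illustration of the kind of change meant here, a Django settings module usually carries the credentials in a DATABASES dictionary. The sketch below is not the actual content of devel.py; the database name, user, and the SORTINGHAT_DB_PASSWORD environment variable are assumptions chosen for the example.

```python
import os

# Hypothetical excerpt in the style of a Django settings module such as
# config/settings/devel.py. The database name, the user, and the
# SORTINGHAT_DB_PASSWORD variable are assumptions, not the project's values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "sortinghat_db",
        "USER": "root",
        # Read the password from the environment instead of hardcoding it.
        "PASSWORD": os.environ.get("SORTINGHAT_DB_PASSWORD", ""),
        # 127.0.0.1 makes the MySQL client connect over TCP rather than
        # through the local Unix socket.
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}
```

An empty PASSWORD is consistent with the "using password: NO" part of error 1045. Whether switching HOST to 127.0.0.1 helps with error 2002 depends on how the MySQL server is set up on your machine.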

vchrombie commented 3 years ago

How? IRC or matrix?

We can connect in #grimoirelab on Freenode. I will be there at 15:00 IST.

WasimAkhtarKhan commented 3 years ago

OperationalError: (2002, “Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)”)

Hi @WasimAkhtarKhan, it looks like a db connection issue. You have to update the config/settings/devel.py file with the password of the MySQL service. devel.py#L102.

OK @vchrombie, I will do that. Thanks!

rohanreddych commented 3 years ago

Hello @vchrombie, I installed MariaDB:

rohan@rohan:~/work$ mysql -V
mysql  Ver 15.1 Distrib 10.4.18-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

I tried running micro.py --raw --enrich --cfg ./setup.cfg --backends git, and SortingHat threw a database-not-found error: test_sh database not found. So I created a database named test_sh.

Tried running again, which gave the following error:

/home/rohan/2work/sources/venv/bin/python3.8 /home/rohan/2work/sources/grimoirelab-sirmordred/utils/micro.py --raw --enrich --cfg ./setup.cfg --backends git
  2021-03-27 18:25:50,352 Reading projects data from  ./projects.json 
  2021-03-27 18:25:50,352 [git] collection phase starts
  2021-03-27 18:25:50,352 [git] collection starts for https://github.com/chaoss/grimoirelab-perceval
Collection for git: starting...
  2021-03-27 18:25:50,576 Created index https://localhost:9200/git_chaoss
  2021-03-27 18:25:50,626 Alias {'alias': 'git-raw', 'index': 'git_chaoss'} created on https://localhost:9200/git_chaoss.
  2021-03-27 18:25:50,636 [git] Incremental from: None for https://github.com/chaoss/grimoirelab-perceval
  2021-03-27 18:25:50,636 Fetching latest commits: 'https://github.com/chaoss/grimoirelab-perceval' git repository
Collection for git: finished after 00:00:02 hours
  2021-03-27 18:25:52,501 Fetch process completed: 0 commits fetched
  2021-03-27 18:25:52,503 [git] Done collection for https://github.com/chaoss/grimoirelab-perceval
  2021-03-27 18:25:52,505 [git] collection finished for https://github.com/chaoss/grimoirelab-perceval
  2021-03-27 18:25:52,505 [git] collection phase finished in 00:00:02
  2021-03-27 18:25:52,610 Loading raw data finished!
  2021-03-27 18:25:52,610 Reading projects data from  ./projects.json 
  2021-03-27 18:26:02,757 [git] enrichment phase starts
  2021-03-27 18:26:03,042 Created index https://localhost:9200/git_chaoss_enriched
  2021-03-27 18:26:03,049 [git] enrichment starts for https://github.com/chaoss/grimoirelab-perceval
  2021-03-27 18:26:03,117 Alias {'alias': 'git', 'index': 'git_chaoss_enriched'} created on https://localhost:9200/git_chaoss_enriched.
  2021-03-27 18:26:03,131 Alias {'alias': 'git_author', 'index': 'git_chaoss_enriched'} created on https://localhost:9200/git_chaoss_enriched.
  2021-03-27 18:26:03,144 Alias {'alias': 'git_enrich', 'index': 'git_chaoss_enriched'} created on https://localhost:9200/git_chaoss_enriched.
  2021-03-27 18:26:03,158 Alias {'alias': 'affiliations', 'index': 'git_chaoss_enriched'} created on https://localhost:9200/git_chaoss_enriched.
  2021-03-27 18:26:03,174 Alias {'alias': 'all_enriched', 'index': 'git_chaoss_enriched'} created on https://localhost:9200/git_chaoss_enriched.
  2021-03-27 18:26:03,424 [git] Done enrichment for https://github.com/chaoss/grimoirelab-perceval
  2021-03-27 18:26:03,425 [git] enrichment finished for https://github.com/chaoss/grimoirelab-perceval
  2021-03-27 18:26:03,425 [git] enrichment phase finished in 0:00:00
  2021-03-27 18:26:03,426 [git] data retention start
  2021-03-27 18:26:03,460 [git] data retention end
  2021-03-27 18:26:03,461 [git] identities retention end
  2021-03-27 18:26:03,461 [git] autorefresh start
  2021-03-27 18:26:03,515 [git] Refreshing identities
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/relationships.py:3435: SAWarning: relationship 'Organization.domains_organizations' will copy column organizations.id to column domains_organizations.organization_id, which conflicts with relationship(s): 'Domain.organizations' (copies organizations.id to domains_organizations.organization_id), 'Organization.domains' (copies organizations.id to domains_organizations.organization_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning.
  util.warn(
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/relationships.py:3435: SAWarning: relationship 'Domain.organization' will copy column organizations.id to column domains_organizations.organization_id, which conflicts with relationship(s): 'Domain.organizations' (copies organizations.id to domains_organizations.organization_id), 'Organization.domains' (copies organizations.id to domains_organizations.organization_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning.
  util.warn(
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/relationships.py:3435: SAWarning: relationship 'UniqueIdentity.uuid_identy' will copy column uidentities.uuid to column identities.uuid, which conflicts with relationship(s): 'Identity.uidentities' (copies uidentities.uuid to identities.uuid), 'UniqueIdentity.identities' (copies uidentities.uuid to identities.uuid). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning.
  util.warn(
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/sqlalchemy/orm/relationships.py:3435: SAWarning: relationship 'Identity.uidentity' will copy column uidentities.uuid to column identities.uuid, which conflicts with relationship(s): 'Identity.uidentities' (copies uidentities.uuid to identities.uuid), 'UniqueIdentity.identities' (copies uidentities.uuid to identities.uuid). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning.
  util.warn(
  2021-03-27 18:26:03,551 [git] autorefresh end
  2021-03-27 18:26:03,552 [git] studies phase start
  2021-03-27 18:26:05,815 [git] Executing studies ['enrich_demography:git', 'enrich_areas_of_code:git', 'enrich_onion:git']
  2021-03-27 18:26:05,815 [git] Starting study: enrich_demography:git, params {'date_field': 'utc_commit', 'author_field': 'author_uuid'}
  2021-03-27 18:26:05,816 [git] Demography starting study https://localhost:9200/git_chaoss_enriched
  2021-03-27 18:26:05,875 [git] Demography Creating alias: demographics
  2021-03-27 18:26:05,908 Alias demographics created on https://localhost:9200/git_chaoss_enriched.
  2021-03-27 18:26:05,909 [git] Demography end https://localhost:9200/git_chaoss_enriched
  2021-03-27 18:26:05,909 [git] Starting study: enrich_areas_of_code:git, params {'in_index': 'git_chaoss', 'out_index': 'git-aoc_chaoss_enriched'}
  2021-03-27 18:26:05,909 [git] study areas_of_code Starting study - Input: git_chaoss Output: git-aoc_chaoss_enriched
/home/rohan/2work/sources/venv/lib/python3.8/site-packages/elasticsearch/connection/http_requests.py:61: UserWarning: Connecting to https://localhost:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
  2021-03-27 18:26:05,946 GET https://localhost:9200/ [status:200 request:0.035s]
  2021-03-27 18:26:05,981 GET https://localhost:9200/ [status:200 request:0.034s]
  2021-03-27 18:26:05,989 [git] study areas_of_code Creating out ES index
  2021-03-27 18:26:06,276 PUT https://localhost:9200/git-aoc_chaoss_enriched [status:200 request:0.286s]
  2021-03-27 18:26:06,277 [git] study areas_of_code Processing repo: https://github.com/chaoss/grimoirelab-perceval
  2021-03-27 18:26:06,303 GET https://localhost:9200/git-aoc_chaoss_enriched/_search [status:200 request:0.024s]
  2021-03-27 18:26:06,303 [git] study areas_of_code reading items since the beginning of times
  2021-03-27 18:26:06,332 GET https://localhost:9200/git_chaoss/_search?scroll=300m&size=500 [status:400 request:0.028s]
  2021-03-27 18:26:06,332 [git] Problem executing study enrich_areas_of_code:git, RequestError(400, 'search_phase_execution_exception', 'No mapping found for [metadata__timestamp] in order to sort on')
  2021-03-27 18:26:06,333 RequestError(400, 'search_phase_execution_exception', 'No mapping found for [metadata__timestamp] in order to sort on')

Process finished with exit code 255

vchrombie commented 3 years ago

Hi @rohanreddych, the logs look fine to me.

You can check if the indexes are created using curl. (I assume you are using the Search Guard version; if not, please change the ES URL.)

$ curl -XGET -k "https://admin:admin@localhost:9200/_alias?pretty"
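
To read the result more easily, the JSON returned by the _alias endpoint can be summarized per index. This is a minimal sketch: the sample response is a trimmed, hand-written example that reuses index and alias names from the logs above, and aliases_by_index is a hypothetical helper, not part of GrimoireLab.

```python
import json


def aliases_by_index(alias_response):
    """Map each index name to a sorted list of its aliases."""
    return {
        index: sorted(info.get("aliases", {}))
        for index, info in alias_response.items()
    }


# Hand-written sample in the shape returned by GET /_alias
# (index name -> {"aliases": {alias name -> options}}).
sample = json.loads("""
{
  "git_chaoss": {"aliases": {"git-raw": {}}},
  "git_chaoss_enriched": {"aliases": {"git": {}, "git_author": {}}}
}
""")

for index, aliases in sorted(aliases_by_index(sample).items()):
    print(index, "->", ", ".join(aliases) or "(no aliases)")
```

In practice you would save the curl output to a file and load it with json.load instead of using the inline sample.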

You can execute micro-mordred with the --panels flag to activate the panels task, so that the dashboards are created and you can check them in Kibiter at http://localhost:5601.

micro.py --raw --enrich --panels --cfg ./setup.cfg --backends git

rohanreddych commented 3 years ago

https://github.com/chaoss/grimoirelab/issues/414#issuecomment-808320936

This error was caused by using the wrong version of SQLAlchemy. Please use a version >1.2 and <1.4.
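
A quick way to check whether the installed version falls in that range is sketched below. The helper names are hypothetical, and the comparison treats any 1.2.x or 1.3.x release as acceptable, which is my reading of ">1.2 and <1.4".

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.3.24' into (1, 3, 24), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported(version, lower=(1, 2), upper=(1, 4)):
    """Return True when lower < version < upper, compared as integer tuples."""
    return lower < parse_version(version) < upper


try:
    installed = metadata.version("SQLAlchemy")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("SQLAlchemy is not installed in this environment")
elif is_supported(installed):
    print("SQLAlchemy", installed, "is within the expected range")
else:
    print("SQLAlchemy", installed, "is outside the range >1.2 and <1.4")
```

Run it inside the same virtual environment that micro.py uses, since the check only sees packages installed there.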

https://github.com/chaoss/grimoirelab/issues/414#issuecomment-808741547

This error can be resolved by following the steps at https://github.com/chaoss/grimoirelab-sirmordred/blob/master/Getting-Started.md#empty-index-

rohanreddych commented 3 years ago

@sduenas, @evamillan, @mafesan Can you please explain a little more about the project? I was looking at recommendations/engine.py. Currently, affiliations and profile matching are implemented, and there are plans to implement gender recommendations (#471). What other recommendations are to be implemented?

sduenas commented 3 years ago

The main purpose of the idea is to have useful recommendations for the system.

The first problem we want to address is which recommendations we really need. Right now, we have those you mentioned but maybe they need to be reformulated. For example, the current recommendations for affiliations answer the question of "at what organization do these people work?" but maybe it's more useful to answer "who works for this organization?".
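
To make the reformulation concrete, the two questions can be seen as the same data viewed from opposite sides. The sketch below assumes a plain dictionary from individual to organizations; the names and the members_by_organization helper are invented for illustration and do not reflect how SortingHat stores or returns recommendations.

```python
from collections import defaultdict


def members_by_organization(affiliations):
    """Invert {individual: [organizations]} into {organization: [individuals]}."""
    members = defaultdict(set)
    for person, organizations in affiliations.items():
        for org in organizations:
            members[org].add(person)
    return {org: sorted(people) for org, people in members.items()}


# Made-up data answering "at what organization do these people work?"
affiliations = {
    "alice": ["Example Corp"],
    "bob": ["Example Corp", "Sample Labs"],
    "carol": ["Sample Labs"],
}

# The inverted view answers "who works for this organization?"
for org, people in sorted(members_by_organization(affiliations).items()):
    print(org, "->", ", ".join(people))
```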

In the case of the individuals, the recommendations are based on two sets of individuals to try to answer the question "which individuals are the same from these two groups?". Maybe we can create a higher layer which only does that for those individuals that weren't found before, or for those that work for a certain company, or maybe for the new ones that are added to the database. Also, we can expand the way we find these recommendations. For example, now we do some basic matching: if the email address is the same, then the individuals are considered the same. The problem gets more complicated when a person has changed jobs, so their email addresses are now different.
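
The basic email rule described above can be sketched as follows. This is only an illustration of the idea, not the code in recommendations/engine.py; the identity dictionaries, their "id" and "email" keys, and the match_by_email helper are assumptions.

```python
from collections import defaultdict


def normalize(email):
    """Lowercase and trim an email address; return None when it is missing."""
    if not email:
        return None
    return email.strip().lower() or None


def match_by_email(group_a, group_b):
    """Return (id_a, id_b) pairs whose normalized email addresses are equal."""
    by_email = defaultdict(list)
    for identity in group_b:
        email = normalize(identity.get("email"))
        if email:
            by_email[email].append(identity["id"])

    matches = []
    for identity in group_a:
        email = normalize(identity.get("email"))
        for other_id in by_email.get(email, []):
            matches.append((identity["id"], other_id))
    return matches


# Made-up identities: a1 and b1 share an address, the others do not.
group_a = [
    {"id": "a1", "email": "Jane@example.com"},
    {"id": "a2", "email": "jane@old-employer.example"},
]
group_b = [
    {"id": "b1", "email": "jane@example.com "},
    {"id": "b2", "email": None},
]

print(match_by_email(group_a, group_b))
```

The second identity in group_a shows the limitation mentioned above: the same person with an address from a previous employer is not matched by this rule alone.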

The second problem is how to visualize all this information in the UI and how the user can accept those recommendations.

All of these should be part of your proposal. You have to keep in mind that the work will last only one month or so, so try to adapt your proposal for that. Don't try to include too many things. Try to be reasonable according to the effort and time restrictions. More doesn't necessarily mean better.

vchrombie commented 3 years ago

Hi everyone, the student application period has started and the deadline is 13 April 2021, 18:00 UTC. GSoC Timeline

Please continue working on the proposal and complete as many microtasks as possible. Please let us know if you need any help with doubts or reviewing the microtasks. Thanks!

stevenkolawole commented 3 years ago

All of these should be part of your proposal. You have to keep in mind that the work will last only one month or so, so try to adapt your proposal for that.

Hello @sduenas,

According to the GSoC page, the program runs for two months; 10 weeks, to be exact. But you said the project would last for a month. Is there a mix-up somewhere?

vchrombie commented 3 years ago

The program has changed a bit. I will post the brief timeline here, to avoid any confusion.

Student Application Period: March 29, 2021 - April 13, 2021
Application Review Period: April 13, 2021 - May 17, 2021
Student Projects Announced: May 17, 2021
Community Bonding: May 17, 2021 - June 7, 2021
Coding Period: June 7, 2021 - August 16, 2021
Mid Evaluations: July 12 - 16, 2021
Students Submit Code: August 16 - 23, 2021
Mentors Submit Final Evaluations: August 23 - 30, 2021

GSoC Timeline