Refactoring Enrich class

imnitishng commented 4 years ago

The Enrich class currently handles many responsibilities that could be refactored into separate modules. This issue will serve as the main discussion channel for that objective.

imnitishng commented 4 years ago

Hi @valeriocos, I went through the code with this objective in mind. A starting point I could think of is creating a new class for handling identities. The class would contain the functions listed below:

class Identities():
- get_identities()
- get_field_author()
- has_identities()
- get_sh_identity()
- __get_item_sh_fields_empty()
- get_item_no_sh_fields()
- get_uuid_from_id()
- __get_sh_ids_cache()

and other identity-related methods for the different backends.

Each backend class would then also be a subclass of this class, in the same way it is currently a subclass of Enrich. I would like to know your thoughts, and those of other community members, on this objective.
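
To make the proposal concrete, here is a minimal sketch of how such a class could look. It is not the actual grimoirelab-elk code: the method bodies, the ToyBackendIdentities subclass and the shape of the raw item are assumptions made only for illustration, and only the method names come from the list above.

```python
class Identities:
    """Hypothetical base class gathering identity-related logic."""

    def get_field_author(self):
        """Name of the raw-item field that holds the author."""
        raise NotImplementedError

    def get_sh_identity(self, item, identity_field=None):
        """Return a dict with 'username', 'name' and 'email' keys."""
        raise NotImplementedError

    def get_identities(self, item):
        """Yield every identity found in a raw item."""
        field = self.get_field_author()
        if item.get("data", {}).get(field):
            yield self.get_sh_identity(item, field)

    def has_identities(self):
        """True when the subclass overrides get_sh_identity()."""
        return type(self).get_sh_identity is not Identities.get_sh_identity


class ToyBackendIdentities(Identities):
    """Assumed backend whose raw items look like {'data': {'user': {...}}}."""

    def get_field_author(self):
        return "user"

    def get_sh_identity(self, item, identity_field=None):
        user = item["data"][identity_field or self.get_field_author()]
        return {
            "username": user.get("login"),
            "name": user.get("name"),
            "email": user.get("email"),
        }


# Hypothetical raw item, used only to exercise the sketch.
raw_item = {"data": {"user": {"login": "jdoe", "name": "J. Doe", "email": None}}}
identities = list(ToyBackendIdentities().get_identities(raw_item))
```

With this layout a backend only has to say where its author lives and how to map it to an identity; the generic traversal stays in the base class.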

valeriocos commented 4 years ago

Hi @imnitishng , thank you for opening this issue.

Would it be possible to align your proposal with the identities module recently added (https://github.com/chaoss/grimoirelab-elk/tree/master/grimoire_elk/identities)?

Would it be possible to target the data anonymization as part of the functions/methods you listed?

Thanks

imnitishng commented 4 years ago

Hi @valeriocos, I made some changes, moved the methods and ran the tests for the GitHub backend; they seem to work fine. Please have a look at the branch here: https://github.com/imnitishng/grimoirelab-elk/tree/identities_test/grimoire_elk/identities

I believe this can act as a starting point. Once you approve it, I will start testing the idea against all the ELK backends to ensure nothing breaks.

> Would it be possible to target the data anonymization as part of the functions/methods you listed?

Yes, anonymization is possible, but we might need a separate identities file for each backend, like https://github.com/chaoss/grimoirelab-elk/blob/master/grimoire_elk/identities/github.py, since the anonymization of data will work differently for each backend.

These files can also include functions from the backend enricher classes, like get_sh_identity(), get_identities(), etc., that are related to item identities. So the basic idea is that all the data enrichment (creating new enriched attributes, copying raw attributes) happens under grimoire_elk/enriched, and all the identity-related changes (extracting identity data, user email domains, creating SortingHat entries) happen under grimoire_elk/identities.
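
As a rough sketch of why each backend would need its own identities file, the snippet below anonymizes a toy issue-like item. Everything in it is an assumption for illustration: the class and method names, the item layout (a 'user' dict plus a list of 'assignees') and the choice of SHA-1 hashing are not taken from the existing github.py.

```python
import copy
import hashlib


class Identities:
    """Hypothetical base: each backend decides how to anonymize its raw items."""

    @staticmethod
    def _hash(value):
        # Replace a personal value with a stable, non-reversible digest.
        return hashlib.sha1(value.encode("utf-8")).hexdigest()

    @classmethod
    def anonymize_item(cls, item):
        raise NotImplementedError


class ToyIssueIdentities(Identities):
    """Assumed backend whose items carry a 'user' dict and an 'assignees' list."""

    @classmethod
    def _anonymize_user(cls, user):
        # Hash only the fields that identify a person, when they are set.
        for key in ("login", "email"):
            if user.get(key):
                user[key] = cls._hash(user[key])

    @classmethod
    def anonymize_item(cls, item):
        item = copy.deepcopy(item)  # leave the caller's item untouched
        data = item["data"]
        cls._anonymize_user(data.get("user") or {})
        for assignee in data.get("assignees") or []:
            cls._anonymize_user(assignee)
        return item


# Hypothetical raw item, used only to exercise the sketch.
raw = {
    "data": {
        "title": "Fix crash",
        "user": {"login": "jdoe", "email": "jdoe@example.com"},
        "assignees": [{"login": "asmith", "email": None}],
    }
}
anonymized = ToyIssueIdentities.anonymize_item(raw)
```

Another backend would store its authors in different fields, so only the subclass changes while the hashing helper in the base class is shared.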

Would like to hear your opinion about the idea. Thanks.

valeriocos commented 4 years ago

Hi @imnitishng

> I made some changes, moved the methods and ran the tests for the GitHub backend; they seem to work fine. Please have a look at the branch here: https://github.com/imnitishng/grimoirelab-elk/tree/identities_test/grimoire_elk/identities

Thank you for sharing the branch. The approach looks good! Please consider including tests with each commit to ease the review process.

> Yes, anonymization is possible, but we might need a separate identities file for each backend, like https://github.com/chaoss/grimoirelab-elk/blob/master/grimoire_elk/identities/github.py, since the anonymization of data will work differently for each backend.

Perfect! I understand we can extend https://github.com/chaoss/grimoirelab-elk/blob/master/grimoire_elk/identities/github.py and apply the same approach to the other backends.

> These files can also include functions from the backend enricher classes, like get_sh_identity(), get_identities(), etc., that are related to item identities. So the basic idea is that all the data enrichment (creating new enriched attributes, copying raw attributes) happens under grimoire_elk/enriched, and all the identity-related changes (extracting identity data, user email domains, creating SortingHat entries) happen under grimoire_elk/identities.

Yes, exactly, that's the goal of this refactoring!

> Would like to hear your opinion about the idea. Thanks.

The idea looks promising and we are on the same page wrt the next steps ^^. Thank you !

imnitishng commented 4 years ago

Thank you @valeriocos for the review. I will start on the work ASAP. I will create a PR soon, with a separate commit for each backend and its respective tests. A similar idea can be applied to move studies out of the Enrich class; we can tackle that in further iterations.

valeriocos commented 4 years ago

Thank you @imnitishng !

> I will start on the work ASAP.

Perfect, take all the time you need.

> A similar idea can be applied to move studies out of the Enrich class; we can tackle that in further iterations.

Yes! :)

jjmerchante commented 1 year ago

Closing this due to inactivity.

chaoss / grimoirelab-elk

Refactoring Enrich class #877