chaoss / grimoirelab-elk

GNU General Public License v3.0

[Git] `origin` has been changed in `_fix_item` method #1016

Open xiao623 opened 2 years ago

xiao623 commented 2 years ago

https://github.com/chaoss/grimoirelab-elk/blob/efd60e38a100d23979f068ff3ab8131fd88a81f6/grimoire_elk/raw/git.py#L66

We can see that origin is changed in the _fix_item method. For example, if the original value of origin is https://xxx:xxx@xxx.com, it is changed to https://xxx.com. However, the uuid is generated from the original value of origin: perceval/backend.py#L424

            'uuid': uuid(self.origin, self.metadata_id(item)),

If we later re-run perceval to fetch all commits (from-date = 1970-01-01) of the same repo but with a different URL, https://xxx2:xxx2@xxx.com, there will be two docs in ES storing the same commit, because the uuids differ.
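
To make the mismatch concrete, here is a minimal sketch of the situation described above. It does not call perceval or grimoire_elk: `strip_credentials` is a hypothetical stand-in for what _fix_item does to origin, `sketch_uuid` assumes the uuid is a SHA-1 over the colon-joined arguments, and the commit hash is made up.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Hypothetical stand-in for the origin rewrite done in _fix_item."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Rebuild the URL without the user:password@ part
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def sketch_uuid(*args):
    """Assumed uuid rule: SHA-1 hex digest of the ':'-joined arguments."""
    return hashlib.sha1(":".join(args).encode("utf-8")).hexdigest()


commit = "abc123"  # made-up commit hash, same commit in both runs
first_origin = "https://xxx:xxx@xxx.com"
second_origin = "https://xxx2:xxx2@xxx.com"

# After the rewrite, both runs store the same origin...
same_origin = strip_credentials(first_origin) == strip_credentials(second_origin)

# ...but the uuids were computed from the original URLs, so they differ.
different_uuid = sketch_uuid(first_origin, commit) != sketch_uuid(second_origin, commit)

print(same_origin, different_uuid)
```

If the assumptions hold, the two runs end up with documents that share the same origin and commit but carry different uuids, which is the duplication reported here.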

What I want is only one doc in ES per commit of a given repo (identified at least by the value of origin after _fix_item).

xiao623 commented 2 years ago

I think we should try not to change the value of origin in the _fix_item method. But if we must do that, we need to change the rule for generating the uuid.
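
One possible shape for the second option, as a rough sketch only: derive the uuid from the credential-free origin, so that the URL used to fetch the repo no longer matters. The names `normalized_origin` and `stable_uuid` are hypothetical and not part of perceval or grimoire_elk, and the SHA-1 rule is an assumption about how the uuid is built.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalized_origin(url):
    """Hypothetical helper: drop user:password@ from the origin URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def stable_uuid(origin, item_id):
    """Hypothetical uuid rule: hash the normalized origin, not the raw one."""
    key = ":".join((normalized_origin(origin), item_id))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


commit = "abc123"  # made-up commit hash
uuid_a = stable_uuid("https://xxx:xxx@xxx.com", commit)
uuid_b = stable_uuid("https://xxx2:xxx2@xxx.com", commit)

# Both runs now map the same commit to the same uuid
print(uuid_a == uuid_b)
```

A change like this would alter the uuid of any item whose origin contained credentials, so docs already indexed under the old uuids would not match the new ones.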