We can see that `origin` is changed in the `_fix_item` method.
For example, if the original value of `origin` is `https://xxx:xxx@xxx.com`, it is changed to `https://xxx.com`.
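To make the transformation concrete, here is a minimal sketch of credential stripping with the standard library. `strip_credentials` is a hypothetical helper written for illustration; it is not the actual code of `_fix_item`, and it only assumes that the change amounts to dropping the `user:password@` part of the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Hypothetical helper: drop the 'user:password@' part of a URL."""
    parts = urlsplit(url)
    # Everything before the last '@' in the network location is userinfo.
    netloc = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(strip_credentials("https://xxx:xxx@xxx.com"))    # https://xxx.com
print(strip_credentials("https://xxx2:xxx2@xxx.com"))  # https://xxx.com
```

Both URLs in this issue collapse to the same sanitized `origin`, which is why the stored `origin` values match even though the generated `uuid` values do not.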
However, the `uuid` is generated from the original value of `origin`:
perceval/backend.py#L424
If we later re-run perceval to fetch all commits (`from-date = 1970-01-01`) of the same repository, but with a different URL such as `https://xxx2:xxx2@xxx.com`, ES ends up with two documents for the same commit, because the two `uuid` values differ.
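A minimal sketch of why two documents appear. It assumes the `uuid` is a SHA-1 over the `origin` and the item id joined with `:`, which is how perceval's `uuid()` helper appears to work; `item_uuid` and the commit id below are hypothetical stand-ins, not perceval's code.

```python
import hashlib


def item_uuid(*args: str) -> str:
    """SHA-1 of the arguments joined with ':' (assumed to mirror perceval's rule)."""
    return hashlib.sha1(":".join(args).encode("utf-8")).hexdigest()


commit = "hypothetical-commit-hash"  # same commit fetched in both runs

# The hash is taken over the raw origin, credentials included.
uuid_a = item_uuid("https://xxx:xxx@xxx.com", commit)
uuid_b = item_uuid("https://xxx2:xxx2@xxx.com", commit)

print(uuid_a == uuid_b)  # False: two different ids for one commit
```

Since the raw `origin` differs only in the credentials, the hashes differ, and each run writes its own document for the same commit.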
What I want is a single document in ES for each commit of a given repository (at least, as identified by the value of `origin` after `_fix_item`).
I think we should avoid changing the value of `origin` in `_fix_item`. If we must change it, then we need to change how the `uuid` is generated.
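One possible generating rule, sketched under the same assumptions as above: hash the sanitized `origin` instead of the raw one, so that credentials no longer influence the id. `strip_credentials`, `item_uuid` and `stable_item_uuid` are hypothetical names for illustration, not existing perceval or grimoire_elk functions.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Hypothetical helper: drop the 'user:password@' part of a URL."""
    parts = urlsplit(url)
    netloc = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def item_uuid(*args: str) -> str:
    """SHA-1 of the arguments joined with ':' (assumed to mirror perceval's rule)."""
    return hashlib.sha1(":".join(args).encode("utf-8")).hexdigest()


def stable_item_uuid(origin: str, item_id: str) -> str:
    """Proposed rule: hash the sanitized origin, so credentials don't change the id."""
    return item_uuid(strip_credentials(origin), item_id)


commit = "hypothetical-commit-hash"
uuid_a = stable_item_uuid("https://xxx:xxx@xxx.com", commit)
uuid_b = stable_item_uuid("https://xxx2:xxx2@xxx.com", commit)
print(uuid_a == uuid_b)  # True: one id, so one document per commit
```

Note that switching to such a rule would presumably change the ids of documents already stored, so existing raw indices would likely need to be rebuilt.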
https://github.com/chaoss/grimoirelab-elk/blob/efd60e38a100d23979f068ff3ab8131fd88a81f6/grimoire_elk/raw/git.py#L66