crossminer / scava

https://eclipse.org/scava/
Eclipse Public License 2.0

Threader doesn't generate unique ids for indexer #376

creat89 closed this issue 4 years ago

creat89 commented 4 years ago

For sources that need to be threaded, such as NNTP, the threader doesn't generate IDs that are unique enough for the indexer.

If the same source is read over different time periods, some thread IDs will be the same, because the numbering always starts at one. As a result, threads will collide, and some data will be mixed or overwritten.
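The collision can be reproduced with a toy model. This is a hypothetical sketch, not the Scava threader or indexer. It assumes the threader numbers threads from 1 on every run and the index stores one entry per thread ID. `ThreadIdCollision`, `assignIds` and `index` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ThreadIdCollision {

    // Hypothetical stand-in for a threader that restarts its counter at 1 on every run.
    static List<Integer> assignIds(List<String> threadSubjects) {
        List<Integer> ids = new ArrayList<>();
        int next = 1;
        for (int i = 0; i < threadSubjects.size(); i++) {
            ids.add(next++);
        }
        return ids;
    }

    // Hypothetical stand-in for the index: one entry per thread ID.
    static Map<Integer, String> index(List<List<String>> runs) {
        Map<Integer, String> index = new TreeMap<>();
        for (List<String> run : runs) {
            List<Integer> ids = assignIds(run);
            for (int i = 0; i < run.size(); i++) {
                // A thread from a later run overwrites any earlier thread with the same ID.
                index.put(ids.get(i), run.get(i));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> january = List.of("Build fails on Java 11", "Release 1.2 plan");
        List<String> february = List.of("New logo proposal");
        Map<Integer, String> result = index(List.of(january, february));
        // Three threads were read, but only two survive in the index.
        System.out.println(result); // {1=New logo proposal, 2=Release 1.2 plan}
    }
}
```

The first thread of the January run is lost because the February run reuses ID 1.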

This case must also be checked when multiple tasks are set up for the same project; if it is not handled, the same collision will occur in MongoDB. If that happens, some modifications will be needed in the transient metrics.

In Ossmeter, thread IDs were unique enough because all the data was always read from the beginning of the source being analyzed.

It might be necessary to create a unique ID consisting of the subject plus the relative creation date. This might affect the index and its ability to match threads correctly, but it might be enough to avoid collisions.
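A hedged sketch of the subject-plus-date idea. It assumes "relative creation date" means the timestamp of the thread's first message. It also adds two choices that the issue does not specify: stripping `Re:`/`Fwd:`/`Fw:` prefixes, and hashing the key with SHA-256 so every ID has a fixed length. `ThreadIdSketch`, `threadId` and `normalizeSubject` are hypothetical names, not Scava APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.Locale;

public class ThreadIdSketch {

    // Hypothetical helper: strips reply/forward prefixes so "Re: X" and "X" give the same key.
    static String normalizeSubject(String subject) {
        String s = subject.trim();
        while (true) {
            String lower = s.toLowerCase(Locale.ROOT);
            if (lower.startsWith("re:")) {
                s = s.substring(3).trim();
            } else if (lower.startsWith("fwd:")) {
                s = s.substring(4).trim();
            } else if (lower.startsWith("fw:")) {
                s = s.substring(3).trim();
            } else {
                break;
            }
        }
        return s.toLowerCase(Locale.ROOT);
    }

    // Assumption: the ID is derived from the normalized subject plus the creation
    // date of the thread's first message, hashed with SHA-256 (64 hex characters).
    static String threadId(String subject, Instant firstMessageDate) {
        String key = normalizeSubject(subject) + "|" + firstMessageDate.toString();
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2019-03-01T10:15:30Z");
        String a = threadId("Build fails on Java 11", created);
        String b = threadId("Re: Build fails on Java 11", created);
        String c = threadId("Build fails on Java 11", Instant.parse("2019-04-01T08:00:00Z"));
        System.out.println(a.equals(b)); // true: reply prefix is ignored
        System.out.println(a.equals(c)); // false: same subject, different creation date
    }
}
```

This also shows the trade-off mentioned above. If a later run only sees part of a thread and takes a different message as its first one, the date changes and so does the ID. That weakens matching, but IDs no longer restart at one on every run.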

Suggestions are welcome.

creat89 commented 4 years ago

I have modified the platform to generate IDs that are, in theory, unique, not only for threads but also for articles: the IDs used for articles were not unique either.

Furthermore, I noticed that the IRC reader was not populating the object the threader needs in order to work. This was also fixed as part of these changes.