sat01a closed 8 years ago
@javier-molina I have a feeling this happened because the project was deleted from mongo but not actually deleted from Elasticsearch. That would explain why the same project appeared twice. As a solution, what do you think of deleting all SciStarter projects from the homepage index before creating the SciStarter projects? That way we can be sure all stale projects are gone.
After finishing this analysis I came across the other scenario I was suspecting. The search-for-all-projects call to SciStarter sometimes returns duplicates, so I think we need to make sure we process each Id only once.
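To illustrate the "process each Id only once" idea, here is a minimal sketch in Python (not the ecodata implementation). It assumes the Finder response has already been parsed into a list of dicts and that each record carries its identifier under a field named `id`; both the record shape and the field name are assumptions for illustration.

```python
from typing import Any, Dict, Hashable, Iterable, Iterator, Set


def unique_by_id(projects: Iterable[Dict[str, Any]], key: str = "id") -> Iterator[Dict[str, Any]]:
    """Yield each project once, keeping the first record seen for every id.

    `key` is a hypothetical field name; adjust it to whatever the real
    payload uses.
    """
    seen: Set[Hashable] = set()
    for project in projects:
        project_id = project[key]
        if project_id in seen:
            # Duplicate record returned by the search call: skip it.
            continue
        seen.add(project_id)
        yield project


# Made-up records, only to show the behaviour.
sample = [
    {"id": 1, "name": "first"},
    {"id": 2, "name": "second"},
    {"id": 1, "name": "first (duplicate)"},
]
deduplicated = list(unique_by_id(sample))
```

Keeping the first occurrence preserves the order in which the API returned the records, so the import loop can stay unchanged and simply iterate over the filtered sequence.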
The other scenario, where mongo and ES are out of sync, is more of an environment issue and can easily be solved by a full reindex. I wouldn't try to code anything for this scenario.
I will take this ticket back to in progress to address the first scenario.
It was interesting to see duplicate entries in mongo for SciStarter projects. The only case I can think of is that SciStarter actually returned duplicate records as part of the
https://scistarter.com/finder?format=json&q= call.
The other scenario where this can happen is when the async reindexing is stopped after the SciStarter projects have been deleted, which could be the case after a sudden crash of ecodata.
Rerunning the SciStarter import or reindexing fixes the issue, though.
No code changes required for this.
On a related note, of the original 73 projects, 38 are ingested, 32 are marked as coming from ALA, and 3 are not reported by the SciStarter Finder API call.
The original project ids are:
1569,1480,1400,1378,1368,1318,1313,1303,1245,1205,1146,997,988,987,931,920,917,874,870,869,864,854,850,849,842,828,797,795,791,764,704,689,687,681,647,645,644,643,621,615,614,600,587,582,567,564,554,472,471,446,431,423,416,413,411,403,388,371,345,338,336,334,288,280,168,136,53,44,42,33,32,27,26
Projects listed as coming from ALA:
These projects are not returned by the Finder API but can be retrieved directly: [1569, 582, 371]
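As a quick sanity check on the numbers above, the following Python sketch verifies that the original id list has 73 unique entries, that the three ids missing from the Finder belong to it, and that 38 + 32 + 3 adds up to the total. The function name and the shape of its result are made up for illustration; only the ids and counts come from this thread.

```python
ORIGINAL_IDS = [
    1569, 1480, 1400, 1378, 1368, 1318, 1313, 1303, 1245, 1205,
    1146, 997, 988, 987, 931, 920, 917, 874, 870, 869,
    864, 854, 850, 849, 842, 828, 797, 795, 791, 764,
    704, 689, 687, 681, 647, 645, 644, 643, 621, 615,
    614, 600, 587, 582, 567, 564, 554, 472, 471, 446,
    431, 423, 416, 413, 411, 403, 388, 371, 345, 338,
    336, 334, 288, 280, 168, 136, 53, 44, 42, 33,
    32, 27, 26,
]
NOT_IN_FINDER = [1569, 582, 371]
INGESTED_COUNT = 38   # counts as reported in the comment above
FROM_ALA_COUNT = 32


def check_breakdown(original_ids, ingested, from_ala, missing_ids):
    """Return simple consistency facts about the reported breakdown."""
    unique_ids = set(original_ids)
    return {
        "total": len(original_ids),
        "unique": len(unique_ids),
        # Every id missing from the Finder should be one of the originals.
        "missing_are_original": set(missing_ids) <= unique_ids,
        # The three groups should account for every original project.
        "adds_up": ingested + from_ala + len(missing_ids) == len(original_ids),
    }


report = check_breakdown(ORIGINAL_IDS, INGESTED_COUNT, FROM_ALA_COUNT, NOT_IN_FINDER)
```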
@pbrenton you probably want to update the list of projects coming from SciStarter.