Closed agentilb closed 5 months ago
Hi @agentilb , any update on this? :)
I harvested articles from 2023-11-01 to 2023-11-30 on the 6th of Dec
Hi @ErnestaP,
It seems that 3 articles from the list are still missing:
10.1103/PhysRevD.108.092007
10.1103/PhysRevLett.131.221401
10.1103/PhysRevLett.131.221802
Could you check why?
For 2 and 3, the dates are the publication dates, and they seem to be correct. Is it possible to check the API for the days after, to check if the articles are not there?
sure, I will keep you informed :)
Could you also try to reharvest these articles? They had the duplicate affiliation issue, but APS should have corrected them by now.
10.1103/PhysRevLett.131.091802 -> 29 August 2023
10.1103/PhysRevLett.131.071901 -> 14 August 2023
10.1103/PhysRevLett.131.091901 -> 29 August 2023
10.1103/PhysRevD.108.012021 -> 28 July 2023
10.1103/PhysRevD.108.012023 -> 28 July 2023
10.1103/PhysRevLett.131.111802 -> 13 September 2023
10.1103/PhysRevD.108.012007 -> 14 July 2023
10.1103/PhysRevD.108.012008 -> 13 July 2023
(I have added the publication date).
Thanks in advance!
10.1103/PhysRevD.108.092007 - in the repo https://repo.scoap3.org/records/82263
10.1103/PhysRevLett.131.221401 - in the repo https://repo.scoap3.org/records/82262
10.1103/PhysRevLett.131.221802 - I was not able to find :/ I will try again after checking the other articles
Halted:
10.1103/PhysRevLett.131.111802 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D1%26flt0_21%3D2&id=1a802c32-9e69-11ee-a824-029ec3c926e4
10.1103/PhysRevD.108.012007 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D2%26flt0_21%3D2&id=199c756e-9e69-11ee-89d9-8625598c2545
10.1103/PhysRevD.108.012008 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D2%26flt0_21%3D2&id=1826b21c-9e69-11ee-aaec-663d42099f7a
I was not able to find:
10.1103/PhysRevLett.131.071901 -> 14 August 2023
10.1103/PhysRevLett.131.091901 -> 29 August 2023
10.1103/PhysRevD.108.012021 -> 28 July 2023
10.1103/PhysRevD.108.012023 -> 28 July 2023
10.1103/PhysRevLett.131.111802 -> 13 September 2023
The first 3 articles are now halted because of the arXiv category, together with the 250 others that were just reharvested. Do you think it will be possible to clean this? This time, I cannot do it manually... It is weird though that this problem seems to appear mostly for APS articles.
The ones you cannot find were actually already halted, does it help to find them again?
10.1103/PhysRevLett.131.071901 -> 14 August 2023 (2023-10-21 00:30:24.429612)
10.1103/PhysRevLett.131.091901 -> 29 August 2023 (2023-10-21 00:30:24.224975)
10.1103/PhysRevD.108.012021 -> 28 July 2023 (2023-09-22 00:30:18.400387)
10.1103/PhysRevD.108.012023 -> 28 July 2023 (2023-09-22 00:30:17.894814)
10.1103/PhysRevLett.131.091802 -> 29 August 2023 (2023-10-21 00:30:24.771379)
can you please add the links to the halted articles? Maybe they can be fixed manually?
Yes, the halted articles are only APS ones. I had to harvest by the date range of the articles you sent me before, from July 13 to September 13, which is why there are so many. I am afraid I cannot do much about them now.
I have tried to modify this one:
10.1103/PhysRevLett.131.091802 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=06b41ce8-6fa9-11ee-9bfc-6280630c062e
Now this is in ERROR mode...
Here are the other links:
10.1103/PhysRevD.108.012023 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=349ff9b4-58df-11ee-b586-aa2b8beba377
10.1103/PhysRevD.108.012021 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=34eced00-58df-11ee-963b-e28678992ea2
10.1103/PhysRevLett.131.091901 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=066044ba-6fa9-11ee-aa90-f67e7dd5bb4f
10.1103/PhysRevLett.131.071901 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=067ff328-6fa9-11ee-b33e-eeb6bc60d2db
Yes, the halted articles are only APS ones. I had to harvest by the date range of the articles you sent me before, from July 13 to September 13, which is why there are so many. I am afraid I cannot do much about them now.
Most of those were already ok in the repo. Can we clean the halted records that were already in the repo?
Looks like the issue is always with the same author:
10.1103/PhysRevLett.131.091802 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=06b41ce8-6fa9-11ee-9bfc-6280630c062e --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevLett.131.091802
10.1103/PhysRevD.108.012023 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=349ff9b4-58df-11ee-b586-aa2b8beba377 --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevD.108.012023
10.1103/PhysRevD.108.012021 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=34eced00-58df-11ee-963b-e28678992ea2 --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevD.108.012021
10.1103/PhysRevLett.131.091901 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=066044ba-6fa9-11ee-aa90-f67e7dd5bb4f --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevLett.131.091901
10.1103/PhysRevLett.131.071901 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=067ff328-6fa9-11ee-b33e-eeb6bc60d2db --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevLett.131.071901
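For reference, the kind of check that trips on these records can be sketched roughly as below. This is only an illustration, not the actual workflow code: the `authors` list and the `affiliationIds` key are assumptions about the shape of the APS JSON, and the second author in the sample is made up.

```python
from typing import Any, Dict, List


def authors_without_affiliation(
    article: Dict[str, Any], key: str = "affiliationIds"
) -> List[str]:
    """Return the names of authors that carry no affiliation reference.

    `key` is an assumed field name; adjust it to the real payload.
    """
    missing = []
    for author in article.get("authors", []):
        # A missing key and an empty list both count as "no affiliation".
        if not author.get(key):
            missing.append(author.get("name", "<unnamed>"))
    return missing


# Hypothetical, trimmed-down record for illustration only.
sample = {
    "authors": [
        {"type": "Person", "name": "A. Bizzeti", "firstname": "A.", "surname": "Bizzeti"},
        {"type": "Person", "name": "B. Example", "affiliationIds": ["a1"]},
    ]
}

print(authors_without_affiliation(sample))
```

Running something like this over the harvested JSON would list the affected authors per article before the workflow halts.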
the APS Halted articles are cleaned :)
Thanks a lot Ernesta. It seems there is still this one to be cleaned: https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fflt0_21%3D2&id=154eff4a-9e69-11ee-9651-46aa8dc5dde2 The article is in the repo.
For the other articles listed above, did you try to reharvest them or not?
I think you misunderstood me: none of these articles has an affiliationID for the author {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"}:
10.1103/PhysRevLett.131.091802 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=06b41ce8-6fa9-11ee-9bfc-6280630c062e --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevLett.131.091802
10.1103/PhysRevD.108.012023 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=349ff9b4-58df-11ee-b586-aa2b8beba377 --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevD.108.012023
10.1103/PhysRevD.108.012021 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=34eced00-58df-11ee-963b-e28678992ea2 --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevD.108.012021
10.1103/PhysRevLett.131.091901 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=066044ba-6fa9-11ee-aa90-f67e7dd5bb4f --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevLett.131.091901
10.1103/PhysRevLett.131.071901 https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fpage%3D13%26flt0_21%3D2&id=067ff328-6fa9-11ee-b33e-eeb6bc60d2db --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"} ---original data from APS: https://harvest.aps.org/v2/journals/articles/10.1103/PhysRevLett.131.071901
Also, the one you gave me now has the same issue: https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fflt0_21%3D2&id=154eff4a-9e69-11ee-9651-46aa8dc5dde2
They don't have to be cleaned. APS has to be contacted and these articles have to be fixed.
For the other articles listed above, did you try to reharvest them or not?
No, we found another way: I restarted the workflows of these articles.
Ah yes, sorry. Actually, in this case it means that the author really doesn't have an affiliation. It happens from time to time. We had a case recently, but I'm not sure how we handled it. Can we force the workflow?
And for this one: https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fflt0_21%3D2&id=154eff4a-9e69-11ee-9651-46aa8dc5dde2
It is already in the repo (with no affiliation for this author): https://repo.scoap3.org/records/80076
Interesting, I cannot tell how this article even appears in the repo. I will ask Harris tomorrow, maybe he has an idea
I found a few more from the task https://github.com/cern-sis/issues-scoap3/issues/187. I restarted the workflows for these articles; they were in the error state because of duplicated affiliations. The duplication error is fixed, but the missing-affiliation error is now present. Same issue, same author --- there is no affiliationID: {"type":"Person","name":"A. Bizzeti","firstname":"A.","surname":"Bizzeti"}
https://repo.scoap3.org/admin/workflow/edit/?id=9be85834-21dd-11ee-93d5-c20792d59997
https://repo.scoap3.org/admin/workflow/edit/?id=1a87662e-2da7-11ee-9614-4ee3c1f0aa14
https://repo.scoap3.org/admin/workflow/edit/?id=19d9aaf2-2da7-11ee-b33e-eeb6bc60d2db
Can we force the harvest for those articles? There is nothing we can do for authors with no affiliation.
@agentilb this will break the new data model as we don't accept authors without affiliations. I can see 4 options:
Let me know what you think.
1 and 2 are off the table. 3 is an option, but it is indeed risky for the metadata. That leaves 4. But we should probably still monitor this to distinguish the genuine cases from the cases where there is a problem in the metadata. I'm not sure how to handle it though. I'm sure we already have this in the repo, so the data model allowed it before. https://github.com/cern-sis/issues-scoap3/issues/172
Yes, they were allowed for unknown reasons. Some of the ~90 records are pretty old (2015, 2016, etc.), and my only guess is that the validation step was skipped manually or the schema changed afterwards.
As far as I understand these cases are valid and we should allow them right? So I would suggest to modify the (new) data model and add a compliance check to mark the articles without affiliations. WDYT?
It is a good idea. It could be a compliance criterion, so we can find them and check whether each case is valid or not. But the overall compliance status of the article should be ok even if this criterion is not met, otherwise those articles will never be compliant.
ok thanks, @agentilb could you please include it in your feedback for the new system?
@ErnestaP let's skip the validation step for now and allow the record to be created
After skipping to the next step, some of the articles jumped back to the double affiliation issue, and when I restarted the workflow, they jumped back to the author affiliation issue. I am afraid we need to reharvest them in order to have a single affiliation instead of a duplicate, and then skip the validation.
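A rough sketch of what such a non-blocking criterion could look like, assuming a simple list of checks where each one is marked blocking or not. The `Check` class, the check names, and the record fields (`doi`, `authors`, `affiliations`) are all hypothetical and not taken from the actual data model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Check:
    name: str
    passed: Callable[[Dict[str, Any]], bool]
    blocking: bool = True  # non-blocking checks only flag the record


def has_doi(record: Dict[str, Any]) -> bool:
    return bool(record.get("doi"))


def authors_have_affiliations(record: Dict[str, Any]) -> bool:
    return all(a.get("affiliations") for a in record.get("authors", []))


def evaluate(record: Dict[str, Any], checks: List[Check]) -> Dict[str, Any]:
    results = {c.name: c.passed(record) for c in checks}
    # Only blocking checks decide the overall compliance status.
    compliant = all(results[c.name] for c in checks if c.blocking)
    # Failed non-blocking checks are reported so curators can review them.
    flagged = [c.name for c in checks if not c.blocking and not results[c.name]]
    return {"compliant": compliant, "flagged": flagged, "results": results}


checks = [
    Check("has_doi", has_doi),
    Check("authors_have_affiliations", authors_have_affiliations, blocking=False),
]

# Hypothetical record: one author with no affiliation.
record = {
    "doi": "10.1103/PhysRevLett.131.091802",
    "authors": [{"name": "A. Bizzeti", "affiliations": []}],
}

report = evaluate(record, checks)
print(report["compliant"], report["flagged"])
```

With this shape, the article stays compliant overall while still showing up in a search for flagged records.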
These are in the repo:
10.1103/PhysRevLett.131.091802 https://repo.scoap3.org/records/82317
10.1103/PhysRevLett.131.091901 https://repo.scoap3.org/records/82318
10.1103/PhysRevLett.131.071901 https://repo.scoap3.org/records/82319
10.1103/PhysRevD.108.012023 https://repo.scoap3.org/records/82321
10.1103/PhysRevD.108.012021 https://repo.scoap3.org/records/82322
Also, articles from the previously mentioned task are in the repo:
10.1103/PhysRevD.108.012007 https://repo.scoap3.org/records/82285
10.1103/PhysRevD.108.012008 https://repo.scoap3.org/records/82283
10.1103/PhysRevD.108.012021 https://repo.scoap3.org/records/82322
10.1103/PhysRevD.108.012023 https://repo.scoap3.org/records/82321
Can we close the issue regarding APS?
@agentilb found Elsevier articles as well without affiliations, harvested today. Can you please check? https://github.com/cern-sis/issues-scoap3/issues/268
I realise there is a last article that seems to be missing in the repo: 10.1103/PhysRevLett.131.061901 but the publisher claims it was corrected. Could you please try to re-harvest it as you did for the others? Then, we should be ok with APS!
It is currently in halted mode: https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fflt0_21%3D2&id=eda60a28-9423-11ee-b746-2ef1f4d60b68
@agentilb I managed to get the article into the repo: https://repo.scoap3.org/records/82573
I think everything is in order with this ticket.
It seems there was the same problem as for Inspire with APS. We need to reharvest all those articles. (see attached).
APS HEP November 2023 Articles Missing from CERN Repository 12042023[57].xlsx