cern-sis / issues-scoap3

How to handle articles with authors with no affiliation? #172

Closed agentilb closed 11 months ago

agentilb commented 1 year ago

Occasionally an author has no affiliation, but the affiliation field is mandatory. This is the case for 10.1016/j.nuclphysb.2023.116279, which is in Halted status because of the missing affiliation. Are there similar cases already in the repo? Do we have a specific value that we use in the affiliation field?

ErnestaP commented 1 year ago

No, we don't have any specific value to use when affiliations don't exist.

Some records have no affiliations; these records might have been changed or added manually. We have 73 records without affiliations. They are listed below as (record_id, DOI), ordered from the oldest record (2014) to the newest (2023).

[(654, '10.1016/j.nuclphysb.2013.12.014'),
 (2444, '10.1016/j.physletb.2014.05.011'),
 (2997, '10.1016/j.physletb.2014.06.040'),
 (3196, '10.1016/j.physletb.2014.05.052'),
 (4192, '10.1088/1475-7516/2014/09/051'),
 (5155, '10.1016/j.physletb.2014.12.020'),
 (8864, '10.1155/2015/893920'),
 (9272, '10.1016/j.physletb.2015.02.023'),
 (9990, '10.1155/2015/975023'),
 (10833, '10.1016/j.physletb.2015.06.058'),
 (12818, '10.1016/j.physletb.2015.11.065'),
 (15122, '10.1016/j.physletb.2016.04.010'),
 (15266, '10.1016/j.nuclphysb.2016.04.024'),
 (15535, '10.1016/j.nuclphysb.2016.04.046'),
 (16212, '10.1088/1674-1137/40/7/073101'),
 (16472, '10.1016/j.nuclphysb.2016.07.013'),
 (17015, '10.1088/1674-1137/40/9/093102'),
 (17648, '10.1016/j.physletb.2016.10.004'),
 (17735, '10.1016/j.nuclphysb.2016.08.023'),
 (18590, '10.1088/1674-1137/41/1/013104'),
 (19069, '10.1088/1674-1137/41/2/023101'),
 (19625, '10.1088/1674-1137/41/4/043103'),
 (20008, '10.1088/1674-1137/41/5/053101'),
 (20440, '10.1088/1674-1137/41/6/063103'),
 (20989, '10.1088/1674-1137/41/8/083107'),
 (20992, '10.1088/1674-1137/41/8/083103'),
 (21407, '10.1016/j.physletb.2017.08.076'),
 (21435, '10.1088/1674-1137/41/9/094104'),
 (21896, '10.1016/j.physletb.2017.09.084'),
 (22903, '10.1016/j.nuclphysb.2017.12.019'),
 (40962, '10.1016/j.physletb.2018.06.057'),
 (43706, '10.1103/PhysRevD.98.096012'),
 (48902, '10.1103/PhysRevLett.123.059902'),
 (49101, '10.1093/ptep/ptz061'),
 (54869, '10.1103/PhysRevC.101.064905'),
 (60073, '10.1093/ptep/ptaa175'),
 (67452, '10.1088/1674-1137/ac0b3b'),
 (73304, '10.1093/ptep/ptac123'),
 (75175, '10.1103/PhysRevLett.130.031901'),
 (75248, '10.1103/PhysRevD.107.012003'),
 (75330, '10.1103/PhysRevD.107.012006'),
 (75582, '10.1103/PhysRevD.107.032001'),
 (75611, '10.1103/PhysRevD.107.033003'),
 (75737, '10.1103/PhysRevD.107.032003'),
 (75764, '10.1103/PhysRevD.107.032004'),
 (75766, '10.1103/PhysRevD.107.032005'),
 (75824, '10.1103/PhysRevLett.130.071802'),
 (75825, '10.1103/PhysRevD.107.L031101'),
 (75853, '10.1103/PhysRevLett.130.071804'),
 (75900, '10.1103/PhysRevD.107.032008'),
 (75914, '10.1103/PhysRevD.107.L031103'),
 (75957, '10.1103/PhysRevD.107.L031102'),
 (76074, '10.1103/PhysRevD.107.032013'),
 (76183, '10.1103/PhysRevLett.130.091902'),
 (76242, '10.1103/PhysRevD.107.052001'),
 (76297, '10.1103/PhysRevD.107.052004'),
 (76615, '10.1103/PhysRevD.107.052009'),
 (76726, '10.1103/PhysRevD.107.L051101'),
 (76850, '10.1103/PhysRevD.107.072001'),
 (76871, '10.1103/PhysRevD.107.072002'),
 (76969, '10.1088/1674-1137/acc3f4'),
 (76987, '10.1103/PhysRevLett.130.151903'),
 (77316, '10.1103/PhysRevD.107.072008'),
 (77360, '10.1103/PhysRevLett.130.181803'),
 (77363, '10.1103/PhysRevLett.130.181804'),
 (77405, '10.1103/PhysRevD.107.092001'),
 (77500, '10.1103/PhysRevD.107.092003'),
 (77637, '10.1103/PhysRevD.107.L091102'),
 (78121, '10.1103/PhysRevLett.130.231801'),
 (78594, '10.1103/PhysRevLett.130.261802'),
 (78595, '10.1103/PhysRevD.107.112009'),
 (78596, '10.1103/PhysRevD.107.112011'),
 (78615, '10.1103/PhysRevD.107.112010')]
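
For anyone who wants to repeat this kind of check, here is a minimal sketch of how such a list could be produced. It is not the query that generated the list above. The record layout (`record_id`, `doi`, `authors`, and a per-author `affiliations` list) and both function names are assumptions made for illustration, and "without affiliations" is read here as "at least one author has no affiliation".

```python
from typing import Any, Dict, List, Tuple


def has_missing_affiliation(record: Dict[str, Any]) -> bool:
    """Return True if at least one author has no affiliations.

    Assumed (hypothetical) layout: record["authors"] is a list of dicts,
    each with an optional "affiliations" list. A record with no authors
    at all is not flagged by this check.
    """
    authors = record.get("authors") or []
    return any(not author.get("affiliations") for author in authors)


def records_without_affiliations(
    records: List[Dict[str, Any]],
) -> List[Tuple[int, str]]:
    """Return (record_id, doi) pairs for flagged records, sorted by record_id."""
    hits = [
        (record["record_id"], record["doi"])
        for record in records
        if has_missing_affiliation(record)
    ]
    return sorted(hits)


# Made-up sample data, only to show the shape of the input.
sample_records = [
    {"record_id": 2, "doi": "10.0000/example.b",
     "authors": [{"full_name": "Author B", "affiliations": []}]},
    {"record_id": 1, "doi": "10.0000/example.a",
     "authors": [{"full_name": "Author A"}]},
    {"record_id": 3, "doi": "10.0000/example.c",
     "authors": [{"full_name": "Author C",
                  "affiliations": [{"value": "Example Institute"}]}]},
]

missing = records_without_affiliations(sample_records)
```

Treating an absent `affiliations` key and an empty list the same way is a choice made in this sketch; a real check might need to tell them apart.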

For DEVELOPERS: affiliations are assigned in the following places:

APS: https://github.com/SCOAP3/hepcrawl/blob/master/hepcrawl/extractors/aps_parser.py#L110
Hindawi: https://github.com/SCOAP3/hepcrawl/blob/master/hepcrawl/extractors/hindawi_parser.py#L67C8-L67C8
IOP: https://github.com/SCOAP3/hepcrawl/blob/master/hepcrawl/extractors/jats.py#L95C6-L95C6
OUP: https://github.com/SCOAP3/hepcrawl/blob/master/hepcrawl/extractors/jats.py#L95
Elsevier: https://github.com/SCOAP3/hepcrawl/blob/master/hepcrawl/extractors/s3_elsevier_parser.py#L187
Springer: https://github.com/SCOAP3/hepcrawl/blob/master/hepcrawl/extractors/s3_springer_parser.py#L155