cern-sis / issues-scoap3

Elsevier articles not in the repo #249

Closed agentilb closed 6 months ago

agentilb commented 7 months ago

Could you check why?

10.1016/j.physletb.2023.138271
10.1016/j.physletb.2023.138272
10.1016/j.physletb.2023.138273
10.1016/j.physletb.2023.138274
10.1016/j.physletb.2023.138275
10.1016/j.physletb.2023.138277
10.1016/j.physletb.2023.138278
10.1016/j.physletb.2023.138279
10.1016/j.physletb.2023.138280
10.1016/j.physletb.2023.138281
10.1016/j.physletb.2023.138282
10.1016/j.physletb.2023.138283
10.1016/j.physletb.2023.138285
10.1016/j.physletb.2023.138287
10.1016/j.physletb.2023.138288
10.1016/j.physletb.2023.138289
10.1016/j.physletb.2023.138263
10.1016/j.physletb.2023.138265
10.1016/j.physletb.2023.138266
10.1016/j.physletb.2023.138245

ErnestaP commented 7 months ago

Just a note for DEVELOPERS: I see that we have an article 10.1016/j.physletb.2023.138271 in our pods here: /data/harvesting/Elsevier/unpacked/CERNAB00000010814_SBdGkA/CERNAB00000010814/03702693/v847sC/S0370269323006056/main.xml

I see that it was uploaded at 11:42 on the 6th of Nov: [Screenshot 2023-12-01 at 15 37 45]

In the completed workflows, I cannot see any records from that time: [Screenshot 2023-12-01 at 15 39 09]

The same goes for the error and halted states.

I cannot see logs older than the 30th of November in the scoap3 crawler job list. The logs in the pod also only go back to the 30th of November: [Screenshot 2023-12-01 at 15 44 54]
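For anyone repeating this check by hand, here is a minimal sketch of how one might look up DOIs in the unpacked packages. It assumes the layout seen in the path above (one `main.xml` per article somewhere under the unpacked directory) and that the first `<ce:doi>` element in each file is the article's own DOI. The function name `find_dois` is hypothetical and not part of the harvesting code. It uses a plain text search rather than an XML parser so that it still works on files that are not well-formed.

```python
import re
from pathlib import Path

# First <ce:doi> in the file is assumed to be the article's own DOI.
DOI_RE = re.compile(r"<ce:doi>\s*([^<\s]+)\s*</ce:doi>")


def find_dois(root, wanted):
    """Map each wanted DOI to the main.xml files under `root` that declare it."""
    wanted = set(wanted)
    hits = {doi: [] for doi in wanted}
    for path in sorted(Path(root).rglob("main.xml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        match = DOI_RE.search(text)
        if match and match.group(1) in wanted:
            hits[match.group(1)].append(str(path))
    return hits


if __name__ == "__main__":
    # The root directory is an assumption based on the path quoted above.
    result = find_dois(
        "/data/harvesting/Elsevier/unpacked",
        ["10.1016/j.physletb.2023.138271", "10.1016/j.physletb.2023.138245"],
    )
    for doi, paths in sorted(result.items()):
        print(doi, "->", paths or "not found in any unpacked package")
```

A DOI that maps to an empty list was not found in any unpacked package, which would point to the publisher never having delivered it rather than to a workflow failure.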

ErnestaP commented 6 months ago

In CERNAB00000010814.zip:

ERROR: Not found referenced affiliations ([])

Articles: '10.1016/j.physletb.2023.138289' '10.1016/j.physletb.2023.138288' '10.1016/j.physletb.2023.138266' '10.1016/j.nuclphysb.2023.116378'

ErnestaP commented 6 months ago

I managed to get the articles onto prod, except one: 10.1016/j.physletb.2023.138245. It looks like it was never sent.

Also, we have an issue with one of the files: one of the XMLs is missing its namespace declarations on the root element. how it should be:

<article xmlns="http://www.elsevier.com/xml/ja/dtd"
    xmlns:ce="http://www.elsevier.com/xml/common/dtd"
    xmlns:sa="http://www.elsevier.com/xml/common/struct-aff/dtd"
    xmlns:sb="http://www.elsevier.com/xml/common/struct-bib/dtd"
    xmlns:xlink="http://www.w3.org/1999/xlink" docsubtype="sco" xml:lang="en">
    <item-info>
        <jid>PLB</jid>
        <aid>138266</aid>
        <ce:article-number>138266</ce:article-number>
        <ce:pii>S0370-2693(23)00600-7</ce:pii>
        <ce:doi>10.1016/j.physletb.2023.138266</ce:doi>
        <ce:copyright year="2023" type="other">The Author(s)</ce:copyright>
        <ce:doctopics>
            <ce:doctopic id="doc0010">
                <ce:text>Theory</ce:text>
            </ce:doctopic>
        </ce:doctopics>
    </item-info>

how it is:

<article docsubtype="sco" xml:lang="en">
    <item-info>
        <jid>PLB</jid>
        <aid>138268</aid>
        <ce:article-number>138268</ce:article-number>
        <ce:pii>S0370-2693(23)00602-0</ce:pii>
        <ce:doi>10.1016/j.physletb.2023.138268</ce:doi>
        <ce:copyright year="2023" type="other">Oak Ridge National Laboratory</ce:copyright>
        <ce:doctopics>
            <ce:doctopic id="doc0010">
                <ce:text>Experiments</ce:text>
            </ce:doctopic>
        </ce:doctopics>
    </item-info>

The path of the broken file in the file structure Elsevier sent to us:

CERNAB00000010814/03702693/v847sC/S0370269323006020/main.xml

So, in the end, the record was not updated: https://repo.scoap3.org/records/81089
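A minimal sketch of how a file like this could be flagged before it reaches the workflow, assuming only Python's standard library. The helper names `undeclared_prefixes` and `parses` are hypothetical and are not part of the SCOAP3 code. The prefix scan is a heuristic: it only looks at element tags such as `<ce:doi>` and ignores prefixed attributes and namespace scoping, so the parser check is the authoritative one.

```python
import re
import xml.etree.ElementTree as ET

# Prefixes used in element tags, e.g. <ce:doi> or </ce:doi>.
PREFIX_USE = re.compile(r"</?([A-Za-z_][\w.-]*):[A-Za-z_]")
# Prefixes declared anywhere in the file, e.g. xmlns:ce="...".
PREFIX_DECL = re.compile(r"\bxmlns:([A-Za-z_][\w.-]*)\s*=")


def undeclared_prefixes(xml_text):
    """Return the tag prefixes that are used but never declared via xmlns:<prefix>."""
    used = set(PREFIX_USE.findall(xml_text))
    declared = set(PREFIX_DECL.findall(xml_text)) | {"xml"}  # "xml" is predefined
    return sorted(used - declared)


def parses(xml_text):
    """True if a namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    broken = (
        '<article docsubtype="sco" xml:lang="en"><item-info>'
        "<jid>PLB</jid><ce:doi>10.1016/j.physletb.2023.138268</ce:doi>"
        "</item-info></article>"
    )
    print(undeclared_prefixes(broken), parses(broken))
```

Running the scan over every `main.xml` in a package before harvesting would name the affected file and the missing prefixes up front, instead of the record silently failing to update.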

agentilb commented 6 months ago

Thank you Ernesta. I will contact Elsevier about 10.1016/j.physletb.2023.138245, which is in halted mode because of a duplicate affiliation,

and about https://repo.scoap3.org/records/81089, which has an incorrect XML file that generates an error.

It seems we have the same problem with these articles, which should have been harvested in November:

10.1016/j.physletb.2023.138249
10.1016/j.physletb.2023.138290
10.1016/j.physletb.2023.138291
10.1016/j.physletb.2023.138292
10.1016/j.physletb.2023.138295
10.1016/j.physletb.2023.138296
10.1016/j.physletb.2023.138297
10.1016/j.physletb.2023.138298
10.1016/j.physletb.2023.138299
10.1016/j.physletb.2023.138300
10.1016/j.physletb.2023.138301
10.1016/j.physletb.2023.138302
10.1016/j.physletb.2023.138303
10.1016/j.physletb.2023.138305
10.1016/j.physletb.2023.138306
10.1016/j.physletb.2023.138307
10.1016/j.physletb.2023.138309
10.1016/j.physletb.2023.138310
10.1016/j.physletb.2023.138311
10.1016/j.physletb.2023.138312
10.1016/j.physletb.2023.138313
10.1016/j.physletb.2023.138314
10.1016/j.physletb.2023.138315
10.1016/j.physletb.2023.138316
10.1016/j.physletb.2023.138317
10.1016/j.physletb.2023.138318
10.1016/j.physletb.2023.138320
10.1016/j.physletb.2023.138321
10.1016/j.physletb.2023.138323
10.1016/j.physletb.2023.138325

Could you check them as well?

ErnestaP commented 6 months ago

All the articles above have been reharvested. The only problem is the same file as in the previous harvest:

CERNAB00000010814/03702693/v847sC/S0370269323006020/main.xml

ErnestaP commented 6 months ago

@agentilb I found the missing article: 10.1016/j.physletb.2023.138245. It's in a halted state because of duplicated affiliations: https://repo.scoap3.org/admin/workflow/details/?url=%2Fadmin%2Fworkflow%2F%3Fflt0_21%3D2&id=bdc8bcc4-9b3f-11ee-9333-c6debf016353

agentilb commented 6 months ago

Yes @ErnestaP, I already contacted Elsevier for this article. I'm now waiting for their answer.

But it seems this one is still in halted mode: 10.1016/j.physletb.2022.137649. I believe it was caused by the earlier problem with the address line, which was corrected some weeks ago. Could you please try to re-harvest it?

ErnestaP commented 6 months ago

Yes, I see. That's because we never reharvested it. I will do it :)

ErnestaP commented 6 months ago

The article is now in the repo: https://repo.scoap3.org/records/82257

ErnestaP commented 6 months ago

@agentilb Can we close the issue?