Open pamfilos opened 7 months ago
It looks like the wrong file was downloaded, and not intentionally. Hindawi records come from the API, which means the publisher did not upload the file; we downloaded it ourselves.
The file without an extension is the HTML page of the article. It can be viewed at the following URL: https://www.hindawi.com/journals/ahep/2016/9258106/ It most likely differs from the copy in our repo, since the one we downloaded is from 2019.
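To find other extensionless files that are really HTML pages, a content sniff could be used. This is a minimal sketch, not code from the repo: `looks_like_html` is a hypothetical helper, and it only checks for an HTML doctype or `<html` tag at the start of the content, so it assumes there is no byte order mark or leading comment.

```python
def looks_like_html(data: bytes) -> bool:
    """Heuristic: True if the content starts like an HTML document.

    Hypothetical helper for illustration; only inspects the first 1 KiB.
    """
    # Ignore leading whitespace and compare case-insensitively.
    head = data[:1024].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

Running this over the stored bytes of each attached file would flag candidates for re-download; PDF and XML content would not match.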
The API for files looks very similar; only the subdomain differs (downloads): https://downloads.hindawi.com/journals/ahep/2016/9258106.pdf
To me, it looks like the wrong URL was used for the file download: someone used https://www.hindawi.com/journals/ahep/2016/9258106/ instead of
https://downloads.hindawi.com/journals/ahep/2016/9258106.pdf or https://downloads.hindawi.com/journals/ahep/2016/9258106.xml
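The mapping between the two URL forms could be sketched as below. This is an assumption based only on the three URLs quoted above (same path, `downloads` subdomain, file extension appended); `hindawi_file_urls` is a hypothetical name, not a function from scoap3-next, and the pattern has not been checked against other articles.

```python
from urllib.parse import urlsplit, urlunsplit


def hindawi_file_urls(landing_url, extensions=("pdf", "xml")):
    """Derive download URLs from an article landing-page URL.

    Assumes the pattern seen in this issue: same path on the
    downloads.hindawi.com host, with the extension appended.
    """
    parts = urlsplit(landing_url)
    # Drop the trailing slash so the extension attaches to the article id.
    path = parts.path.rstrip("/")
    return {
        ext: urlunsplit((parts.scheme, "downloads.hindawi.com", f"{path}.{ext}", "", ""))
        for ext in extensions
    }
```

For the article in question, `hindawi_file_urls("https://www.hindawi.com/journals/ahep/2016/9258106/")` yields the `.pdf` and `.xml` URLs listed above.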
When and why? I have a feeling that it happened 6 years ago, in this commit: https://github.com/SCOAP3/scoap3-next/blob/612d69f4dd40aadee6a26c158ad8d4f813e1fc2a/scoap3/modules/workflows/workflows/articles_upload.py#L214-L231
A step that builds the correct structure for attaching files was only added later: https://github.com/SCOAP3/scoap3-next/blob/b4703326a6041a371c9ab56fa7539709897653ec/scoap3/modules/workflows/workflows/articles_upload.py#L314-L338
https://repo.scoap3.org/records/17163