Closed agentilb closed 6 months ago
@agentilb here are the country share script results from production. Please take a look and let us know if everything is as expected.
Russia appears in the results as a country because we have a JINR affiliation, which is Russian.
Also, you will see that some records have an UNKNOWN country, which means that the author has no affiliation info.
Hi @ErnestaP,
Thanks a lot for this!
What criteria did you use to select the articles? If I check in the repo, I find 15031 records: https://repo.scoap3.org/search?page=1&size=20&q=&year=2021--2022 but in the csv file there are only 11925 articles.
Also, this is perfectly normal that we have Russian authors. The new rule applies only to a specific case: Russian authors within a collaboration and identified with the specific string in the affiliation field.
Thank you for checking! I ran it with --from_year 2021 --to_year 2022 on production. I need to look more closely at why the counts differ.
Hi @agentilb! I found out why not all records were collected: the country names in the GDP file were slightly different from those in the previous year's file. As a result, some countries were not found in the list and their records were skipped. The issue is fixed. Uploading the new file. Let me know if the result is as expected :)
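For illustration, a minimal sketch of how a lookup against the GDP country list could report unmatched records instead of dropping them silently. This is not the actual country share script; the function name `split_by_gdp_coverage` and the sample data are hypothetical.

```python
def split_by_gdp_coverage(record_countries, gdp_countries):
    """Split (record_id, country) pairs by whether the country is in the GDP list.

    Hypothetical helper: unmatched pairs are returned rather than skipped,
    so a drop in the record count is visible right away.
    """
    known = set(gdp_countries)
    matched, unmatched = [], []
    for record_id, country in record_countries:
        if country in known:
            matched.append((record_id, country))
        else:
            unmatched.append((record_id, country))
    return matched, unmatched


# Made-up sample data: the GDP list uses different spellings for two countries.
records = [("r1", "Turkey"), ("r2", "France"), ("r3", "Korea")]
gdp_countries = ["Turkiye", "France", "Korea, Rep."]

matched, unmatched = split_by_gdp_coverage(records, gdp_countries)
for record_id, country in unmatched:
    print(f"No GDP entry for {country!r} (record {record_id})")
```

Comparing `len(matched) + len(unmatched)` with the number of records in the repository would make a gap like 15031 vs 11925 easier to trace.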
Hi @ErnestaP
Thanks a lot! It seems the number of articles is fine now. However, the number of UNKNOWN affiliations seems to be really high.
I have checked a few and I think something is wrong: see https://repo.scoap3.org/records/72513 (10.1007/JHEP09(2022)048). The author is from Korea, but in the file the country is marked as UNKNOWN.
I see also that the column for USA is empty, so there is something that doesn't work.
Could you please check again?
Thanks,
Anne
Hi @agentilb, thanks a lot for checking. There was a slight issue with the mapping, since the new GDP file uses different country names from the ones used in the last script run. For example: Turkey -> Turkiye
Attaching the latest script result: results.csv
Ernesta
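A minimal sketch of the kind of name mapping described above, assuming a small alias table is applied before the GDP lookup. Only the Turkey -> Turkiye pair comes from this thread; the names `COUNTRY_ALIASES` and `normalize_country` are hypothetical and not taken from the script.

```python
# Hypothetical alias table: maps names used in affiliations to the names
# used in the newer GDP file. Only this pair is mentioned in the thread.
COUNTRY_ALIASES = {
    "Turkey": "Turkiye",
}


def normalize_country(name, aliases=COUNTRY_ALIASES):
    """Return the GDP-file spelling of a country name, if an alias is known."""
    cleaned = name.strip()
    return aliases.get(cleaned, cleaned)


print(normalize_country("Turkey"))
print(normalize_country("France"))
```

Keeping the aliases in one table means that a renamed country in a future GDP file needs a one-line change rather than a change to the matching logic.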
Thank you Ernesta, I have checked the data and they look coherent! Thanks again!
@agentilb can we close this issue?
Hi,
I would need the same analysis as the one done here: https://github.com/cern-sis/issues-scoap3/issues/72
but on 2021/2022 data.
If possible, I would need the data in the last week of August.
You can take the 2022 GDP data from here: API_NY.GDP.MKTP.CD_DS2_en_csv_v2_5728855.csv