Closed by physikerwelt 8 months ago
We should probably check the validity of IDs in a systematic fashion, not just for swMATH. Whether that should always lead to deletion is not clear to me at this point.
This is a first, non-exhaustive list of software items that currently have an invalid swMATH ID. I encountered them while disambiguating CRAN packages only, which is why I did not check the entire list of items.
I can try to check this for the entire list of swMATH items, but that does not seem scalable if it has to be repeated periodically. Does swMATH publish some sort of deletion list?
invalid_software.csv
I am afraid no such list was ever published. Do you know how to find the related QIDs and delete the corresponding entries?
We should probably check the validity of IDs in a systematic fashion, not just for swMATH. Whether that should always lead to deletion is not clear to me at this point.
Yes, all external links should be verified. For links to git repositories, we try to deposit them in Software Heritage. For others, we could at least check the HTTP status code.
However, the focus of this task is to delete invalid entries before they are merged with valid CRAN packages. If we perform the tasks in the opposite order, we risk deleting valid CRAN entries.
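The HTTP status check suggested above could be sketched roughly as follows. This is only an illustration using the Python standard library: the function names `http_status` and `is_valid_link`, the use of a HEAD request, and the rule that 2xx/3xx counts as valid are assumptions, not something already in place in the portal.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no response arrived.

    `opener` is injectable so the function can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    try:
        # HEAD avoids downloading the body; some servers reject it (assumption:
        # that is acceptable for a first pass).
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError and still carry a code.
        return err.code
    except (OSError, ValueError):
        # DNS failures, refused connections, timeouts, or malformed URLs.
        return None


def is_valid_link(status):
    """Treat any 2xx or 3xx response as a live link (hypothetical policy)."""
    return status is not None and 200 <= status < 400
```

A periodic job could call `http_status` for each external link and flag the items for which `is_valid_link` returns `False`, leaving the decision about deletion to a human.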
invalid_software.csv
Here are the QIDs for the invalid software in the list. Seven swMATH IDs (41810, 44115, 44246, 44918, 45060, 45166, 46002) were already absent from the knowledge graph, so I removed them from the list I am uploading.
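For illustration, separating the IDs that still have an item in the knowledge graph from those that do not could look like the sketch below. The lookup dictionary stands in for whatever query returns the QID for an swMATH ID; the function names, the CSV column names, and the QID used in the example are all hypothetical.

```python
import csv
import io


def split_by_qid(swmath_ids, qid_lookup):
    """Split IDs into those with a known QID and those missing from the graph.

    qid_lookup maps swMATH ID -> QID (assumed to come from a prior query).
    """
    found, missing = {}, []
    for swmath_id in swmath_ids:
        qid = qid_lookup.get(swmath_id)
        if qid is None:
            missing.append(swmath_id)
        else:
            found[swmath_id] = qid
    return found, missing


def to_csv(found):
    """Serialize the swMATH ID / QID pairs as CSV text, sorted by swMATH ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["swmath_id", "qid"])  # hypothetical column names
    for swmath_id, qid in sorted(found.items()):
        writer.writerow([swmath_id, qid])
    return buffer.getvalue()


# Example with made-up data: 41810 has no item, 12345 maps to a placeholder QID.
found, missing = split_by_qid([41810, 12345], {12345: "Q1"})
```

Keeping the `missing` list separate makes it easy to report which IDs were dropped, as was done for the seven IDs above.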
Today I will merge the CRAN packages for which I could unambiguously find an swMATH item (that is, one with the same label, a valid swMATH ID, and either a homepage pointing to CRAN or 'R Package' mentioned in the description). So there will be no risk of merging them with invalid swMATH items. This way it makes no difference whether we delete the invalid software items before or after the CRAN merge.
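The matching criteria described above could be expressed as a small predicate, sketched here under assumptions: the `SoftwareItem` fields, the case-insensitive comparison, and the test for `cran.r-project.org` in the homepage are illustrative choices, not the exact rules used for the merge.

```python
from dataclasses import dataclass


@dataclass
class SoftwareItem:
    """Minimal stand-in for an swMATH-derived item (hypothetical fields)."""
    label: str
    swmath_id_valid: bool
    homepage: str = ""
    description: str = ""


def is_unambiguous_match(cran_label, item):
    """Return True if item meets all merge criteria for the CRAN package."""
    same_label = item.label == cran_label
    # Either the homepage points to CRAN or the description says 'R Package'.
    looks_like_r = (
        "cran.r-project.org" in item.homepage.lower()
        or "r package" in item.description.lower()
    )
    return same_label and item.swmath_id_valid and looks_like_r
```

Because a valid swMATH ID is one of the conditions, items from the invalid list can never pass this check, which is why the order of deletion and merge does not matter.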
@eloiferrer thank you. This is now running.
I did some replacements
and now I am deleting with the DeleteBatch script from the terminal.
done.
Some software items have invalid swMATH IDs. This means the software was removed from swMATH by an editorial decision. We should also delete the corresponding software entries from the portal. (CC, FYI @Daniel-Mietchen @eloiferrer)