MaRDI4NFDI / portal-compose

docker-compose repo for MaRDI
https://portal.mardi4nfdi.de
GNU General Public License v3.0

Delete software with invalid ids. #409

Closed: physikerwelt closed this issue 8 months ago

physikerwelt commented 8 months ago

Some software items have invalid swMATH IDs, which means the software was removed from swMATH by an editorial decision. We should also delete these software entries from the portal. (CC, FYI @Daniel-Mietchen @eloiferrer)

Daniel-Mietchen commented 8 months ago

We should probably check the validity of IDs in a systematic fashion, not just for swMATH. Whether that should always lead to deletion is not clear to me at this point.

eloiferrer commented 8 months ago

This is a first, non-exhaustive list of software items that currently have an invalid swMATH ID. I encountered them while disambiguating CRAN packages only, which is why I did not check the entire list of items.

I can try to check this for the entire list of swMATH items, but that does not seem scalable if it has to be repeated periodically. Does swMATH publish some sort of deletion list?

physikerwelt commented 8 months ago

[Attachment: invalid_software.csv]
I am afraid such a list was never published. Do you know how to find the related QIDs and delete the corresponding entries?

physikerwelt commented 8 months ago

> We should probably check the validity of IDs in a systematic fashion, not just for swMATH. Whether that should always lead to deletion is not clear to me at this point.

Yes, all external links should be verified. For links to git repositories, we try to deposit them in Software Heritage. For other links, we could at least check the HTTP status code.
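A minimal sketch of the HTTP status check mentioned above, using only Python's standard `urllib`. The function names are hypothetical and nothing here is part of the portal's tooling; it sends a HEAD request and treats 2xx and 3xx responses as valid:

```python
import urllib.error
import urllib.request
from typing import Optional


def is_link_valid(status: int) -> bool:
    """Treat 2xx and 3xx status codes as valid, everything else as broken."""
    return 200 <= status < 400


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was received.

    urlopen follows redirects, so the code is that of the final response.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code
    except OSError:
        # URLError (DNS failure, refused connection) and timeouts end up here.
        return None


def check_links(urls: list[str]) -> dict[str, bool]:
    """Map each URL to True if it answered with a non-error status, else False."""
    results: dict[str, bool] = {}
    for url in urls:
        status = fetch_status(url)
        results[url] = status is not None and is_link_valid(status)
    return results
```

Some servers reject HEAD with a 405 status, so a production checker would probably retry those URLs with GET before flagging them as broken.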

physikerwelt commented 8 months ago

However, the focus of this task is to delete invalid entries before they are merged with valid CRAN packages. If we perform the tasks in the opposite order, we risk deleting valid CRAN entries.

eloiferrer commented 8 months ago

[Attachment: invalid_software.csv]
Here are the QIDs for the invalid software in the list. Seven swMATH IDs (41810, 44115, 44246, 44918, 45060, 45166, 46002) were already absent from the knowledge graph, so I removed them from the list I am uploading.
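The QID lookup amounts to splitting the swMATH IDs into those that map to an item in the knowledge graph and those that do not. A minimal sketch, assuming a hypothetical `qid_by_swmath_id` dictionary (for instance filled from a SPARQL query against the portal); it is not the script actually used in this issue:

```python
from typing import Iterable, Mapping


def split_by_presence(
    swmath_ids: Iterable[str], qid_by_swmath_id: Mapping[str, str]
) -> tuple[list[str], list[str]]:
    """Return (QIDs to delete, swMATH IDs with no item in the graph).

    qid_by_swmath_id is a hypothetical mapping from swMATH ID to portal QID.
    """
    to_delete: list[str] = []
    missing: list[str] = []
    for swmath_id in swmath_ids:
        qid = qid_by_swmath_id.get(swmath_id)
        if qid is None:
            # Already absent from the knowledge graph: nothing to delete.
            missing.append(swmath_id)
        else:
            to_delete.append(qid)
    return to_delete, missing
```

Returning the missing IDs separately keeps a record of what was dropped from the upload, like the seven IDs listed above.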

eloiferrer commented 8 months ago

Today I will merge the CRAN packages for which I could unambiguously find an swMATH item, meaning the same label, a valid swMATH ID, and either a homepage pointing to CRAN or a description mentioning 'R Package'. So there is no risk of merging them with invalid swMATH items, and it makes no difference whether we delete the invalid software items before or after the CRAN merge.
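The matching criteria listed above can be written as a predicate. This is a hypothetical sketch: the `Item` fields, the check for `cran.r-project.org` in the homepage, and the rule that exactly one candidate must match are assumptions about what "unambiguously" means, not the code used for the merge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical, minimal view of a portal item."""
    label: str
    description: str = ""
    homepage: str = ""
    swmath_id: Optional[str] = None


def find_unambiguous_match(
    cran: Item, candidates: list[Item], valid_swmath_ids: set[str]
) -> Optional[Item]:
    """Return the only swMATH candidate matching the CRAN package, else None."""

    def matches(item: Item) -> bool:
        points_to_cran = "cran.r-project.org" in item.homepage.lower()
        says_r_package = "r package" in item.description.lower()
        return (
            item.label == cran.label
            and item.swmath_id in valid_swmath_ids
            and (points_to_cran or says_r_package)
        )

    hits = [item for item in candidates if matches(item)]
    # Zero hits and several hits are both treated as "not unambiguous".
    return hits[0] if len(hits) == 1 else None
```

Requiring a valid swMATH ID inside the predicate is what makes the merge independent of the deletion order: an item with an invalid ID can never be selected.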

physikerwelt commented 8 months ago

@eloiferrer thank you. This is now running.

I did some replacements

[Screenshot, 2023-10-25 12:31:38]

and now I am deleting with the DeleteBatch script from the terminal.
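The deletion step can be prepared by turning the QIDs from `invalid_software.csv` into a list file with one page title per line, which is the input MediaWiki's `maintenance/deleteBatch.php` reads. A sketch under assumptions: the CSV column name `qid` and the `Item` namespace prefix are hypothetical and depend on how the CSV and the Wikibase instance are set up.

```python
import csv


def qids_to_titles(csv_path: str, column: str = "qid", namespace: str = "Item") -> list[str]:
    """Read QIDs from a CSV column and turn them into page titles.

    column and namespace are hypothetical defaults; pass namespace="" if
    items live in the main namespace.
    """
    titles: list[str] = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(column) or "").strip()
            if not value:
                continue
            # If the cell holds an entity URL, keep only the trailing QID.
            qid = value.rsplit("/", 1)[-1]
            titles.append(f"{namespace}:{qid}" if namespace else qid)
    return titles


def write_list_file(titles: list[str], path: str) -> None:
    """Write one title per line, the format deleteBatch.php expects."""
    with open(path, "w", encoding="utf-8") as handle:
        for title in titles:
            handle.write(title + "\n")
```

The resulting file would then be passed to the maintenance script, for example `php maintenance/deleteBatch.php -r "invalid swMATH ID" titles.txt`; check the script's `--help` output for the exact options in your MediaWiki version.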

physikerwelt commented 8 months ago

Done.