NIAID-Data-Ecosystem / nde_research

This repository is for research conducted for improving the NIAID Data Ecosystem

[INVESTIGATION]: Develop methodology for spam detection #3

Open gtsueng opened 1 month ago

gtsueng commented 1 month ago

Issue Name

Develop methodology for spam detection

Issue Description

Mendeley, Harvard Dataverse, and Zenodo occasionally receive spam submissions. Zenodo has automated spam detection in place that flags records as spam for removal, though the process can take over a month. It is not clear whether Mendeley or Harvard Dataverse use any automated methods.

Explore the use of LLMs for spam/ad detection
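
As a starting point for that exploration, here is a minimal sketch of how an LLM-based check could be wired up. It is not an agreed methodology: the prompt wording, the `name`/`description` field names, the SPAM/OK answer format, and the `ask_llm` callable are all assumptions made for illustration. The model call is deliberately left as a pluggable function so that any provider could be swapped in; `fake_llm` is only a stand-in so the sketch runs offline.

```python
from typing import Callable

# Hypothetical prompt; the wording and the SPAM/OK answer format are assumptions.
PROMPT_TEMPLATE = (
    "You are screening records from a research data repository.\n"
    "Decide whether the record below is spam or advertising rather than "
    "a genuine research dataset.\n"
    "Answer with exactly one word: SPAM or OK.\n\n"
    "Title: {title}\n"
    "Description: {description}\n"
)


def build_prompt(record: dict) -> str:
    """Fill the prompt with a record's title and description.

    The 'name' and 'description' keys are assumed field names.
    """
    return PROMPT_TEMPLATE.format(
        title=record.get("name", ""),
        description=record.get("description", ""),
    )


def parse_label(reply: str) -> bool:
    """Return True if the model's reply marks the record as spam."""
    return reply.strip().upper().startswith("SPAM")


def flag_spam(records: list, ask_llm: Callable[[str], str]) -> list:
    """Return the records that the model labels as spam."""
    return [r for r in records if parse_label(ask_llm(build_prompt(r)))]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; NOT a real spam detector."""
    return "SPAM" if "buy now" in prompt.lower() else "OK"


if __name__ == "__main__":
    sample = [
        {"name": "Influenza serology measurements", "description": "Titer data."},
        {"name": "Cheap watches", "description": "Buy now at a discount!"},
    ]
    for rec in flag_spam(sample, fake_llm):
        print(rec["name"])
```

Any real evaluation would still need a labeled set of known spam and non-spam records to measure false positives before flagged records are reported to a repository.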

Issue Discussion

This issue has been reported in the technical reports for July and August 2024.

Request Type

Investigation (perform aggregations, analysis, etc.)

Material URL

No response

Related WBS task

https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/54

For internal use only. Assignee, please select the status of this issue

Status Description

No response

gtsueng commented 1 week ago

As mentioned at the 2024.09.24 meeting, notifying the repositories of spam could be a good opportunity to reach out to these GREI repositories. We have drafted an email template for this outreach, and a separate issue has been created for it: https://github.com/NIAID-Data-Ecosystem/niaid-feedback/issues/160