cofacts / rumors-api

GraphQL API server for clients like rumors-site and rumors-line-bot
https://api.cofacts.tw
MIT License

Hyperlink detection mechanism in article & reply creation #345

Open MrOrz opened 2 months ago

MrOrz commented 2 months ago

From: https://g0v.hackmd.io/IrRBabPXQBOVQTjSNlIdwg?both=#%E6%9C%AA%E7%AB%9F%E9%A0%85%E7%9B%AE

Previously, in cofacts/rumors-site#569, rumors-site changed its URL extraction logic. As a result, a link may be shown as a hyperlink on the website while no review is available for it, because rumors-api does not recognize it as a hyperlink at all and therefore never scrapes it to populate the hyperlinks field of the article or reply.

Example: https://dev.cofacts.tw/article/GduxspEBoURTSGJKAAMu

We should figure out:

  1. Should we also use linkifyjs in rumors-api to extract hyperlinks and do Wayback Machine archiving?
  2. Is it meaningful to extract URLs out of images and videos? Is it common for images to contain URLs?
  3. Transcripts may change as users update them. Should we extract URLs (for both hyperlinks and Wayback Machine archiving) every time a transcript is changed?
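
For question 3, one possible approach is to diff the URLs before and after a transcript edit, so that only newly added URLs are scraped and archived. Below is a minimal sketch of that idea, not the actual rumors-api code. `extractUrls`, `diffUrls` and `UrlDiff` are hypothetical names, and the regex is only a stand-in for whatever extractor we settle on in question 1 (e.g. linkifyjs), so it will not match everything rumors-site linkifies.

```typescript
// Hypothetical stand-in for a real extractor such as linkifyjs.
// It only matches explicit http(s) URLs.
const URL_PATTERN = /https?:\/\/[^\s<>"']+/g;

function extractUrls(text: string): string[] {
  const matches: string[] = text.match(URL_PATTERN) || [];
  // Strip trailing punctuation that commonly follows a URL in prose.
  // Assumption: this is acceptable for a sketch; it can truncate URLs
  // that legitimately end with ")" or similar characters.
  const cleaned = matches.map((url) => url.replace(/[.,!?;:)\]]+$/, ""));
  // De-duplicate while keeping first-seen order.
  return Array.from(new Set(cleaned));
}

interface UrlDiff {
  added: string[]; // new URLs: candidates for scraping and archiving
  removed: string[]; // URLs no longer present in the text
  kept: string[]; // unchanged URLs: existing hyperlinks entries can be reused
}

function diffUrls(previousText: string, nextText: string): UrlDiff {
  const before = extractUrls(previousText);
  const after = extractUrls(nextText);
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  return {
    added: after.filter((url) => !beforeSet.has(url)),
    removed: before.filter((url) => !afterSet.has(url)),
    kept: after.filter((url) => beforeSet.has(url)),
  };
}

const diff = diffUrls(
  "See https://example.com/a and https://example.com/b.",
  "See https://example.com/b, plus https://example.com/c"
);
console.log(diff.added); // only these would need a new scrape / archive request
```

With something like this, re-extracting on every transcript change stays cheap, since unchanged URLs trigger no new scraping or Wayback Machine requests. Whether removed URLs should also be dropped from the hyperlinks field is a separate decision.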