DIYgod / RSSHub

🧡 Everything is RSSible
https://docs.rsshub.app
MIT License

WSJ English routes null properties #13117

Open darndankarlsson opened 1 year ago

darndankarlsson commented 1 year ago

Routes

/wsj/:lang/:category?

Full routes

/wsj/en-us/us
/wsj/en-us/world
/wsj/en-us/politics
/wsj/en-us/economy
/wsj/en-us/business
/wsj/en-us/technology
/wsj/en-us/markets
/wsj/en-us/books-arts
/wsj/en-us/realestate
/wsj/en-us/life-work
/wsj/en-us/style-entertainment
/wsj/en-us/sports

Related documentation

https://docs.rsshub.app/routes/traditional-media#the-wall-street-journal-(wsj)-hua-er-jie-ri-bao

What is expected?

Returns an XML file for the RSS feed

What is actually happening?

Error message: null properties

Deployment information

RSSHub demo (https://rsshub.app)

Deployment information (for self-hosted)

No response

Additional info

Route requested: /en-us/us

Error message: Cannot read properties of null (reading '0')

Helpful Information to provide when opening issue:
Path: /en-us/us
Node version: v18.17.1
Git Hash: eebd99c

This is not a duplicate issue
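The reported message is the standard V8 TypeError for indexing a null value, so something in the route presumably read [0] from a result that came back null. A common way this happens in scrapers is indexing the return value of String.prototype.match when the pattern is missing from the page. The TypeScript sketch below is hypothetical: the STATE_RE pattern and both helper names are illustrative and are not taken from the route's code. It shows the unguarded form throwing on a page that has no window.__STATE__, and a guarded form that returns null instead.

```typescript
// Hypothetical pattern for a window.__STATE__ payload; not copied from lib/v2/wsj/news.js.
const STATE_RE = /window\.__STATE__\s*=\s*(\{[\s\S]*?\});?\s*<\/script>/;

// Guarded: returns the captured JSON text, or null when the page has no window.__STATE__.
function stateJson(html: string): string | null {
  const match = html.match(STATE_RE);
  return match ? match[1] : null;
}

// Unguarded: takes the whole match ([0]) without a null check,
// so it throws a TypeError when match() returns null.
function unguardedStateMatch(html: string): string {
  const match = html.match(STATE_RE);
  return match![0];
}

// An English-style page that only carries __NEXT_DATA__.
const nextDataPage = '<script id="__NEXT_DATA__" type="application/json">{}</script>';

console.log(stateJson(nextDataPage)); // null
try {
  unguardedStateMatch(nextDataPage);
} catch (e) {
  // On Node 18 this prints: Cannot read properties of null (reading '0')
  console.log((e as Error).message);
}
```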

github-actions[bot] commented 1 year ago
Searching for maintainers:

To maintainers: if you do not want to be disturbed, list your username in scripts/workflow/test-issue/call-maintainer.js. That way, your username will be wrapped in an inline code block when tagged, so you will not be notified.

If none of the routes can be matched, the issue will be closed automatically. If the issue is not related to a route, please use the NOROUTE keyword, or leave a comment if this is a mistake; we will review it again.

hmpthz commented 1 year ago

There are two very different JSON formats on the WSJ English and Chinese sites. https://github.com/DIYgod/RSSHub/blob/master/lib/v2/wsj/news.js follows the format of the Chinese sites, which store the data in window.__STATE__. That JSON object has every article as a key, so a simple Object.entries is enough to iterate over them.

However, most English pages (wsj/opinion still uses the window.__STATE__ format, I don't know why) store the data in a script tag that looks like <script id="__NEXT_DATA__" type="application/json">, and the articles are stored in nested objects:

type Article = Record<string, unknown>; // placeholder: article fields are not documented here

type Data = {
  "props": {
    "pageProps": {
      "articlesByL2": {
        [localeCode: number]: Article[]
      },
      "latestArticles": Article[]
    }
  }
}

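A minimal TypeScript sketch of reading both formats described above. It assumes the English pages expose the __NEXT_DATA__ shape from the type above (only latestArticles and articlesByL2 are read), that the script tag's attributes appear exactly as quoted in the comment, and that the Chinese pages assign JSON to window.__STATE__ inside a script tag with one article per top-level key. The regexes, the Article fields, and the function names are hypothetical and would need checking against live WSJ pages; this is not the route's current implementation.

```typescript
// Hypothetical article shape: the issue does not document the real fields.
interface Article {
  [field: string]: unknown;
}

// Subset of the __NEXT_DATA__ payload described in the comment above.
interface NextData {
  props: {
    pageProps: {
      articlesByL2?: { [key: string]: Article[] };
      latestArticles?: Article[];
    };
  };
}

// English pages: JSON inside <script id="__NEXT_DATA__" type="application/json">.
const NEXT_DATA_RE =
  /<script id="__NEXT_DATA__" type="application\/json">([\s\S]*?)<\/script>/;
// Chinese pages and wsj/opinion: window.__STATE__ = {...}; (assumed exact form).
const STATE_RE = /window\.__STATE__\s*=\s*(\{[\s\S]*?\});?\s*<\/script>/;

// Flatten latestArticles plus every list nested under articlesByL2.
function fromNextData(json: string): Article[] {
  const pageProps = (JSON.parse(json) as NextData).props.pageProps;
  const articles: Article[] = [];
  articles.push(...(pageProps.latestArticles ?? []));
  const byL2 = pageProps.articlesByL2 ?? {};
  for (const key of Object.keys(byL2)) {
    articles.push(...byL2[key]);
  }
  return articles;
}

// Each top-level key of window.__STATE__ is one article, so return the values.
function fromState(json: string): Article[] {
  const state = JSON.parse(json) as { [key: string]: Article };
  return Object.keys(state).map((key) => state[key]);
}

// Try the English format first, then fall back to the Chinese one.
function extractArticles(html: string): Article[] {
  const next = html.match(NEXT_DATA_RE);
  if (next) return fromNextData(next[1]);
  const state = html.match(STATE_RE);
  if (state) return fromState(state[1]);
  // Neither payload found: fail with a clear message instead of indexing null.
  throw new Error("No __NEXT_DATA__ or window.__STATE__ payload found");
}

const englishPage =
  '<script id="__NEXT_DATA__" type="application/json">' +
  '{"props":{"pageProps":{"latestArticles":[{"id":"a"}],' +
  '"articlesByL2":{"1":[{"id":"b"}]}}}}' +
  "</script>";
console.log(extractArticles(englishPage).map((a) => a.id)); // [ 'a', 'b' ]
```

Trying __NEXT_DATA__ first and falling back to window.__STATE__ would let one route serve both languages (and wsj/opinion), and throwing an explicit error when neither payload is present gives a clearer message than a null TypeError. Duplicates between latestArticles and articlesByL2 are not removed in this sketch.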
hmpthz commented 1 year ago

I'm wondering whether the original author of this route, @NavePnow, could take some time to look at this issue. Your help would be very much appreciated.

EthanWng97 commented 2 months ago

@hmpthz NavePnow has been renamed to EthanWng97. I will take a look this weekend.