chimbori / crux

Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages.
Apache License 2.0

[NYT] Content after ad is not extracted #9

Open anhtuan23 opened 5 years ago

anhtuan23 commented 5 years ago

Hi,

In NYT articles, the text after the first ad is not extracted. For example:
https://www.nytimes.com/2018/12/06/us/politics/huawei-meng-china-iran.html?action=click&module=Top%20Stories&pgtype=Homepage

I also tried extracting the article in Hermit, and the result is the same.