ana-kuznetsova / Popular-Science-Texts-Compling-research

An M.A. educational project on computational linguistics.

Review N+1 & ProScience #5

Closed ana-kuznetsova closed 6 years ago

ana-kuznetsova commented 6 years ago

1) Review and analyse https://nplus1.ru/ according to the checklist:

First-person speech; set of rubrics & headings; layout of the expert's opinion in the HTML code (if there is one).

2) Contact N+1 and ask for data.

ana-kuznetsova commented 6 years ago

Nplus1.ru is a great source of the latest scientific news, written by scientists from different fields. The news is based on articles from well-known academic journals such as Nature and Science, among many others. All materials are divided into rubrics (about 35). The articles are of the following types:

There are expert interviews, in which direct speech is used. The main problem is that all interviews are stored under 'materials', so we can't separate long reads from interviews (although the editors are likely to give us access to the different types of materials separately). Example: https://nplus1.ru/material/2016/01/20/pitulko

Expert's direct speech and interviewer's speech are marked in a special way:

<b>Владимир Питулько:</b>
<b><i></i>«N+1»:<i> </i></b>
<b>В.П.:</b>

The author's name is given in the head of the article and again at the end of the body section. In the head it is marked by <meta name="mediator_author" content="Кристина Уласович">; at the end of the news item it is set in italics. The title of the article is embedded in the head's structure and wrapped in <h1> at the beginning and </h1> at the end of the heading.
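The author and title markup described above could be read with something like the following. This is only a minimal sketch: the function name extract_metadata and the sample page are made up for illustration, and the exact attribute order and quoting on the live site are assumptions.

```python
import re

# Patterns follow the layout described in this comment; the real pages
# may order or quote the attributes differently (assumption).
AUTHOR_RE = re.compile(r'<meta\s+name="mediator_author"\s+content="([^"]*)"')
TITLE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_metadata(html):
    """Return (author, title); a field is None when it is not found."""
    author = AUTHOR_RE.search(html)
    title = TITLE_RE.search(html)
    return (
        author.group(1).strip() if author else None,
        # Drop any inline tags nested inside the heading.
        TAG_RE.sub("", title.group(1)).strip() if title else None,
    )


# Hypothetical page fragment, not copied from the site.
sample = (
    '<head><meta name="mediator_author" content="Кристина Уласович"></head>'
    "<body><h1>Placeholder headline</h1></body>"
)
print(extract_metadata(sample))
```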

Downloading should not cause any difficulties: the layout is rather clear, and it will be easy to extract direct speech with regular expressions. P.S. NO API!
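A rough sketch of that regular-expression extraction, assuming every turn starts with a bold speaker label ending in a colon, as in the interview linked above. The names TURN_RE and extract_turns and the sample paragraphs are hypothetical; a bold phrase inside an answer would cut that answer short, so real pages may need a stricter pattern.

```python
import re

# A turn is a <b>...</b> label followed by text up to the next <b>,
# the closing </p>, or the end of the input.
TURN_RE = re.compile(
    r"<b>(?P<label>.*?)</b>(?P<text>.*?)(?=<b>|</p>|$)", re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Remove inline tags and surrounding whitespace."""
    return TAG_RE.sub("", fragment).strip()


def extract_turns(html):
    """Return a list of (speaker, utterance) pairs in document order."""
    turns = []
    for match in TURN_RE.finditer(html):
        label = strip_tags(match.group("label"))
        if not label.endswith(":"):
            continue  # bold text that is not a speaker label
        turns.append((label[:-1].strip(), strip_tags(match.group("text"))))
    return turns


# Placeholder text; only the label markup mirrors the site.
sample = (
    "<p><b><i></i>«N+1»:<i> </i></b> Question text.</p>"
    "<p><b>Владимир Питулько:</b> First answer.</p>"
    "<p><b>В.П.:</b> Second answer.</p>"
)
for speaker, utterance in extract_turns(sample):
    print(speaker, "->", utterance)
```

Mapping the abbreviated label В.П. back to the full name would need an extra step that is not shown here.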

ana-kuznetsova commented 6 years ago

ProScience is a rubric of a larger source, Polit.ru. ProScience contains international scientific news arranged by date. All news items are also grouped by tags, which can be rather general (e.g. under the tag Latin America we can find news about viruses, plants, history, archaeological discoveries, politics, etc.).

Structure: