fmfi-svt / steel-wok

Exactly what it sounds like

`article_no` field is never filled #10

Open JMatej opened 2 years ago

JMatej commented 2 years ago

Look at the data:

{
    'DOI': '10.1007/s11262-021-01866-5',
    'article_no': '',
    'authors': [...],
    'citing_summary': [...],
    'issue': '6',
    'journal': 'VIRUS GENES',
    'page': '556-560',
    'pubdate': 'DEC 2021',
    'publisher': 'SPRINGER, VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, '
                'NETHERLANDS',
    'times_cited': '',
    'title': 'A SARS-CoV-2 mutant from B.1.258 lineage with increment H69/ '
            'increment V70 deletion in the Spike protein circulating in Central '
            'Europe in the fall 2020',
    'volume': '57',
    'year': '2021'
}

The `article_no` field is always empty. I inspected the website with the browser's Inspect Element feature and could not find any HTML tag matching the one specified here: https://github.com/fmfi-svt/steel-wok/blob/c436eeff6373463f6118f034ddc4e006fd48de7e/extract-wos-article.py#L24
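A quick way to confirm that the field is empty across all scraped records, rather than in this one alone, is to count empty values per field. This is only a sketch: `empty_field_counts` is a hypothetical helper, not part of the repository, and the second record is made-up sample data.

```python
from collections import Counter


def empty_field_counts(records):
    """Count, per field name, how many records hold an empty value."""
    counts = Counter()
    for record in records:
        for field, value in record.items():
            # Treat empty strings, None and empty lists as "not filled".
            if value in ("", None, []):
                counts[field] += 1
    return counts


# Trimmed-down sample: the first record mirrors the data above,
# the second one is invented purely for illustration.
records = [
    {"DOI": "10.1007/s11262-021-01866-5", "article_no": "", "page": "556-560", "times_cited": ""},
    {"DOI": "", "article_no": "", "page": "", "times_cited": "3"},
]

counts = empty_field_counts(records)
for field, n in sorted(counts.items()):
    print(f"{field}: empty in {n} of {len(records)} records")
```

If `article_no` is empty in every record while `page` is mostly filled, that points at the extraction rule rather than at the individual articles.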

@mrshu I'm not sure what the value is supposed to be. Is it one of the values in the screenshot below?

[Screenshot 2022-05-11 at 10 21 00]

JMatej commented 2 years ago

@mrshu it looks like some records have an Article Number field and others have a Page field. I'm not sure how that works, but I'll check.

[Screenshot 2022-05-11 at 11 12 54]

mrshu commented 2 years ago

Thanks for pointing it out, @JMatej. I believe that is indeed the case. I also feel like `article_no` should stay empty in these cases (and we potentially need to have a page field as well).
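The behaviour discussed here could be sketched roughly as follows. It assumes the extractor has already collected the record's labelled values into a dict; the function name, the dict shape, and the exact label strings "Article Number" and "Page" are assumptions for illustration, not the actual code in `extract-wos-article.py`.

```python
def split_article_no_and_page(labelled_values):
    """Map labelled values from a record onto `article_no` and `page`.

    `labelled_values` is a hypothetical dict of label -> text, e.g.
    {"Page": "556-560"}. Whichever label is missing yields an empty string,
    so `article_no` stays empty for records that only list a page range.
    """
    return {
        "article_no": labelled_values.get("Article Number", "").strip(),
        "page": labelled_values.get("Page", "").strip(),
    }


# A record with only a page range, as in the data above.
only_page = split_article_no_and_page({"Page": "556-560"})

# A record with only an article number (the value is made up).
only_article_no = split_article_no_and_page({"Article Number": " 12345 "})
```

Keeping the two fields separate means an empty `article_no` is a valid result and not a sign of a failed extraction.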