MicheleCotrufo / pdf2bib

A python library/command-line tool to quickly and automatically generate BibTeX data starting from the pdf file of a scientific publication.

Trailing colon for some PDF files #11

Closed user202729 closed 2 months ago

user202729 commented 5 months ago

e.g. https://arxiv.org/pdf/2310.00367v2 generates:

@article{belouadi2023automatikz:,
        title = {AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ},
        published = {2023-09-30T13:15:49Z},
        ejournal = {arXiv},
        url = {http://arxiv.org/abs/2310.00367v2},
        year = {2023},
        month = {09},
        day = {30},
        author = {Jonas Belouadi and Anne Lauscher and Steffen Eger}
}

Note that the citation key on the first line, belouadi2023automatikz:, ends with a trailing colon. I don't think this is desirable.

user202729 commented 5 months ago

A worse example is https://dl.acm.org/doi/pdf/10.1145/3313831.3376253, which produces:

@article{ma'ayan2020how,
        title = {How Domain Experts Create Conceptual Diagrams and Implications for Tool Design},
        publisher = {ACM},
        url = {http://dx.doi.org/10.1145/3313831.3376253},
        doi = {10.1145/3313831.3376253},
        journal = {Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems},
        year = {2020},
        month = {4},
        author = {Dor Ma'ayan and Wode Ni and Katherine Ye and Chinmay Kulkarni and Joshua Sunshine}
}

The citation key on the first line contains an apostrophe ('), which makes it invalid BibTeX.

user202729 commented 4 months ago

Another case is http://dx.doi.org/10.1145/3180155.3180165, where the generated identifier includes a newline.
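All three cases come down to characters in the citation key that should not be there (a colon, an apostrophe, a newline). A minimal sketch of one way to clean a key follows, assuming it is acceptable to keep only ASCII letters and digits. The helper name `sanitize_bibtex_key` is hypothetical and is not part of pdf2bib, and the third sample key is made up to illustrate the newline case.

```python
import re
import unicodedata


def sanitize_bibtex_key(raw_key: str) -> str:
    """Hypothetical helper: reduce a candidate key to ASCII letters and digits."""
    # Decompose accented characters so their base letters survive the ASCII filter.
    decomposed = unicodedata.normalize("NFKD", raw_key)
    ascii_key = decomposed.encode("ascii", "ignore").decode("ascii")
    # Drop everything else: colons, apostrophes, whitespace, newlines, and so on.
    return re.sub(r"[^A-Za-z0-9]", "", ascii_key)


print(sanitize_bibtex_key("belouadi2023automatikz:"))  # key from the first report
print(sanitize_bibtex_key("ma'ayan2020how"))           # key from the second report
print(sanitize_bibtex_key("smith2018\ndeep"))          # made-up key containing a newline
```

Restricting keys to letters and digits is stricter than BibTeX requires, but it sidesteps differences between BibTeX processors in which punctuation they accept.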

MicheleCotrufo commented 4 months ago

Thanks for highlighting this. I will patch it soon. Feel free to send a pull request if you have already fixed the code yourself.

MicheleCotrufo commented 2 months ago

This should be fixed in version v1.2 (https://github.com/MicheleCotrufo/pdf2bib/releases/tag/v1.2). Feel free to re-open the issue if you still see this problem.