GetRD / academic-file-converter

📚 Import BibTeX publications and Jupyter Notebook blog posts into your Markdown website or book. Convert BibTeX to a Markdown website.
https://docs.hugoblox.com/reference/content-types/#automatically-import-publications-from-bibtex
MIT License

LaTeX code inside the title becomes invalid after `clean_bibtex_str` #101

Closed: Ionizing closed this issue 3 years ago

Ionizing commented 3 years ago

Thanks for this awesome tool, which helped me build my academic page. I am now at the last step, but I found something wrong with the output.

The {} and \ inside the title

If the title contains a short piece of LaTeX code, for example:

@article{PhysRevLett.105.136805,
  title = {Atomically Thin ${\mathrm{MoS}}_{2}$: A New Direct-Gap Semiconductor},
  author = {Mak, Kin Fai and Lee, Changgu and Hone, James and Shan, Jie and Heinz, Tony F.},
  journal = {Phys. Rev. Lett.},
  volume = {105},
  issue = {13},
  pages = {136805},
  numpages = {4},
  year = {2010},
  month = {Sep},
  publisher = {American Physical Society },
  doi = {10.1103/PhysRevLett.105.136805},
  url = {https://link.aps.org/doi/10.1103/PhysRevLett.105.136805}
}

(This BibTeX entry was exported from the APS website.)

After processing by academic, cite.bib looks like this:

@article{PhysRevLett.105.136805,
 author = {Mak, Kin Fai and Lee, Changgu and Hone, James and Shan, Jie and Heinz, Tony F.},
 doi = {10.1103/PhysRevLett.105.136805},
 issue = {13},
 journal = {Phys. Rev. Lett. },
 month = {Sep},
 numpages = {4},
 pages = {136805},
 publisher = {American Physical Society},
 title = {"Atomically Thin $\mathrmMoS_2$: A New Direct-Gap Semiconductor"},
 url = {https://link.aps.org/doi/10.1103/PhysRevLett.105.136805},
 volume = {105},
 year = {2010}
}

And index.md looks like this:

---
title: '\"Atomically Thin $mathrmMoS_2$: A New Direct-Gap Semiconductor\"'
date: '2010-09-01'
draft: true
publishDate: '2021-08-21T05:51:14.899975Z'
authors:
- Kin Fai Mak
- Changgu Lee
- James Hone
- Jie Shan
- Tony F. Heinz
publication_types:
- '2'
abstract: ''
featured: false
publication: '*Phys. Rev. Lett. *'
url_pdf: https://link.aps.org/doi/10.1103/PhysRevLett.105.136805
doi: 10.1103/PhysRevLett.105.136805
---

Someone suggested surrounding the title with quotation marks, but that did not help.

It seems all the {} and \ characters are stripped by https://github.com/wowchemy/hugo-academic-cli/blob/main/academic/import_bibtex.py#L87

page.fm["title"] = clean_bibtex_str(entry["title"])

where

def clean_bibtex_str(s):
    """Clean BibTeX string and escape TOML special characters"""
    s = s.replace("\\", "")
    s = s.replace('"', '\\"')
    s = s.replace("{", "").replace("}", "")
    s = s.replace("\t", " ").replace("\n", " ").replace("\r", "")
    return s

I checked the code, and I guess the expected behavior is to strip only the surrounding {} from the title (or even from all fields). Maybe the function clean_bibtex_str needs to be refined.
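As a rough illustration of that idea, here is a minimal sketch that removes only the braces wrapping the whole value and leaves inner LaTeX (backslashes and nested braces) untouched. The names `strip_outer_braces` and `clean_bibtex_str_keep_latex` are hypothetical and not part of the project. The sketch also assumes that quote escaping is left to whatever serializer writes the front matter, and it does not treat escaped braces such as `\{` specially.

```python
def _wraps_whole_string(s: str) -> bool:
    """Return True if the brace opened at s[0] is closed by the last character."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # The first brace closes here; it wraps the string only
                # if this is also the final character.
                return i == len(s) - 1
    return False  # unbalanced braces: leave the string alone


def strip_outer_braces(s: str) -> str:
    """Remove brace pairs that enclose the entire string (hypothetical helper)."""
    s = s.strip()
    while s.startswith("{") and s.endswith("}") and _wraps_whole_string(s):
        s = s[1:-1].strip()
    return s


def clean_bibtex_str_keep_latex(s: str) -> str:
    """Sketch of a gentler cleaner: keep inner LaTeX, normalize whitespace."""
    s = strip_outer_braces(s)
    s = s.replace("\t", " ").replace("\n", " ").replace("\r", "")
    return s.strip()


title = r"Atomically Thin ${\mathrm{MoS}}_{2}$: A New Direct-Gap Semiconductor"
print(clean_bibtex_str_keep_latex("{" + title + "}"))
```

With this approach the math in the title above survives unchanged, while a title wrapped in an extra pair of braces loses only that outer pair.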

The trailing space at the end of journal

Note the front matter of index.md: if there are extra spaces at the end of journal, the publication field of the front matter becomes publication: '*Phys. Rev. Lett. *', which breaks the Markdown emphasis.

I think an extra str.strip would resolve this issue easily.
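A minimal sketch of that fix, assuming the publication field is built by wrapping the journal name in asterisks. The helper name `format_publication` is hypothetical and only illustrates where the strip would go.

```python
def format_publication(journal: str) -> str:
    """Wrap a journal name in Markdown emphasis (hypothetical helper).

    Stripping first matters: Markdown does not treat a '*' preceded by
    whitespace as a closing emphasis marker.
    """
    return f"*{journal.strip()}*"


print(format_publication("Phys. Rev. Lett. "))
```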

Ionizing commented 3 years ago

Seems I need to do it myself?

github-actions[bot] commented 3 years ago

This issue is stale because it has not had any recent activity. The resources of the project maintainers are limited, and so we are asking for your help.

If this is a bug and you can still reproduce this error on the main branch, consider contributing a Pull Request with a fix.

If this is a feature request, and you feel that it is still relevant and valuable, consider contributing a Pull Request for review.

This issue will automatically close soon if no further activity occurs. Thank you for your contributions.

Ionizing commented 2 years ago

Nobody answered?