JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License

jabref-meta storage in bib file should be improved (by switching to embedded JSON) #10371

Open koppor opened 1 year ago

koppor commented 1 year ago

Context

When I saw that diff, I thought that something was really wrong:

[image: diff]

The semicolon at position 1 indicates that multiple metadata items can be written into a single @Comment. This became clear to me today (and was not clear in 2016, see https://github.com/JabRef/jabref/issues/960). This would be great, as it would minimize the number of @Comment entries. However, the saveActions also use ; as delimiter (position 2).

The "feature" of not merging the meta fields has been present for a long time. See, e.g., an old issue report: https://github.com/JabRef/jabref/issues/250.

Thus, a straightforward merge is most probably not possible.

Code hint: Separation according to ; is done at org.jabref.logic.importer.util.MetaDataParser#getNextUnit
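
To illustrate why merging is ambiguous, here is a minimal Java sketch. It is not the code of MetaDataParser#getNextUnit: the class SemicolonSplitDemo and the method splitUnits are hypothetical, escaping is ignored, and databaseType:bibtex only serves as a second example item.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT JabRef's MetaDataParser.
class SemicolonSplitDemo {

    /** Splits a metadata string into units, each terminated by ';' (no escaping handled). */
    static List<String> splitUnits(String value) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == ';') {
                // A semicolon closes the current unit.
                units.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        return units;
    }

    public static void main(String[] args) {
        // Two metadata items merged into one @Comment body (assumed layout).
        String merged = "saveActions:enabled;\ndate[normalize_date]\n;databaseType:bibtex;";
        System.out.println(splitUnits(merged));
        // prints [saveActions:enabled, date[normalize_date], databaseType:bibtex]
    }
}
```

The units come back as one flat list. Nothing in the list says that the second unit still belongs to saveActions, so a reader has to know in advance how many units each key consumes.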


Call for new metadata storage

Single JSON in @Comment field

Example:

@Comment{jabref-meta-0.1.0
{
  "saveActions" :
  {
    "state": true,
    "date": ["normalize_date", "action2"],
    "pages" : ["normalize_page_numbers"],
    "month" : ["normalize_month"]
  }
}
}

Content:

{
  "saveActions" :
  {
    "state": true,
    "date": ["normalize_date", "action2"],
    "pages" : ["normalize_page_numbers"],
    "month" : ["normalize_month"]
  }
}
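
A possible way to get from the @Comment entry to the JSON content, as a sketch only: the class JabRefMetaCommentDemo, the record MetaComment, and the method parse are hypothetical names, and the assumption is that the version follows the jabref-meta- prefix and the JSON object starts at the next opening brace. Parsing the extracted JSON itself would be left to a JSON library.

```java
import java.util.Optional;

// Hypothetical sketch; names and layout assumptions are not taken from JabRef's code base.
class JabRefMetaCommentDemo {

    /** Version string and raw JSON payload of a jabref-meta comment. */
    record MetaComment(String version, String json) {}

    static Optional<MetaComment> parse(String entry) {
        String text = entry.strip();
        String head = "@Comment{jabref-meta-";
        // BibTeX entry types are case-insensitive, so compare ignoring case.
        if (!text.regionMatches(true, 0, head, 0, head.length())) {
            return Optional.empty();
        }
        int jsonStart = text.indexOf('{', head.length());
        if (jsonStart < 0) {
            return Optional.empty();
        }
        String version = text.substring(head.length(), jsonStart).strip();

        // Find the brace that closes the JSON object, ignoring braces inside JSON strings.
        int depth = 0;
        boolean inString = false;
        for (int i = jsonStart; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return Optional.of(new MetaComment(version, text.substring(jsonStart, i + 1)));
            }
        }
        return Optional.empty(); // unbalanced braces
    }
}
```

For the example above, parse would yield the version 0.1.0 and the JSON object shown under "Content". An old-style comment such as @Comment{jabref-meta: saveActions:enabled;...} does not match the prefix and yields an empty result.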

Decision outcome: Use "Single JSON in @Comment field"


Migration path:


After this is implemented, we can work on https://github.com/JabRef/jabref/issues/8701


ADR

Single JSON in @Comment field

Multiple JSON

Each preference could have a separate JSON nesting.

BibTeX

Example (From https://github.com/koppor/jabref/issues/232)

old:

@Comment{jabref-meta: saveActions:enabled;
date[normalize_date]
pages[normalize_page_numbers]
month[normalize_month]
;}

new:

@JabRef{saveActions,
  state = {enabled},
  date = {normalize_date, action2},
  pages = {normalize_page_numbers},
  month = {normalize_month}
}

@Comment and then nested

JabRef v5.9 (and before) used that format.

JSON at the end of the file

New entries always start with @. Anything outside the “argument” of a “command” starting with an @ is considered as a comment. This gives an easy way to comment a given entry: just remove the initial @. As usual when a language allows comments, don’t hesitate to use them so that you have a clean, ordered, and easy-to-maintain database. Conversely, anything starting with an @ is considered as being a new entry

@Article{demo,
   note={just an example article to illustrate the **previous** entry}
}

// jabref-meta-0.1.0
{
  "saveActions" :  {
   "state": true,
   "date": ["normalize_date", "action2"],
   "pages" : ["normalize_page_numbers"],
   "month" : ["normalize_month"]
  }
}

Siedlerchr commented 1 year ago

BibDesk on macOS stores its groups in Apple's plist XML format:

[image]

koppor commented 1 year ago

Follow-up issues:

leaf-soba commented 2 months ago

Sorry, I'm new here and would like to work on this issue. I tried to break it into small steps; please check whether I understand it correctly.

  1. Write a unit test whose input is the example from "Single JSON in @Comment field".
    • I don't know the exact expected output of the unit test yet, but I'll try to figure it out later.
      @Comment{jabref-meta-0.1.0
      {
        "saveActions" :
        {
          "state": true,
          "date": ["normalize_date", "action2"],
          "pages" : ["normalize_page_numbers"],
          "month" : ["normalize_month"]
        }
      }
      }
  2. Update MetaDataParser#getNextUnit to handle the new JSON format from the unit test case.
  3. Write the logic to parse, read, and write the new JSON format.
    • I haven't found the proper place for this logic yet; maybe it belongs in MetaDataSerializer or MetaDataParser?
    • I also haven't found the existing code that reads @Comment; maybe it is in BibtexDatabaseWriter?
  4. Add more corner cases to the unit tests for this change.

koppor commented 1 week ago

> 1. Write a unit test whose input is the example in `Single JSON in @Comment field`.

Yes

>    * I don't know the exact expected output of the unit test yet, but I'll try to figure it out later.

The JSON content itself. Maybe the Gson library is your friend. I had good experiences with it in the HTTP server part.

> 2. Update `MetaDataParser#getNextUnit` to handle the new JSON format in the unit test case

The place is `org.jabref.logic.importer.fileformat.BibtexParser#parseJabRefComment`.

> 3. Write logic code to parse, read and write new JSON format.

The whole MetaDataParser can be "deleted" and replaced by a new loading from JSON. I think it is JSON -> DTO -> metadata, or maybe directly from JSON to MetaData. ("Deleted" is not quite true, because JabRef should still be able to read "old" files. In version 6, both formats are read and written, with the new format taking precedence; in version 7, the old metadata is not written any more.)

>    * I didn't find the proper place to put this logic, maybe I should put it in `MetaDataSerializer` or `MetaDataParser`?
>    * And I didn't find the old code that reads `@Comment`, maybe it is in `BibtexDatabaseWriter`?

See above.

There will be many unit tests for that.
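
The read/write rules for versions 6 and 7 could be captured roughly like this. It is only a sketch under assumptions: the class and method names are hypothetical, metadata is represented as plain strings instead of MetaData objects, and the behaviour before version 6 (old format only) is an assumption.

```java
import java.util.Optional;

// Hypothetical sketch of the migration rules; not part of JabRef.
class MetaDataFormatPolicyDemo {

    /** When both formats are present in a file, the new JSON format takes precedence. */
    static Optional<String> chooseSource(Optional<String> jsonMeta, Optional<String> legacyMeta) {
        return jsonMeta.or(() -> legacyMeta);
    }

    /** The old format is written up to and including version 6. */
    static boolean writesLegacy(int majorVersion) {
        return majorVersion < 7;
    }

    /** The JSON format is written from version 6 on (assumption: not before). */
    static boolean writesJson(int majorVersion) {
        return majorVersion >= 6;
    }
}
```

Reading stays permissive in every version: chooseSource falls back to the old metadata whenever no JSON metadata is found, so "old" files remain loadable.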

leaf-soba commented 1 week ago

OK, it is clear now, please assign to me.

koppor commented 1 week ago

/assign @leaf-soba

github-actions[bot] commented 1 week ago

👋 Hey @leaf-soba, thank you for your interest in this issue! 🎉

We're excited to have you on board. Start by exploring our Contributing guidelines, and don't forget to check out our workspace setup guidelines to get started smoothly.

In case you encounter failing tests during development, please check our developer FAQs!

Having any questions or issues? Feel free to ask here on GitHub. Need help setting up your local workspace? Join the conversation on JabRef's Gitter chat. And don't hesitate to open a (draft) pull request early on to show the direction it is heading towards. This way, you will receive valuable feedback.

Happy coding! 🚀

⏳ Please note, you will be automatically unassigned if the issue isn't closed within 30 days (by 29 November 2024). A maintainer can also add the "📌 Pinned" label to prevent automatic unassignment.