JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License

Saving bug after import - braces don't match #8730

Open falschgeldkind opened 2 years ago

falschgeldkind commented 2 years ago

JabRef version

Other (please describe below)

Operating system

GNU / Linux

Details on version and operating system

No response

Checked with the latest development build

Steps to reproduce the behaviour

JabRef version: latest (main) development build (today), but this bug exists in previous versions as well.

Saving throws an exception after importing PDF files. This probably has to do with OCR reading special characters like {, }, \n, etc. Trying to save such values then leads to a situation where JabRef can't decide where the entry ends, or something like that.

To reproduce:

  1. Import a PDF
  2. Click on Save
  3. An exception is thrown

Exception in the appendix

Appendix

...

```
org.jabref.logic.exporter.SaveException: Problems saving: java.io.IOException: Error in field 'AUTHOR of entry SINGLEFINandBUFFETINGALLEVIATION0656': Braces don't match. Field value: FOR SINGLE{FIN and BUFFETING ALLEVIATION
	at org.jabref@5.7.4/org.jabref.gui.exporter.SaveDatabaseAction.saveDatabase(Unknown Source)
	at org.jabref@5.7.4/org.jabref.gui.exporter.SaveDatabaseAction.save(Unknown Source)
	at org.jabref@5.7.4/org.jabref.gui.exporter.SaveDatabaseAction.save(Unknown Source)
	at org.jabref@5.7.4/org.jabref.gui.exporter.SaveDatabaseAction.save(Unknown Source)
	at org.jabref@5.7.4/org.jabref.gui.exporter.SaveAction.execute(Unknown Source)
	at org.jabref@5.7.4/org.jabref.gui.actions.JabRefAction.lambda$new$3(Unknown Source)
	at org.jabref.merged.module@5.7.4/org.controlsfx.control.action.Action.handle(Unknown Source)
	at org.jabref.merged.module@5.7.4/org.controlsfx.control.action.Action.handle(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.CompositeEventHandler.dispatchBubblingEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventHandlerManager.dispatchBubblingEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventHandlerManager.dispatchBubblingEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.BasicEventDispatcher.dispatchEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventDispatchChainImpl.dispatchEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventUtil.fireEventImpl(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventUtil.fireEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/javafx.event.Event.fireEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/javafx.scene.control.MenuItem.fire(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.scene.control.ControlAcceleratorSupport.lambda$doAcceleratorInstall$2(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.scene.KeyboardShortcutsHandler.processAccelerators(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.scene.KeyboardShortcutsHandler.dispatchBubblingEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.CompositeEventDispatcher.dispatchBubblingEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.BasicEventDispatcher.dispatchEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventDispatchChainImpl.dispatchEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.BasicEventDispatcher.dispatchEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventDispatchChainImpl.dispatchEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventUtil.fireEventImpl(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.event.EventUtil.fireEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/javafx.event.Event.fireEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/javafx.scene.Scene$KeyHandler.process(Unknown Source)
	at org.jabref.merged.module@5.7.4/javafx.scene.Scene.processKeyEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/javafx.scene.Scene$ScenePeerListener.keyEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.tk.quantum.GlassViewEventHandler$KeyEventNotification.run(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.tk.quantum.GlassViewEventHandler$KeyEventNotification.run(Unknown Source)
	at java.base/java.security.AccessController.doPrivileged(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.tk.quantum.GlassViewEventHandler.lambda$handleKeyEvent$1(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.tk.quantum.QuantumToolkit.runWithoutRenderLock(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.javafx.tk.quantum.GlassViewEventHandler.handleKeyEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.glass.ui.View.handleKeyEvent(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.glass.ui.View.notifyKey(Unknown Source)
	at org.jabref.merged.module@5.7.4/com.sun.glass.ui.gtk.GtkApplication._runLoop(Native Method)
	at org.jabref.merged.module@5.7.4/com.sun.glass.ui.gtk.GtkApplication.lambda$runLoop$11(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Error in field 'AUTHOR of entry SINGLEFINandBUFFETINGALLEVIATION0656': Braces don't match. Field value: FOR SINGLE{FIN and BUFFETING ALLEVIATION
	at org.jabref@5.7.4/org.jabref.logic.bibtex.BibEntryWriter.writeField(Unknown Source)
	at org.jabref@5.7.4/org.jabref.logic.bibtex.BibEntryWriter.writeRequiredFieldsFirstRemainingFieldsSecond(Unknown Source)
	at org.jabref@5.7.4/org.jabref.logic.bibtex.BibEntryWriter.write(Unknown Source)
	at org.jabref@5.7.4/org.jabref.logic.exporter.BibtexDatabaseWriter.writeEntry(Unknown Source)
	at org.jabref@5.7.4/org.jabref.logic.exporter.BibDatabaseWriter.savePartOfDatabase(Unknown Source)
	at org.jabref@5.7.4/org.jabref.logic.exporter.BibDatabaseWriter.saveDatabase(Unknown Source)
	... 42 more
Caused by: org.jabref.logic.bibtex.InvalidFieldValueException: Braces don't match. Field value: FOR SINGLE{FIN and BUFFETING ALLEVIATION
	at org.jabref@5.7.4/org.jabref.logic.bibtex.FieldWriter.checkBraces(Unknown Source)
	at org.jabref@5.7.4/org.jabref.logic.bibtex.FieldWriter.formatAndResolveStrings(Unknown Source)
	at org.jabref@5.7.4/org.jabref.logic.bibtex.FieldWriter.write(Unknown Source)
	... 48 more
```
ThiloteE commented 2 years ago

Could you provide the PDF that you imported? As long as there are no copyright issues, you can also send the file to the JabRef maintainers in a private e-mail (web@jabref.org).

Siedlerchr commented 2 years ago

Well, the error message says the problem is a brace: org.jabref.logic.exporter.SaveException: Problems saving: java.io.IOException: Error in field 'AUTHOR of entry SINGLEFINandBUFFETINGALLEVIATION0656': Braces don't match. Field value: FOR SINGLE{FIN and BUFFETING ALLEVIATION

falschgeldkind commented 2 years ago

@ThiloteE Sorry, due to copyright issues I cannot share the PDF file.

But @Siedlerchr is correct.

Maybe you could just check the new entries after the OCR step for any braces and discard them? Or even leave that field empty, because it is probably messed-up OCR output and the value does not make any sense anyway.

I have also observed that newline characters are sometimes included in the field values and cause problems.
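The suggestion in this comment (collapse stray line breaks and leave brace-containing fields empty right after the PDF import) could look roughly like the following Java sketch. `ImportCleanup` is a hypothetical name, and a plain `Map<String, String>` stands in for an entry's fields; JabRef's real entry model and cleanup operations are not used here, and treating every brace as OCR garbage is an assumption taken from the suggestion, not a rule JabRef applies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical post-import cleanup sketching the suggestion above; not JabRef code. */
final class ImportCleanup {

    private ImportCleanup() {
    }

    /**
     * Returns a cleaned copy of the given fields: every line break (plus surrounding
     * whitespace) becomes a single space, and any field whose value still contains a
     * curly brace is left empty, on the assumption that it is OCR garbage.
     */
    static Map<String, String> clean(Map<String, String> fields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // \R matches any line break sequence (\n, \r\n, \r, ...)
            String value = field.getValue().replaceAll("\\s*\\R\\s*", " ").trim();
            if (value.indexOf('{') >= 0 || value.indexOf('}') >= 0) {
                value = "";
            }
            result.put(field.getKey(), value);
        }
        return result;
    }
}
```

Discarding the whole value is deliberately blunt: it loses data, but it guarantees the entry can be written again, which is the trade-off the comment proposes.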

ThiloteE commented 2 years ago

Maybe you could share the metadata that gets imported into JabRef before saving, then. There should not be any copyright on that, right?

I assume one of the fields contains an unmatched brace; JabRef detects this and raises the error. If you have many entries and routinely stumble upon an odd number of braces in your PDFs, then you should check your workflow and the programs you use to create them. Instead of only repairing wrong syntax, please also make sure to prevent wrong syntax from being created in the first place.
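To illustrate what detecting an unmatched brace means, here is a minimal Java sketch of a depth-counting balance check. It is not JabRef's actual `FieldWriter.checkBraces` implementation; the class name `BraceBalance` is hypothetical, and it assumes that a brace directly preceded by a backslash (`\{`, `\}`) is escaped and should be ignored.

```java
/** Hypothetical helper, not part of JabRef: checks whether curly braces in a field value are balanced. */
final class BraceBalance {

    private BraceBalance() {
    }

    /**
     * Returns true if every unescaped '{' is closed by a later '}' and no '}' appears
     * before its '{'. A brace directly preceded by a backslash is treated as escaped.
     */
    static boolean isBalanced(String value) {
        int depth = 0;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean escaped = i > 0 && value.charAt(i - 1) == '\\';
            if (escaped) {
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return false; // closing brace without an opening one
                }
            }
        }
        return depth == 0; // anything left over is an opening brace that was never closed
    }
}
```

For the field value reported in this issue, `FOR SINGLE{FIN and BUFFETING ALLEVIATION`, the sketch returns `false` because the single opening brace is never closed.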

{ and } are special characters in JabRef and denote the beginning and end of a field value. Imagine JabRef automatically removing an unmatched brace: the text before or after the removed brace would still be there, but no longer enclosed in braces, and would therefore be out of place.

I just did a quick test of what JabRef does when there are no braces. Below is an example, but keep in mind that this is WRONG syntax and should be strictly avoided:

```
@Book{aii,
  title  = {Africa in International Politics},
  groups = test,
}
```

After clicking on another entry in the main table, it somehow turns into:

```
@Book{aii,
  title  = {Africa in International Politics},
  groups = {#test#},
}
```

So I guess a mechanism that automatically removes unmatched braces upon saving would be OK, since there seems to be a fallback mechanism in place that tries to recreate fields?

Unmatched braces would need to be removed at a point in time when it is clear that the user is no longer working on the bibliographic entries. Removing braces instantly, while users want to ADD braces and have not yet finished typing the matching ones, would be harmful.
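If such an automatic repair were added, one way to do it without touching correctly paired braces is a stack-based pass that drops only the braces that have no partner. The Java sketch below is hypothetical (`UnmatchedBraceRemover` is not a JabRef class) and, in line with the timing concern raised here, is meant to run only at save time, never while the user is still typing. It also assumes a backslash-escaped brace should be left alone.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical save-time repair step; not part of JabRef. */
final class UnmatchedBraceRemover {

    private UnmatchedBraceRemover() {
    }

    /**
     * Removes every '{' or '}' that has no matching partner, keeping balanced pairs
     * and escaped braces (preceded by a backslash) untouched.
     */
    static String removeUnmatched(String value) {
        boolean[] drop = new boolean[value.length()];
        Deque<Integer> openPositions = new ArrayDeque<>();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (i > 0 && value.charAt(i - 1) == '\\') {
                continue; // escaped brace, leave it alone
            }
            if (c == '{') {
                openPositions.push(i);
            } else if (c == '}') {
                if (openPositions.isEmpty()) {
                    drop[i] = true; // closing brace without an opening one
                } else {
                    openPositions.pop(); // matched pair, keep both
                }
            }
        }
        // Positions still on the stack are opening braces that were never closed.
        for (int position : openPositions) {
            drop[position] = true;
        }
        StringBuilder result = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            if (!drop[i]) {
                result.append(value.charAt(i));
            }
        }
        return result.toString();
    }
}
```

As pointed out earlier in the thread, the text around a removed brace stays where it was, so a value repaired this way can be saved but may still need a manual look.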

By the way: JabRef's integrity check does not flag {#test#}, which it probably should.

falschgeldkind commented 2 years ago

I do not create any PDFs; I'm just importing unlinked files (that already exist).

I think the OCR sometimes confuses normal braces with curly braces.

Yes, the problem is that there are sometimes (curly) braces within the fields. Wouldn't it be enough to just delete all braces WITHIN fields, or swap them out for normal braces?
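Both options proposed here are easy to sketch. The Java snippet below is a hypothetical illustration, not an existing JabRef formatter. Note one caveat: it would also rewrite intentional brace groups such as `{NASA}` (which protect capitalization in BibTeX) into `(NASA)` or `NASA`, so it would only make sense for freshly imported, not yet reviewed values.

```java
/** Hypothetical formatters, not part of JabRef, for the two options suggested above. */
final class CurlyBraceFormatters {

    private CurlyBraceFormatters() {
    }

    /** Option 1: delete every curly brace, keeping the text between them. */
    static String strip(String value) {
        return value.replace("{", "").replace("}", "");
    }

    /** Option 2: swap every curly brace for the corresponding round one. */
    static String toRound(String value) {
        return value.replace('{', '(').replace('}', ')');
    }
}
```

Either result contains no curly braces at all, so the brace check on save can no longer fail for that field.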

ThiloteE commented 2 years ago

I still don't know what OCR means.

falschgeldkind commented 2 years ago

Optical character recognition. Isn't that what the PDF import does to get the author etc. if it's not contained in the PDF metadata?

ThiloteE commented 2 years ago

We don't know what is imported into JabRef, what it parses, or which method it uses to import, because you have not provided this data to us. Sorry :/ Troubleshooting this is not easy.

When Grobid is enabled, JabRef by default checks for data in the following order (you can also disable Grobid):

1. Look for a BibTeX entry on the first page of the PDF
2. Look for an embedded .bib file
3. Grobid
4. XMP metadata
5. Attempt to find metadata on the first page (not in BibTeX format)

Source: https://discourse.jabref.org/t/extract-information-from-pdf-import/2899/6
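The order listed above amounts to a chain of fallbacks: each source is tried in turn, and the first one that yields data wins. The Java sketch below only illustrates that control flow. The `MetadataSource` record, the source names, and the use of a plain `Map` as the result are hypothetical and do not reflect JabRef's actual importer classes; the record syntax assumes Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of an ordered fallback chain for PDF metadata; not JabRef's importer code. */
final class PdfMetadataChain {

    /** One metadata source, e.g. "embedded BibTeX" or "Grobid"; returns empty if it finds nothing. */
    record MetadataSource(String name, Function<String, Optional<Map<String, String>>> extractor) {
    }

    private PdfMetadataChain() {
    }

    /** Tries each source in order and returns the first non-empty result, paired with the source's name. */
    static Optional<Map.Entry<String, Map<String, String>>> firstHit(List<MetadataSource> sources, String pdfPath) {
        for (MetadataSource source : sources) {
            Optional<Map<String, String>> fields = source.extractor().apply(pdfPath);
            if (fields.isPresent() && !fields.get().isEmpty()) {
                return Optional.of(Map.entry(source.name(), fields.get()));
            }
        }
        return Optional.empty(); // no source found anything
    }
}
```

Seen this way, which stage produced a garbled value (for example Grobid versus the plain first-page heuristic) determines where a fix has to go, which is why knowing the imported data matters for troubleshooting.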

If, as you say, the data is extracted optically, this points to a Grobid issue: https://github.com/kermitt2/grobid/issues

ThiloteE commented 2 years ago

You can check the metadata attached to the PDF with exiftool.

The following command will allow you to extract all available metadata from PDFs:

```
exiftool -ee3 -U -G3:1 -api requestall=3 -api largefilesupport FILE
```

Source

falschgeldkind commented 2 years ago

The next time this problem turns up, I'll take a screenshot or something like that and investigate with exiftool.

falschgeldkind commented 2 years ago

Alright, got another one. There seems to be no metadata in this PDF:

```
exiftool -ee3 -U -G3:1 -api requestall=3 -api largefilesupport $DOCPATH/Druckschriften/DGLR/JT99_102.PDF
bley@DellOptiPlex-7010:~$ 
```

[Screenshot from 2022-04-29 13-40-21]

ThiloteE commented 2 years ago

Hmm, I cannot believe the PDF holds absolutely zero metadata. Try this:

1. Open a command line in the folder containing the PDF(s)
2. exiftool.exe has to be in this folder
3. Use the following command:

```
exiftool -ee3 -U -G3:1 -api requestall=3 -api largefilesupport FILE
```

ThiloteE commented 2 years ago

Also, please show the BibTeX source tab. Sometimes what is shown in other tabs differs from what is shown in the BibTeX source.

But we can already see that there is indeed an opening curly brace without a closing one, so the immediate workaround would be to remove that curly brace or add another curly brace to close it, and the error should be gone.

falschgeldkind commented 2 years ago

I use Linux, so no .exe :D

How do I get to the BibTeX source tab?

ThiloteE commented 2 years ago

Ah right. My bad.

The "{} biblatex source" tab is one of the tabs of the entry editor in JabRef.

falschgeldkind commented 2 years ago

Another one with a similar error:

```
bley@DellOptiPlex-7010:/home/pfisun8n/allgem0/08_Literatur/Dokumente/Druckschriften/EUCASS$ exiftool -ee3 -U -G3:1 -api requestall=3 -api largefilesupport a169.pdf
bley@DellOptiPlex-7010:/home/pfisun8n/allgem0/08_Literatur/Dokumente/Druckschriften/EUCASS$ 
```

The source tab says this:

```
Error in field 'AUTHOR of entry ГпуезИаНоп1310': Braces don't match. Field value: Ап ГпуезИ^аНоп and оГ Бупапнс and о( Огйегей and 81гис1иге ш Ехсйей and Зе1 изш РГУ and - Ъазес and рЬазе-  and ауега§1П§ {есЬшцие.

Correct the entry, and reopen editor to display/edit source.
```

ThiloteE commented 2 years ago

The immediate workaround would be to remove that curly brace or add another curly brace to close the argument, and the error should be gone.

Another workaround would be to change your workflow: first have the bibliographic data in JabRef, and then use Quality > Automatically set file links (F7). The advantage of this method is that you can add the correct bibliographic data manually, or download or import it from somewhere else, and then just attach the PDF to the correct data. In other words: you will have less work correcting wrong bibliographic data parsed from the PDF, because the parsing is far from perfect, as you can see.

If you don't have bibliographic data at hand, you can also disable Grobid and import XMP metadata via File > Import > ... first. Choose the following:

[Screenshot]

Afterwards, you can automatically set file links (F7). I personally use a regex to find files on my system, but you can also use the citation key, or name your files after the DOI. There are useful preferences for this:

[Screenshot]
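The simplest of these matching strategies, linking files whose names start with an entry's citation key, can be sketched in a few lines of Java. This is only an illustration of the idea: the class and method names are hypothetical, and JabRef's own auto-linking (including its regular-expression mode) is configured in its preferences and may behave differently in detail.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical illustration of citation-key based file matching; not JabRef's auto-link code. */
final class CitationKeyFileMatcher {

    private CitationKeyFileMatcher() {
    }

    /**
     * Returns the file names that start with the given citation key and end with the
     * given extension (extension compared case-insensitively), e.g. "aii.pdf" or "aii-notes.pdf".
     */
    static List<String> match(String citationKey, String extension, List<String> fileNames) {
        String suffix = "." + extension.toLowerCase(Locale.ROOT);
        return fileNames.stream()
                .filter(name -> name.startsWith(citationKey))
                .filter(name -> name.toLowerCase(Locale.ROOT).endsWith(suffix))
                .collect(Collectors.toList());
    }
}
```

This approach only works when the bibliographic entry (and thus its citation key) already exists, which is exactly the limitation raised next.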

Of course, if you do not have any bibliographic data at hand and there is no metadata attached to the PDF, the second workaround may not work well for you.

falschgeldkind commented 2 years ago

Thank you.

Unfortunately, I do not have the bibliographic data.