JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License

Make file annotations searchable #4654

Closed doctor-ian closed 6 months ago

doctor-ian commented 5 years ago

JabRef can correctly add File annotations from the linked PDF to the corresponding tab.

However, the File annotations tab is not searchable: if I highlight a specific word in the PDF or add a word as a comment, neither the highlighted nor the commented word is found by the global search function. I think it would be nice to make this feature available in the options, e.g. under "General" as "Include file annotations in search".

agatawitkowska commented 5 years ago

This is exactly the feature I need:)! Please include that in your next releases!

reox commented 4 years ago

I would love such a feature! Also a nice addition would be to search in all attached files.

For now, I use pdfgrep if I need to search something. You can make use of xargs to spawn multiple instances to search faster (if your hardware allows it):

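#!/bin/bash
# Search all PDFs below ./files in parallel; all arguments are passed on to pdfgrep.

# Use one pdfgrep process per logical CPU; fall back to 4 if lscpu is not available.
if [ -n "$(command -v lscpu)" ]; then
    CPUS=$(lscpu --parse=cpu | grep -c -v -e '^#')
else
    CPUS=4
fi

# -print0 / -0 keep file names containing spaces intact.
find files -iname \*.pdf -type f -print0 | xargs -0 -P "$CPUS" -L 1 pdfgrep -H -n "$@"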

Unfortunately, pdfgrep does not seem to be capable of searching in annotations only.
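
As a rough workaround for annotation-only searching, one could scan the raw PDF bytes for annotation text. The following is a minimal sketch under strong assumptions: it only finds annotations stored uncompressed as literal strings of the form `/Contents (...)`, which holds for some PDFs but not for files that use object streams, hex strings, escaped parentheses, or UTF-16 text. The function name `grep_annotations` is made up for illustration and is not part of pdfgrep or JabRef.

```shell
#!/bin/bash

# grep_annotations PATTERN FILE...
# Heuristic: print "file: text" for every uncompressed "/Contents (...)"
# literal string whose text matches PATTERN (case-insensitive).
# Assumption: annotation text is a plain literal string; compressed or
# UTF-16-encoded annotations are NOT found by this sketch.
grep_annotations() {
    local pattern=$1
    shift
    local file line
    for file in "$@"; do
        # -a: treat binary data as text, -o: print only the matching part
        LC_ALL=C grep -a -o -E '/Contents ?\([^()]*\)' "$file" 2>/dev/null \
            | sed -E 's|^/Contents ?\((.*)\)$|\1|' \
            | grep -i -e "$pattern" \
            | while IFS= read -r line; do
                  printf '%s: %s\n' "$file" "$line"
              done
    done
}
```

A call might look like `grep_annotations 'some word' files/*.pdf`. Missing output does not prove that a PDF has no matching annotation; it may only mean the annotation is stored in a form this heuristic cannot see.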

sankayop commented 3 years ago

I would also love this feature :+1: for JabRef developers !

achim-guldner commented 3 years ago

This would be great.

Soldrakon commented 3 years ago

Please do this

koppor commented 3 years ago

Search in PDFs will be implemented in #2838

calixtus commented 3 years ago

Hi @doctor-ian, our GSoC student @btut put a lot of effort into implementing this feature in #2838. Could you try out the latest main branch build from builds.jabref.org and check whether the feature works as you asked?

maphouse commented 1 year ago

@koppor @btut has there been any progress on this? Full-text search is great but annotation search would be even greater!

koppor commented 1 year ago

@maphouse Thank you for asking.

Maybe you saw that GSoC was completed in 2021.

I tried today's JabRef version with the PDFs listed at https://github.com/JabRef/jabref/tree/main/src/test/resources/pdfs.

minimal-highlight-with-note.pdf renders as follows:

image

I don't find it in the library:

image

After enabling the full text search

image

Two entries are found:

image

With a listing at "Search results":

image

The page number seems to be off-by-one. Not sure.

When you check the feature with the newest JabRef (ideally the development build at https://builds.jabref.org/main/), could you check whether the page numbers are offset?

koppor commented 1 year ago

Note that in v5.9 we turned off the default to search in PDFs (https://github.com/JabRef/jabref/pull/9527). Long discussion at https://github.com/JabRef/jabref/issues/9491.

Maybe this is confusing for you as an end user?

maphouse commented 1 year ago

> Note that in v5.9 we turned off the default to search in PDFs (#9527). Long discussion at #9491.
>
> Maybe this is confusing for you as an end user?

Hi @koppor , thanks!

I'm using:

JabRef 5.9--2023-01-08--76253f1a7
Windows 10 10.0 amd64 
Java 19.0.1 
JavaFX 19+11

Input test comment:

image

Search results didn't return anything until I activated case sensitivity:

image

Now, whenever I just activate fulltext search, I get the intended result.

Works great after all. Good to know where to look in the preview tab though ("Search results"). The page numbers so far are accurate. Thanks for the help!

Timpology commented 1 year ago

Hi @koppor,

I've been trying to search PDF annotations with the JabRef version listed below, and am having mixed results. While it returns annotation results from some PDFs, it does not from many others.

JabRef 5.9--2023-01-08--76253f1a7
Windows 10 10.0 amd64
Java 19.0.1
JavaFX 19+11

I've only tried a few searches, and have copied the text directly from the annotations that show up in the "File Annotations" tab, but some documents' annotations are not showing search results.

I have tried combinations of search options including regular expressions, case sensitive, and Fulltext search. I have also used individual words, combinations of words and phrases, and enclosed the search in double quotation marks (""). I have not found a combination that returns annotation results for those PDFs.

I've attached one of the PDFs with annotations that was unsuccessful. Note that annotation results are showing up for some PDFs, just not many. The ones that don't show annotation results do still show regular Fulltext search results from the PDF.

naca-report-1135 Equations, Tables, and Charts for Compressible Flow.pdf

ThiloteE commented 1 year ago

True @Timpology. I can reproduce. The pdf you provide has annotations, but I fail to find them when using normal, regex or fulltext search.

LoayGhreeb commented 7 months ago

> True @Timpology. I can reproduce. The pdf you provide has annotations, but I fail to find them when using normal, regex or fulltext search.

@ThiloteE, I can't reproduce with the latest development build using the same PDF.

koppor commented 7 months ago

  1. I attached the file naca-report-1135 Equations, Tables, and Charts for Compressible Flow.pdf to an entry (in Chocolate.bib)
  2. I enabled PDF fulltext search
  3. I searched for "Shock Angle Vs Deflection Angle Chart"
  4. Result:

image

Thus, the annotation is found as expected.

@Timpology Is it also the case that "Shock Angle Vs Deflection Angle Chart" was not found on your machine or was it another string? -- TBH, I need one concrete example to guide contributors.


Side issue: JabRef does not support "Watermark":

2024-03-25 00:15:44 [JavaFX Application Thread] org.jabref.model.pdf.FileAnnotationType.parse()
INFO: FileAnnotationType Watermark is not supported and was converted into 'Unknown'!
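
The log line above suggests a parse step that maps unrecognized annotation types to a fallback value instead of failing. A minimal sketch of that pattern follows; it is not JabRef's actual code, the function name `parse_annotation_type` is invented for this illustration, and the list of supported types is a hypothetical assumption.

```shell
#!/bin/bash

# parse_annotation_type NAME
# Prints NAME if it is in the (assumed) supported list. Otherwise writes an
# INFO line to stderr and prints "Unknown" as the fallback value.
parse_annotation_type() {
    case "$1" in
        # Hypothetical supported set, chosen only for this illustration
        Text|Highlight|Underline|Squiggly|StrikeOut)
            printf '%s\n' "$1"
            ;;
        *)
            printf "INFO: FileAnnotationType %s is not supported and was converted into 'Unknown'!\n" "$1" >&2
            printf 'Unknown\n'
            ;;
    esac
}
```

With this kind of fallback, an unsupported type such as Watermark is logged and carried along as Unknown rather than aborting the parse.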

Timpology commented 6 months ago

@koppor - There might be an issue with my installation. I tried the exact search you laid out above, "Shock Angle Vs Deflection Angle Chart", and did not get any search results. I then tried it on my laptop running v5.9 on Windows 11, and it worked there. I updated JabRef on my desktop (the computer where the search was not working) to v5.12 to see if it was my version, but I could not get JabRef to start at all. The fixes I found online did not get JabRef to start either, so I uninstalled it and reinstalled v5.10. After reinstalling, I tried the search feature again, and it now works.

I'm not sure exactly what factors led to it working again, other than uninstall / reinstall. I'm also thinking I may have a localized issue since I cannot get v5.12 to run at all. That will be troubleshooting for another day though.

koppor commented 6 months ago

> I'm not sure exactly what factors led to it working again, other than uninstall / reinstall.

Maybe that led to a reindex...

> I'm also thinking I may have a localized issue since I cannot get v5.12 to run at all. That will be troubleshooting for another day though.

You are on macOS? https://github.com/JabRef/jabref/issues/11082 or https://github.com/JabRef/jabref/issues/11035 or https://github.com/JabRef/jabref/issues/10716

You are on Linux? https://github.com/JabRef/jabref/issues/10673

@Timpology

koppor commented 6 months ago

@doctor-ian I think the issue is solved. Feel free to comment if it is not.