JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License
3.54k stars 2.48k forks source link

Add FileMonitor for LaTeX citations #10585

Closed koppor closed 4 months ago

koppor commented 10 months ago

The LaTeX citations are NOT backed by a file update monitor.

a

Rewrite the code to use our org.jabref.gui.util.DefaultFileUpdateMonitor (or a new file update monitor) with proper shutdown at the end.

(Note: This is the technically sound fix for https://github.com/JabRef/jabref/issues/10584)


Update

A first PR was submitted at https://github.com/JabRef/jabref/pull/10937. It turned out, that a new FileMonitor is needed. It should watch a directory.

The file listeners:


Reasons: Energy saving. Very little CPU usage for an update on one file. Although your current method works, it parses ALL .tex files again, and handles the content. Large LaTeX projects have dozens of tex files, which are very long. This costs time.

Jaovitosr commented 10 months ago

Hi! Hope you're all doing well! I'm really interested in tackling this issue as a beginner. It looks like a great way to learn and dive into the project. Looking forward to contributing! Thanks!

ThiloteE commented 10 months ago

Sure, go ahead :-)

Jaovitosr commented 10 months ago

Thank you very much! :)

Jaovitosr commented 9 months ago

Hello, how are you? I hope all is well!

I'd like to sincerely apologize for the delay in addressing the issue. I encountered some unexpected setbacks, but I'm back and aim to resolve it as soon as possible. Just a few questions... Should I follow the solution approach outlined in issue #10584 and implement a refresh button? I've located the implementation of the File monitor in the project and have a grasp of how to use it, but I would like clarification on the necessity of incorporating this button.

Thank you in advance for your assistance and understanding! :)

ThiloteE commented 9 months ago

If you manage to do it without causing extensive performance degradation for large libraries, then not having a refresh button would be fair enough :-)

roxannecvl commented 7 months ago

Hey, my university software engineering group are willing to tackle this issue. Would it be possible to assign us ? Here are our git username @rachedkko @VinterSallad @Emiesce @MercuriaL01 and @roxannecvl. Thank you for your answer !!

ThiloteE commented 7 months ago

Sure, I will assign you.

rachedkko commented 6 months ago

Hi, i am also working on this issue !

VinterSallad commented 6 months ago

Hi, I am in the same group and university course!

MercuriaL01 commented 6 months ago

Hello! I am working on it too together with @rachedkko @VinterSallad @Emiesce and @roxannecvl

ThiloteE commented 6 months ago

For organisational purposes it is totally fine, if only one person is assigned, if we know you are a group and work together. It is only possible to assign somebody who has commented in the issue, but if everybody comments, it will be a lot of spam, if we have like 160 developers.

github-actions[bot] commented 6 months ago

As a general advice for newcomers: check out Contributing for a start. Also, guidelines for setting up a local workspace is worth having a look at.

Feel free to ask here at GitHub, if you have any issue related questions. If you have questions about how to setup your workspace use JabRef's Gitter chat. Try to open a (draft) pull-request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

VinterSallad commented 6 months ago

Hi, I wanted to ask about the requirement of having a proper shutdown to the listeners at the end.

When are the listeners supposed to be shut down exactly?

If it is on program shutdown, then I think it is all handled since the fileMonitor I get is from the Globals class. Or is it whenever the user navigates out of the Latex Citations tab?

koppor commented 6 months ago

When are the listeners supposed to be shut down exactly?

Try to think about the use case to answer the question.

As user, I use the citations tab wihtin a opened .bib file (called "library") and have a .aux file attached.

As user, I can switch tabs in the entry editor. If I navigate back, I want to see the recent updates --> no shutdown.

As user, I can close the library. Then I don't need updates any more, because the tab is not visible. If I re-open the tab, I want to see the most recent cites --> shutdown. On tab initialization, the aux file is parsed again.

As user, I can close JabRef. Then JabRef closes all libraries. Handled by the case "I can close the library".

If it is on program shutdown, then I think it is all handled since the fileMonitor I get is from the Globals class. Or is it whenever the user navigates out of the Latex Citations tab?

Using "Globals" is too far away. The citations check is bound to a library. I think, I answered the question above.

LoayGhreeb commented 5 months ago

I would like to work on this issue.

LoayGhreeb commented 5 months ago

@koppor, while working on this issue, implementing a new directory monitor there are a few options to use:

java.nio.file.WatchService

Tested on Windows 11 using WatchDir.java example from the Oracle tutorials Watching a Directory for Changes.

There are multiple issues:

  1. It does not detect files coming together with a new folder. (JDK issue: JDK-8162948)
  2. Deleting a sub-directory using Del (Move to trash) does not detect deleted files in the sub-directory. However, it detects deleted files when using Shift+Del (Delete parentally).
  3. Access denied when trying to delete the recursively watched directory on Windows. (JDK issue: JDK-6972833)
  4. It is implemented on MacOS by the generic PollingWatchService, which continuously re-scans the directory. (JDK issue: JDK-8293067)

    io.methvin.watcher.DirectoryWatcher

Tested on Windows 11 using mentioned example in the repo.

It resolves issues (1, 3, 4) that occur in java.nio.file.WatchService because it uses ExtendedWatchEventModifier.FILE_TREE on Windows, and native implementation based on the Carbon File System Events API for MacOS. However, issue (2) can still be reproduced.


org.apache.commons.io.monitor

There are no issues with it, but it uses a polling mechanism at fixed intervals to check for any new events. This can waste CPU cycles, especially if no change occurs. However, it can handle huge files without overflowing.


What do you suggest I should do? I haven't tested on any other device or OS to see if these issues are reproducible or not.

koppor commented 5 months ago

@LoayGhreeb Nice evaluation! I need to think in ADRs, especially MADR. What are our decision drivers?

Seeing your JDK links, I checked the source of both the Apache Commons IO and methvin's watchers. I saw at https://github.com/gmethvin/directory-watcher/blob/3218d68a845ebd803ebd98af3be4692d1b63e12c/core/src/main/java/io/methvin/watcher/DirectoryWatcher.java#L162 that in non-macOS. Re-iterating on your comment again (ExtendedWatchEventModifier.FILE_TREE), I checked the code more. I do not like that they hash the file completely (https://github.com/gmethvin/directory-watcher/blob/3218d68a845ebd803ebd98af3be4692d1b63e12c/core/src/main/java/io/methvin/watcher/DirectoryWatcher.java#L468). They could have hashed the first 2kb and then do a full hash.

I also checked FileAlterationObserver.java. I liked FileFilterUtils.suffixFileFilter(".java"));. In our case, we are interested in .tex only.

Apache Commons does a check on modification time and length (refresh). This is IMHO the better approach

Thus, Apache Commons is IMHO the better way to go.


BTW: Being able to press the dot (.) to open a VS Code in the Web for a GitHub repo is very nice for code browsing. In case you didn't know.

Siedlerchr commented 5 months ago

Our bib file watcher is based on the java watch service

LoayGhreeb commented 5 months ago

Our bib file watcher is based on the java watch service

It works because it doesn't care about the sub-directories

koppor commented 5 months ago

Our bib file watcher is based on the java watch service

It works because it doesn't care about the sub-directories

We could register the .tex files directly. Watch the latex directory. And then check for \input and \include. However, this has more efforts than using Apache Commons FileMonitor and filtering for .tex files.

LoayGhreeb commented 5 months ago

We could register the .tex files directly. Watch the latex directory. And then check for \input and \include. However, this has more efforts than using Apache Commons FileMonitor and filtering for .tex files.

Does this handle the creation or deletion of .tex files in sub-directories of the latex directory?

By the way, can the \include command include a .tex file that is in another directory? If yes, should I also watch that directory to monitor changes to that file (Maybe using the DefaultFileUpdateMonitor)?

koppor commented 5 months ago

We could register the .tex files directly. Watch the latex directory. And then check for \input and \include. However, this has more efforts than using Apache Commons FileMonitor and filtering for .tex files. Does this handle the creation or deletion of .tex files in sub-directories of the latex directory?

What I meant is that Apache Commons FileMonitor and filtering for .tex files is great (because the manual way has more efforts). Just go ahead with that. Seems to be more clean code!

By the way, can the \include command include a .tex file that is in another directory?

Yes. One could even have macro expansion on the file name. Rarely used though.

If yes, should I also watch that directory to monitor changes to that file (Maybe using the DefaultFileUpdateMonitor)?

When using Apache Commons FileMonitor , that quesion is obsolete, isn't it?

LoayGhreeb commented 5 months ago

What I meant is that Apache Commons FileMonitor and filtering for .tex files is great (because the manual way has more efforts). Just go ahead with that. Seems to be more clean code!

Great! Will implement it and submit a pull request soon.

When using Apache Commons FileMonitor , that quesion is obsolete, isn't it?

Still need to know how to handle the case where a .tex file in the watched directory includes another .tex file outside the watched directory. How to monitor changes for that file? Because in that case, only need to monitor that file, not its entire directory.

Maybe it's better not to think about those cases for now. Implementing the basics first will make it easier to determine what should be done later.

koppor commented 5 months ago

When using Apache Commons FileMonitor , that quesion is obsolete, isn't it? Still need to know how to handle the case where a .tex file in the watched directory includes another .tex file outside the watched directory.

How to monitor changes for that file? Because in that case, only need to monitor that file, not its entire directory.

PhD thesis are typically split among several files. Thus, watching a directory and all sub directories for .tex files should cover almost all cases.

If it is too difficult to watch a directory for .tex file creation, then we can rely on the "Refresh" button though.