JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License
3.58k stars 2.49k forks source link

Implement caching for citation relations #11189

Open koppor opened 5 months ago

koppor commented 5 months ago

Context

PR https://github.com/JabRef/jabref/pull/10980 introduced a memory saving for citation relations to prevent JabRef's memory usage growing constantly. The drawback is that citation relations might become unavailable due to API limitations of the online service.

Quoting https://libguides.ucalgary.ca/c.php?g=732144&p=5260798

The API allows up to 100 requests per 5 minutes. To access a higher rate limit, complete the form to request authentication for your project.

That means, for a large library, I cannot step through the references, because I could hit the rate limit. I know, this is seldom, but it could happen at following setting: In a corporate setting: all requests are going through a proxy. Thus, the rate limit is not per person, but per the SUM of persons. In case 100 researchers work in parallel with JabRef, each researcher can get ONE request per 5 minutes. And per entry TWO requests are needed: For the citing and the cited by.

The following solution would really help to make the citation relations really usable. (Because the information for each DOI is stored independent of each library and is presented as soon it is availbable.)...

Task 1

Let's investiage MVStore. MVStore is a library storing the values of a hashmap on disk. Thus, NOT in memory. Thus, it takes less memory than a full hash map in memory, because it is on disk. -- MVStore routes through the request to a map entry to disk. - See https://www.h2database.com/html/mvstore.html for details.

Estimate: 100 lines of code, but scattered around JabRef. NativeDesktop needs to be touched etc. The most difficult thing will be the closing of the MVStore. Since the DOIs are globally unique, one can close the MVStore when JabRef is shut down. This makes it "easier" (in comparison to https://github.com/JabRef/jabref/pull/10937, where for each tab some closing thing were necessary). -- Nevertheless, it could be that this will be a back-and-forth code development (meaning: code reviews with significant changes could come up).

Implementation hint: Do NOT do it like org.jabref.logic.journals.JournalAbbreviationRepository, because there, there is no direct access to the MVStore, but new hashmaps are created. We really need to work on the Maps provided by the MVStore. Because the MVStore "intelligent" does memory management.

Task 2

Follow-up requirement: Refresh the DOI information if one week passed since the last fetch. Maybe this can be baked into the HashMap designed for the MVStore.

rohit-garga commented 5 months ago

Hey @koppor, I would like to take this up

github-actions[bot] commented 5 months ago

Welcome to the vibrant world of open-source development with JabRef!

Newcomers, we're excited to have you on board. Start by exploring our Contributing guidelines, and don't forget to check out our workspace setup guidelines to get started smoothly.

Having any questions or issues? Feel free to ask here on GitHub. Need help setting up your local workspace? Join the conversation on JabRef's Gitter chat. And don't hesitate to open a (draft) pull request early on to show the direction it is heading towards. This way, you will receive valuable feedback.

Happy coding! πŸš€

alexandre-cremieux commented 3 months ago

Hello @koppor

Is this issue still reserved ?

If not then I would be quite interested to provide a solution.

After taking a look to the main branch code I see possible improvements in the repository design that could also:

I do not know MVStore yet but I saw that H2 provides a fairly extensive documentation.

alexandre-cremieux commented 3 months ago

Hello @koppor and @ThiloteE ,

I was wondering if you could provide some insights to help me understand the whole design:

Regards

koppor commented 2 months ago

Dear @alexandre-cremieux, we are currently either on holidays or busy with other projects. Therefore only a short answer: Your ideas sound great! Separation from GUI and logic is required.

General package dependencies are listed at https://devdocs.jabref.org/getting-into-the-code/high-level-documentation.html. Feel free to refine.

Background: We also get contributions from students with a less strong wish for a clean architecture. After a few rounds, we let the code through: Better a feature with an OKish implementation than a feature not coming, because no one has time to work on clean code...

alexandre-cremieux commented 2 months ago

Hello @koppor

Thanks for your answer and documentation sharing. I was quite busy those last days also.

I understand your position. I might take some time, but I'm on my way to refactor the package as we agree on this.

Depending of my commit management, I may open separate PRs for refactoring and new implementation to try to ease the review.

Regards

alexandre-cremieux commented 12 hours ago

Hello @koppor ,

I refactored the fetch and caching logic by separating the concerns into layers and opened a first PR (as a draft).

Could you please take a look and add you comments before I write the documentation for the new interfaces ?

Here is this link: https://github.com/JabRef/jabref/pull/11845

After addressing your comments, I will let the draft become a PR to be merged and start to work on implementing an MVStore based repository.

Regards.

koppor commented 11 hours ago

@alexandre-cremieux Thank you for working on this! I am very busy with another task. I think, I can dive into this on Wednesday noon the earliest. - Until then, could you resovle the conflict with main branch? πŸ˜…

alexandre-cremieux commented 11 hours ago

Okay, thanks @koppor. Just ping me when you reviewed the code. I fixed the conflicts but they will be back each time main is updated.

koppor commented 10 hours ago

Okay, thanks @koppor. Just ping me when you reviewed the code.

I think, GitHub will send you a notificatoin :)

I fixed the conflicts but they will be back each time main is updated.

Should not be the case. Only if classes you changed were changed on main, too. - Do you feel differently?

alexandre-cremieux commented 8 hours ago

I fixed the conflicts but they will be back each time main is updated.

Should not be the case. Only if classes you changed were changed on main, too. - Do you feel differently?

Indeed, there was a commit before that modified the CitationsRelationsTab on which I had to work: Fix architecture gui/logic (#11379)

koppor commented 8 hours ago

Should not be the case. Only if classes you changed were changed on main, too. - Do you feel differently? Indeed, there was a commit before that modified the CitationsRelationsTab on which I had to work: Fix architecture gui/logic (#11379)

Large refactorings do not happen that often πŸ˜…. OK, if @leaf-soba continues contributing, changes will much higher than 1% 🏎.

Siedlerchr commented 8 hours ago

Just merge the main, much better than rebasing and less conflict potential