JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License

Show information about the journal #6189

Closed: tobiasdiez closed this issue 1 year ago

tobiasdiez commented 4 years ago

In the entry editor, there should be a button next to the journal field that displays a bunch of information about the journal in a PopOver, for example similar to the journal overview shown by Eigenfactor.

A few possible data sources are discussed here: https://academia.stackexchange.com/questions/3/where-can-i-find-the-impact-factor-for-a-given-journal (related: https://github.com/ikashnitsky/sjrdata)

dimitra-karadima commented 4 years ago

@tobiasdiez I want to tackle this issue! Can you give more specific details on where the relevant code is, so that I can add the button?
I have also looked up how to search Google programmatically. Based on the link http://www.eigenfactor.org/projects/journalRank/rankings.php?bsearch=COMMUNICATIONS+IN+MATHEMATICAL+PHYSICS&searchby=journal&orderby=eigenfactor, I thought I could make just the "COMMUNICATIONS+IN+MATHEMATICAL+PHYSICS" part a variable, so that each search targets the specific journal. What do you think about it? I haven't done anything similar before, so if you have a better idea, it would be much appreciated!
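The idea of keeping only the journal name variable can be sketched with Python's standard `urllib.parse.urlencode`, which produces the `+`-separated form used in the link above. This is a minimal sketch that assumes the other two parameters (`searchby`, `orderby`) stay fixed as in that link; `eigenfactor_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base URL taken from the example link; only `bsearch` varies per journal.
EIGENFACTOR_RANKINGS = "http://www.eigenfactor.org/projects/journalRank/rankings.php"


def eigenfactor_search_url(journal_name: str) -> str:
    """Build the Eigenfactor ranking-search URL for a journal name (hypothetical helper).

    urlencode() encodes spaces as '+' and escapes characters such as '&',
    so names with special characters do not break the query string.
    """
    query = urlencode({
        "bsearch": journal_name,
        "searchby": "journal",      # assumed fixed, copied from the example link
        "orderby": "eigenfactor",   # assumed fixed, copied from the example link
    })
    return f"{EIGENFACTOR_RANKINGS}?{query}"
```

Escaping is the main reason to prefer `urlencode` over plain string concatenation: a journal title containing `&` would otherwise split into a bogus extra parameter.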

tobiasdiez commented 4 years ago

Thanks for your interest @dimitra-karadima! I had a quick look and it looks like none of the above sources have a proper API that would give us easy access to the data. Thus, it seems this will be a bigger project.

I would propose the following:

What do you think?

dimitra-karadima commented 4 years ago

@tobiasdiez Very helpful input! But I don't think I can handle it right now. I am not really comfortable with Python scripts or JSON files, let alone combining them, even though you have done great work finding all the files needed! It is going to take me much more time than I thought, and right now I am kinda busy. So I am going to find a smaller issue and maybe come back to this when I have some extra time, if no one else has tackled it by then! And again, I am really sorry for dropping the issue.

tobiasdiez commented 4 years ago

That is very understandable. I wasn't aware of how much work this is until I wrote down what needs to be done.

But we do have a few issues that should be smaller tasks. For example, the ones tagged with good-first-issue and the ones concerning fetchers are usually also pretty self-contained. Looking forward to your next PR!

ilippert commented 4 years ago

Generally, it might be of interest to query whether journals are listed as open access journals. https://doaj.org/api/v1/docs
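A DOAJ lookup might look roughly like the following sketch. The route shape (`/api/v1/search/journals/{query}` with an `issn:` field query) and the `total` hit count in the JSON response are assumptions to be checked against the linked documentation; `doaj_search_url` and `is_listed_in_doaj` are hypothetical names. Parsing is kept separate from any network call so it can be tested offline.

```python
import json
from urllib.parse import quote

# Assumption: search route of the form /api/v1/search/journals/{query};
# verify against https://doaj.org/api/v1/docs before relying on it.
DOAJ_JOURNAL_SEARCH = "https://doaj.org/api/v1/search/journals/"


def doaj_search_url(issn: str) -> str:
    # Field query on the ISSN, e.g. "issn:1234-5678"; the colon is kept literal.
    return DOAJ_JOURNAL_SEARCH + quote(f"issn:{issn}", safe=":")


def is_listed_in_doaj(response_body: str) -> bool:
    # Assumption: the response is a JSON object with a "total" hit count.
    return json.loads(response_body).get("total", 0) > 0
```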

koobs commented 3 years ago

CrossRef has a Journals resource that might be handy.

CrossRef API Documentation: Resources

Example API Calls:
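For illustration, a minimal sketch assuming Crossref's `https://api.crossref.org/journals/{issn}` route, whose JSON reply wraps the journal record in a `message` object. Only `title`, `publisher` and `ISSN` are read here; the helper names and the `mailto` placeholder in the User-Agent header are hypothetical.

```python
import json
from urllib.request import Request, urlopen

CROSSREF_JOURNALS = "https://api.crossref.org/journals/"


def crossref_journal_url(issn: str) -> str:
    return CROSSREF_JOURNALS + issn


def fetch_crossref_journal(issn: str) -> dict:
    # Network call. Crossref asks clients to identify themselves;
    # the mailto address below is a placeholder.
    req = Request(
        crossref_journal_url(issn),
        headers={"User-Agent": "journal-info-sketch (mailto:you@example.org)"},
    )
    with urlopen(req, timeout=10) as response:
        return json.load(response)


def summarize(payload: dict) -> dict:
    # The journal record sits under "message"; pick out a few fields.
    message = payload.get("message", {})
    return {
        "title": message.get("title"),
        "publisher": message.get("publisher"),
        "issn": message.get("ISSN", []),
    }
```

Keeping `summarize` free of I/O means the field extraction can be tested on a stored response, while `fetch_crossref_journal` stays a thin wrapper.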

KallePettersson commented 3 years ago

Hi, we are a group of five university students (@davyie, @osclind, @LukasGutenberg and @martinfalke) who would like to work on this issue as part of the course DD2480 Software Engineering Fundamentals at KTH Royal Institute of Technology. Is there anything in particular we should know about?

tobiasdiez commented 3 years ago

Cool, thanks for your interest! The approach outlined in https://github.com/JabRef/jabref/issues/6189#issuecomment-619371443 is still pretty much up to date. Do you have any questions concerning it? (By the way: instead of Python, you could of course also use Java to download and pre-format the journal info, if you prefer.)

Edit: It would also be nice if the info could be shown directly in the entry editor as a popover instead of in a new dialog, using http://fxexperience.com/controlsfx/features/#popover.

martinfalke commented 3 years ago

We've attempted a bunch of different solutions for this issue and did not get very far with any of them. As the course through which we are working on this issue is coming to an end, we have decided to write a summary of our suggested way forward. Below, we have divided the work into sections based on the completeness of the different parts. After that come notes on some of those parts.

Complete or requiring minor changes:

Started but incomplete:

Not started or minor start:

1. Python script

Concerning the Python script, we figured that it would be important to stick to the standard library, so that no packages need to be installed prior to running it. However, additional problems occurred that are currently unresolved. The first is choosing which API to use; see 3. for a comparison of a few that we considered. The second is choosing how to use the chosen API (e.g. live retrieval of data vs. downloading all of the data and storing it). This is discussed further in 4A.

This script uses the URL query from Scimagojr to retrieve data. We discovered some troubling issues with this method that are described in more detail inside the spoiler below.

Code with notes on problems

**Problems**

- The data returned from Scimagojr is in .csv format. Its rows are separated by newlines, but some cells also contain newlines, which breaks rows at those points. This is currently handled only by checking the length of the list `cells` returned from splitting on the delimiter `;`: if a row has fewer than five cells, it is skipped entirely. This could, for instance, be solved by hard-coding an expected number of columns per row and using it to detect broken rows, which should then be merged with their respective subsequent row.
- Searching through each row for the correct ISSN involves a lot of string splitting and comparison, which is *very* time-consuming, to the point of being infeasible to do live. A case where the correct row is at the end of the .csv file can take up to a minute, if not more. This could be resolved if Scimagojr allowed filtering the query on ISSN (e.g. `https://www.scimagojr.com/journalrank.php?year=2014&issn=15461718&out=xls` would download only the data for Nature Genetics (ISSN: 15461718) as a single .csv row).
- The code would require maintenance of at least two things that may change over time. The first is the order of columns: the script currently hard-codes the fifth column as the one holding the ISSNs. The second is the values of `start_year` and `end_year`, which are based on what data is available on Scimagojr. Presumably newer years are added eventually, but 2020 does not exist as of 2021-03-08, and old data might be removed.
- This is not exactly a problem, but the script would be called with one argument: the ISSN of the journal for which the button was pressed.
```python
import sys
import urllib.request as rq


def journal_url(year):
    # Search view: https://www.scimagojr.com/journalrank.php?year=2019
    base_url = 'https://www.scimagojr.com/journalrank.php?year='
    # '&out=xls' returns a .csv-file with the journal rankings of the specified year
    download_query = '&out=xls'
    return base_url + str(year) + download_query


def get_year_stats(year, issn):
    with rq.urlopen(journal_url(year)) as response:
        # Response Status 200 OK
        if response.status != 200:
            return None  # TODO: revise potential HTTP error handling
        lines = response.read().splitlines()
    for line in lines:
        decoded = line.decode('utf-8')
        cells = decoded.split(';')
        if len(cells) < 5:
            continue  # incomplete data (e.g. a row broken by a newline inside a cell)
        # cells[4] should contain a list of ISSNs for the journal, separated by ', '
        # e.g. "12345678, 98765432"; surrounding quotes, if any, are stripped
        if issn in cells[4].strip('"').split(', '):
            return str(year) + ';' + decoded  # prepend the year
    return None  # TODO: handle case where ISSN is not found


if __name__ == '__main__':
    # which journal the data is fetched for
    journal_issn = '15458601'  # default for testing
    issn = sys.argv[1] if len(sys.argv) > 1 else journal_issn

    # range of available years as of 2021-03-08
    start_year = 1999
    end_year = 2019
    years = range(start_year, end_year + 1)

    journal_stats = []
    for y in years:
        stats = get_year_stats(y, issn)
        if stats is not None:
            journal_stats.append(stats)
    # TODO: write aggregated stats to .json-file
```
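The first three problems listed above might also be eased by Python's standard `csv` module. This is a swapped-in technique, not what the script does: `csv.DictReader` keeps quoted cells that contain newlines within one row, looks columns up by header name instead of position, and lets each yearly file be indexed once so that later ISSN lookups are dictionary accesses rather than scans (the download itself still takes as long). The sketch assumes the export uses `;` as delimiter, quotes cells that contain newlines, and names its ISSN column `Issn`; none of this was verified against a live download, and `index_by_issn` is a hypothetical name.

```python
import csv
import io


def index_by_issn(csv_text: str, issn_column: str = "Issn") -> dict:
    """Map every ISSN in a Scimagojr-style export to its row (a dict keyed by header).

    Assumptions: ';' delimiter, newline-containing cells are quoted, and the
    ISSN column holds comma-separated values such as "15461718, 10614036".
    """
    index = {}
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    for row in reader:
        # A missing or empty ISSN cell simply yields no index entries.
        for issn in (row.get(issn_column) or "").split(","):
            issn = issn.strip()
            if issn:
                index[issn] = row
    return index
```

Looking the column up by header name means a reordering of columns on the Scimagojr side would no longer silently break the script, although a renamed header still would.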

2. Unit tests for script results

The unit tests committed in src/test/java/org/jabref/gui/fieldeditors/JournalEditorPopOverTest.java were intended for automated testing that the script works as expected. Since the script is incomplete, these have not been run and should instead be viewed as sketches of tests that can be used if the feature is developed. They are mostly based on the unit tests in ThemeTest.java, as those also use file operations in a temp directory (the @TempDir annotation from JUnit).

One of the tests, namely invalidISSNreturnsEmptyData, checks that the given ISSN is not found in the temp file, and should therefore either return empty data or throw an exception. The other test checks whether the given ISSN is found in the temp file. If the entry with the given ISSN is found then it should return the data.
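The two tests described above could be mirrored in Python roughly as follows, using `tempfile.TemporaryDirectory` as the counterpart of JUnit's `@TempDir`. The lookup function `find_journal_row` and the sample row are made up for illustration; the row only imitates the `;`-separated layout with ISSNs in the fifth column that the script assumes.

```python
import os
import tempfile
import unittest


def find_journal_row(path: str, issn: str) -> str:
    """Hypothetical lookup: return the line whose fifth ';'-separated column
    lists `issn`, or "" (empty data) if no such line exists."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            cells = line.rstrip("\n").split(";")
            if len(cells) >= 5 and issn in cells[4].strip('"').split(", "):
                return line.rstrip("\n")
    return ""


class JournalLookupTest(unittest.TestCase):
    def setUp(self):
        # Counterpart of @TempDir: a fresh directory, removed after each test.
        self.tmp = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmp.name, "journals.csv")
        with open(self.path, "w", encoding="utf-8") as f:
            # Made-up sample row in the assumed layout.
            f.write('1;1;"Nature Genetics";journal;"15461718, 12345678"\n')

    def tearDown(self):
        self.tmp.cleanup()

    def test_invalid_issn_returns_empty_data(self):
        self.assertEqual(find_journal_row(self.path, "00000000"), "")

    def test_valid_issn_returns_data(self):
        self.assertIn("Nature Genetics", find_journal_row(self.path, "12345678"))
```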

3. Comparison between APIs

CrossRef

Pros:

Scimagojr

Pros:

Elsevier

Pros:

DOAJ

Pros:

4. Conclusion and proposed solutions

We believe that this is a far more complex issue than it may initially seem. We urge the developers to reconsider whether the feature (and the experience it provides) is really worth the effort, as well as the complexity and maintenance burden it adds to the project. Below we propose a few solutions that we think are the most viable options moving forward.

A.

Download all this data at a set event (e.g. startup of the application), with a cleanup of previously saved results, so that the results are ready to be presented as soon as the button is pressed. On the other hand, this would slow down the startup of the program for a feature that may not be used all that much.

B.

On pressing the button, the webpage that has the stats is simply opened in a browser (e.g. the Scimagojr page for Nature Genetics). The problem of not being able to directly create a URL based on ISSN persists, though.

C.

Scrap the feature altogether, perhaps at most provide a tooltip that recommends learning more about journals on an external webpage (such as Scimagojr).

tobiasdiez commented 3 years ago

Thanks a lot for the work and effort you put into this. Very much appreciated.

I didn't anticipate that it would be so difficult to get the data from Scimagojr. Sorry! Given these problems, I would suggest running the script once and then committing the generated JSON file along with the code. This file can then be loaded, hopefully on the fly, when people click the button. This has the disadvantage that we need to run the script regularly, but that should only be necessary once a year, so no big deal.
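One way to realize this suggestion, sketched in Python under assumptions: the generated file groups statistics by ISSN and then by year, and the loader reads the file only on the first lookup (via `functools.lru_cache`), so later button clicks reuse the parsed data. The file layout, the function names, and the per-year stats dictionaries are all hypothetical.

```python
import json
from functools import lru_cache


def build_journal_json(rows_by_year: dict, path: str) -> None:
    """Hypothetical generator step, run once a year.

    rows_by_year maps a year to {issn: stats}; the output file is regrouped
    as {issn: {year: stats}} so that one journal's history is a single lookup.
    """
    by_issn = {}
    for year, rows in rows_by_year.items():
        for issn, stats in rows.items():
            # JSON object keys are strings, so store the year as a string.
            by_issn.setdefault(issn, {})[str(year)] = stats
    with open(path, "w", encoding="utf-8") as f:
        json.dump(by_issn, f, indent=2, sort_keys=True)


@lru_cache(maxsize=None)
def _load(path: str) -> dict:
    # Parse the committed file only once, on the first lookup.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def journal_info(path: str, issn: str) -> dict:
    # Empty dict when the journal is not in the file.
    return _load(path).get(issn, {})
```

Grouping by ISSN at generation time moves the expensive scanning into the yearly script run, so the click itself only needs a dictionary lookup.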

I understand that your project is coming to an end. Nonetheless it would be nice if you could prepare a PR with the changes you have so far. Then we can take it over from there. Of course, you are invited to continue working on it as well.

martinfalke commented 3 years ago

@tobiasdiez Absolutely no worries, it is a natural part of the development process and we are happy to have helped. A draft pull request for the issue can now be found at #7541.

calixtus commented 3 years ago

One could probably take the draft by @martinfalke and finish it if that is ok?

aqurilla commented 1 year ago

Hi, I would like to take up this issue

calixtus commented 1 year ago

Hi @aqurilla, thanks for your interest. We have already seen a few PRs from you, so there is no doubt that you are able to complete it. But from the comments above, it seems that completing the implementation will also take quite some time. So if you think this is suitable for you, we would be very happy if you decide to work on it. If you have questions, you can always ask us via Gitter chat or by email. We would appreciate it if you created an early draft PR to document your progress, so we can help and support your work.

aqurilla commented 1 year ago

Appreciate the heads-up @calixtus! I'll create a draft PR to share progress on this issue