Option for generation of abbreviations

FlamingTempura / bibtex-tidy

Cleaner and Formatter for BibTeX files

https://flamingtempura.github.io/bibtex-tidy/

MIT License


Option for generation of abbreviations #253

Open pedropaulofb opened 2 years ago

pedropaulofb commented 2 years ago

Hi Peter @FlamingTempura!

Frequently, the page limit in a Call for Papers is too tight to fit all the research content. In that case, it is common practice to shrink the bibliography using abbreviations and acronyms. Doing this manually is time-consuming and error-prone. It would be wonderful if your tool had an option to generate permitted abbreviations automatically! For example, "Proceedings of the International" can become "Proc. Int.".

The ‘Conference Abbreviations’ section of the IEEE Editorial Style Manual (2021) (p. 63) provides many examples of valid abbreviations.

As I always say in my contributions: congratulations on the tool! I use it constantly and I am always recommending it to my colleagues (e.g., here).

pedropaulofb commented 1 year ago

Hi @FlamingTempura! I am using your tool for a paper and, once again, I need to reduce the size of the generated references.

Checking for a better list of abbreviations, I found this source, which is based on ISO 4. It is, by far, the best source for the abbreviations I could find.

Since it already provides the list as a CSV file, the feature should not be hard to implement. So please consider this source instead of the last one I sent you.

Thanks for your excellent work!

FlamingTempura commented 1 year ago

Thanks for the suggestion, and apologies for not responding sooner.

I can see this being a useful feature but there are some complexities:

How do we package the data? The linked CSV is 1.6 MB (~500 kB gzipped), which is a lot to add to the JS bundle, particularly for the browser. We could get the browser to load the CSV dynamically, but we'd also need to think about the CLI and the JS library, each of which would probably need a separate approach.
We also would need to ensure that rules are followed with replacements (e.g. not replacing names); it's not a straightforward search and replace. A JS library exists which may be worth looking into https://github.com/marcinwrochna/abbrevIso/blob/master/browserBundle.js
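
To illustrate why this is more than a search and replace, here is a minimal Python sketch. It assumes rules where a trailing "-" marks a prefix pattern, a hand-picked list of function words to drop, and a hypothetical `protected` argument for phrases (such as names) that must stay verbatim. The rule entries are made up for illustration and are not copied from the linked CSV or from abbrevIso.

```python
import re

# Hypothetical rules in the style of an ISO 4 word list; a trailing "-" marks
# a prefix pattern. These entries are illustrative, not copied from the LTWA.
RULES = [
    ("proceedings", "Proc."),
    ("international", "Int."),
    ("comput-", "Comput."),
]
# Function words commonly omitted from abbreviated titles (assumed list).
DROPPED = {"of", "the", "on", "and", "in", "for"}


def abbreviate_word(word):
    """Return the abbreviation for one word, or the word itself if no rule fits."""
    lower = word.lower()
    for pattern, abbreviation in RULES:
        if pattern.endswith("-"):
            if lower.startswith(pattern[:-1]):
                return abbreviation
        elif lower == pattern:
            return abbreviation
    return word  # no rule applies: keep the word unchanged


def abbreviate_title(title, protected=()):
    """Abbreviate a title, leaving every phrase in `protected` verbatim."""
    if protected:
        pattern = "(" + "|".join(re.escape(p) for p in protected) + ")"
        pieces = re.split(pattern, title)
    else:
        pieces = [title]
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:  # odd pieces are the captured protected phrases
            out.append(piece)
            continue
        for word in piece.split():
            if word.lower() not in DROPPED:
                out.append(abbreviate_word(word))
    return " ".join(out)


print(abbreviate_title("Proceedings of the International Computing Symposium"))
print(abbreviate_title("Proceedings of the International House Conference",
                       protected=("International House",)))
```

Even this toy version needs three separate mechanisms (exact matches, prefix matches, protected phrases), and it still ignores punctuation, capitalisation rules, and single-word titles.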

So while I think this could be a good feature it's looking like a difficult one to implement. That's to say, don't expect this feature soon. I'm wondering if this might be more suitable as a separate tool. I'm surprised one doesn't exist already.

andrewfowlie commented 5 months ago

What would be an acceptable database size for replacement rules?

I have a suspicion that 90% of journal abbreviations could be covered by 10% of the ISO4 replacement rules. On top of that, the third column of the rules (language) can be discarded for these purposes.

Applying a subset of ISO4 rules to the journal entry could still be very useful.
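
One way to test that suspicion would be to count how often each rule fires on a sample of journal titles and see how few rules account for most of the hits. The sketch below is a rough approximation only: it assumes whole-word rules (no prefix patterns), and the function names, titles, and rules are hypothetical.

```python
from collections import Counter


def rule_hits(titles, rules):
    """Count how often each whole-word rule fires across the titles."""
    hits = Counter()
    for title in titles:
        for word in title.lower().split():
            if word in rules:
                hits[word] += 1
    return hits


def smallest_subset(hits, target=0.9):
    """Fewest rules (most frequent first) whose hits reach `target` of all hits."""
    total = sum(hits.values())
    chosen, covered = [], 0
    for word, count in hits.most_common():
        if covered >= target * total:
            break
        chosen.append(word)
        covered += count
    return chosen


# Made-up sample data, for illustration only.
titles = [
    "Journal of Physics",
    "Journal of Chemistry",
    "International Journal of Physics",
    "Annals of Chemistry",
]
rules = {"journal": "J.", "physics": "Phys.", "chemistry": "Chem.",
         "international": "Int.", "annals": "Ann."}

hits = rule_hits(titles, rules)
print(smallest_subset(hits, target=0.5))
```

Run against a real list of journal names and the full rule set, the ratio `len(smallest_subset(hits)) / len(rules)` would give a first estimate of how small a useful subset could be.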

pedropaulofb commented 5 months ago

Another option would be to start only with the English abbreviations. That would cover the most important cases.

andrewfowlie commented 5 months ago

I will make a few minimal databases, and let’s see the sizes.


import pandas as pd

# Load the LTWA word list (tab-separated)
data = pd.read_csv("ltwa_current.csv", sep="\t")

# Keep rules marked as English or as applying to multiple languages;
# na=False treats rows with no language value as non-matching
languages = data['LANGUAGES']
keep = languages.str.contains('Multiple Languages', na=False) | languages.str.contains('English', na=False)
data = data.loc[keep]
data = data.drop(columns=['LANGUAGES'])  # drop language column
data = data.dropna()  # drop nans (these are words that explicitly don't have a replacement rule)

# Save as a zipped CSV
compression_opts = dict(method='zip', archive_name='reduced_ltwa_current.csv')
data.to_csv('reduced_ltwa_current.zip', index=False, compression=compression_opts)

This gives a 41 kB compressed / 122 kB uncompressed database, containing only rules explicitly marked as English or multiple languages, with words explicitly marked as "do not abbreviate" removed.
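
For reference, a file like this can be read back with the Python standard library alone, which may matter if dependencies are a concern. This is a sketch under assumptions: the archive holds a single CSV, the CSV has a header row, and the first two columns are the pattern and its abbreviation. `load_rules` is a hypothetical helper name.

```python
import csv
import io
import zipfile


def load_rules(zip_path):
    """Return {pattern: abbreviation} from a zipped two-column CSV with a header row."""
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]  # assumes a single CSV inside the archive
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            next(reader, None)  # skip the header row
            return {row[0]: row[1] for row in reader if len(row) >= 2}
```

With the file written by the script above, this would be called as `rules = load_rules("reduced_ltwa_current.zip")`.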

flxmr commented 4 months ago

https://github.com/marcinwrochna/abbrevIso this also has some JS code to do the abbreviations!