JabRef / abbrv.jabref.org

A repository of abbreviations for references, e.g., for conferences, journals, institutes, etc.
https://abbrv.jabref.org
Creative Commons Zero v1.0 Universal

Issue 120 fix #128

Closed sreenath-tm closed 1 year ago

sreenath-tm commented 1 year ago

Solves issue #120. The script reads all the CSV files other than "journal_abbreviations_general". Any entry that appears both in one of those files and in "journal_abbreviations_general" is removed from "journal_abbreviations_general".

The script was executed once, and the resulting "journal_abbreviations_general" file has replaced the older version, which contained duplicate entries. Each entry in the CSV file is expected to have the format ;[;[;]], that is, four semicolon-separated fields of which the last two are optional. However, no entry has all of these fields set. Depending on which fields are set, the entries encountered during script development fell into three formats.

Around 80% of the entries follow the second format, 18% follow the first, and 2% follow the last. The third format needs no special handling because it is consistent, but when the last two fields are not set we had to decide which format to use. To keep this uniform, the script writes its output in the first format (the one that ends with ";;"; this can be changed based on discussion).
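For illustration, writing every entry in the form that ends with ";;" could be sketched as below. This is not the script from this PR: the helper name `pad_entry` and the sample values are hypothetical, and the plain `split(";")` assumes no field contains a quoted semicolon.

```python
def pad_entry(line: str, n_fields: int = 4) -> str:
    """Pad a semicolon-separated entry with empty fields up to n_fields."""
    fields = line.rstrip("\n").split(";")
    # Append empty strings for the missing trailing fields; entries that
    # already have n_fields (or more) fields are left unchanged.
    fields += [""] * (n_fields - len(fields))
    return ";".join(fields)


# Hypothetical entries: one with two fields, one already padded.
padded = [pad_entry(e) for e in ["Some Journal;Some J.", "Other Journal;Oth. J.;;"]]
```

With this sketch, an entry that has only its first two fields set comes out with two trailing semicolons, and an entry that is already in that form is returned as-is.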

koppor commented 1 year ago

If any frequency exists, it can just be removed!

sreenath-tm commented 1 year ago

> If any frequency exists, it can just be removed!

I modified the script to check for entries with the frequency field set. I can confirm that no such entries exist.
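A check of this kind might look roughly like the following sketch. It assumes the frequency is the fourth semicolon-separated field; the function name `entries_with_frequency` and the sample rows are made up for illustration and are not taken from the actual script.

```python
import csv


def entries_with_frequency(lines):
    """Return the rows whose fourth field (assumed to be the frequency) is non-empty."""
    reader = csv.reader(lines, delimiter=";")
    return [row for row in reader if len(row) >= 4 and row[3].strip()]


# Hypothetical sample: only the last row has a fourth field set.
sample = ["Some Journal;Some J.;;", "Other Journal;Oth. J.", "Third Journal;Th. J.;TJ;M"]
flagged = entries_with_frequency(sample)
```

An empty result for every CSV file would correspond to the statement that no entry has the frequency field set.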

sreenath-tm commented 1 year ago

The modified script now detects duplicates based on the Title column only, and the comparison is case-insensitive. The file has been reduced to 1891 lines and, as discussed, each entry has only three columns; the frequency column is not included.
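The behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the script itself: `remove_duplicates` is a hypothetical name, rows are assumed to be already parsed into lists of fields with the title first, and `str.casefold` is used here as one way to get a case-insensitive comparison.

```python
def remove_duplicates(general_rows, other_rows):
    """Drop rows from general_rows whose title also occurs in other_rows.

    Titles are compared case-insensitively, and every kept row is
    normalised to exactly three columns (the frequency column is dropped).
    """
    seen_titles = {row[0].strip().casefold() for row in other_rows if row}
    kept = []
    for row in general_rows:
        if not row:
            continue  # skip blank lines
        if row[0].strip().casefold() in seen_titles:
            continue  # title already listed in one of the other files
        # Pad short rows with empty fields, then truncate to three columns.
        kept.append((row + ["", ""])[:3])
    return kept


# Hypothetical data: the first general entry duplicates a title from another file.
general = [["Some Journal", "Some J."], ["Other Journal", "Oth. J.", "OJ", "M"]]
others = [["SOME JOURNAL", "S. Journal"]]
result = remove_duplicates(general, others)
```

Matching on the title alone means two entries with the same title but different abbreviations are still treated as duplicates, which is consistent with the description above.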

koppor commented 1 year ago

Thank you for working on this. A good next step.