The-Encryption-Compendium / TECv2

Hugo-based version of The Encryption Compendium.
https://encryptioncompendium.org
GNU General Public License v3.0

Check bibtex for duplicate indexes #54

dkg closed this 3 years ago

dkg commented 3 years ago

This is a simple test for duplicate bibtex IDs: any duplicate ID will cause all but one of the colliding entries to be skipped in entries_dict.

The goal is to abort the site build if data.bib has such a duplicate entry.
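The check described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function names `find_duplicate_ids` and `check_file` are hypothetical, and it scans for entry headers with a regular expression instead of using the project's bibtex parser, so it does not validate the file in any other way.

```python
import re
import sys
from collections import Counter

# Matches the start of a BibTeX entry such as "@misc{obama_coalition_2016,"
# and captures the entry type and the citation ID. Assumption: one entry
# header per line; the rest of the entry is not inspected at all.
ENTRY_START = re.compile(r"^\s*@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)

# BibTeX blocks that are not bibliography entries and carry no citation ID.
NON_ENTRY_TYPES = ("comment", "string", "preamble")


def find_duplicate_ids(bibtex_text):
    """Return the set of citation IDs that appear more than once."""
    counts = Counter(
        key
        for kind, key in ENTRY_START.findall(bibtex_text)
        if kind.lower() not in NON_ENTRY_TYPES
    )
    return {key for key, n in counts.items() if n > 1}


def check_file(path="data.bib"):
    """Return a process exit code: 1 if the file has duplicate IDs, else 0."""
    with open(path, encoding="utf-8") as f:
        duplicates = find_duplicate_ids(f.read())
    if duplicates:
        print(f"duplicate IDs: {sorted(duplicates)}", file=sys.stderr)
        return 1
    return 0
```

A build script could call `sys.exit(check_file("data.bib"))` before generating the site, so that a nonzero exit code aborts the build.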

This is still pretty weak: it doesn't test whether data.bib is valid bibtex, and it doesn't verify anything else about the entries. But it does offer a place to do more in-depth consistency/cleanliness checks on the data scraped from Zotero.

Addresses (but does not close) #46.

dkg commented 3 years ago

note: when i run this against data.bib in the the-encryption-compendium.github.io repo, i get the following outcome:

duplicate IDs: {'obama_coalition_2016', 'noauthor_considerations_2016'}

The obama_coalition_2016 entries look like this:

@misc{obama_coalition_2016,
    title = {Coalition {Letter} to {President} {Obama} 04/11/2016},
    url = {https://www.accessnow.org/cms/assets/uploads/2016/04/Encryption-Letter.pdf},
    language = {English},
    collaborator = {Obama, Barack},
    month = apr,
    year = {2016},
    keywords = {2010s, Backdoors, Compliance with Court Orders Act},
}
[…]
@misc{obama_coalition_2016,
    title = {Coalition {Letter} to {President} {Obama} 10/27/2016},
    collaborator = {Obama, Barack},
    month = oct,
    year = {2016},
    keywords = {2010s, Access Now, Apple, Backdoors, EFF, International},
}

These seem to be distinct documents, though there is so little data about the second one that i don't know what it is. I think it's a followup to the https://savecrypto.org/ petition.

and the noauthor_considerations_2016 entries look like this:

@misc{noauthor_considerations_2016,
    title = {Considerations for {Encryption} in {Public} {Safety} {Radio} {Systems}},
    publisher = {Federal Partnership for Interoperable Communications},
    month = sep,
    year = {2016},
    keywords = {2010s, Public Safety Radios},
}
[…]
@misc{noauthor_considerations_2016,
    title = {Considerations for {Encryption} in {Public} {Safety} {Radio} {Systems}},
    url = {https://www.dhs.gov/sites/default/files/publications/20160830_fact_sheet_considerations_final_draft508_0.pdf},
    language = {en},
    publisher = {Federal Partnership for Interoperable Communications},
    month = sep,
    year = {2016},
    note = {Two-pager explanatory document},
    keywords = {2010s, Department of Homeland Security, Encryption Standards, Public Safety, Public Safety Radios},
}

I think this pair is a duplicate, and the first one should be deleted from the file.
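One way to make that judgment mechanical is to check whether every field of one entry is already covered by the other. The sketch below is an illustration only: `is_subsumed` and `keyword_set` are hypothetical helpers, the entries are hand-copied into dicts instead of being parsed from data.bib, and treating `keywords` as a comma-separated set is an assumption.

```python
def keyword_set(value):
    """Split a comma-separated keywords field into a set of trimmed terms."""
    return {k.strip() for k in value.split(",") if k.strip()}


def is_subsumed(smaller, larger):
    """Return True if every field of `smaller` is covered by `larger`.

    Fields must match exactly, except `keywords`, which only needs to be
    a subset of the other entry's keywords.
    """
    for field, value in smaller.items():
        if field not in larger:
            return False
        if field == "keywords":
            if not keyword_set(value) <= keyword_set(larger[field]):
                return False
        elif value != larger[field]:
            return False
    return True


# The two noauthor_considerations_2016 entries, copied by hand.
first = {
    "title": "Considerations for {Encryption} in {Public} {Safety} {Radio} {Systems}",
    "publisher": "Federal Partnership for Interoperable Communications",
    "month": "sep",
    "year": "2016",
    "keywords": "2010s, Public Safety Radios",
}
second = {
    "title": "Considerations for {Encryption} in {Public} {Safety} {Radio} {Systems}",
    "url": "https://www.dhs.gov/sites/default/files/publications/20160830_fact_sheet_considerations_final_draft508_0.pdf",
    "language": "en",
    "publisher": "Federal Partnership for Interoperable Communications",
    "month": "sep",
    "year": "2016",
    "note": "Two-pager explanatory document",
    "keywords": "2010s, Department of Homeland Security, Encryption Standards, Public Safety, Public Safety Radios",
}

# The first entry adds nothing that the second lacks, so it is the one to drop.
first_is_redundant = is_subsumed(first, second)
```

By the same test, neither obama_coalition_2016 entry subsumes the other, since their titles differ, which is consistent with treating them as distinct documents.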

kernelmethod commented 3 years ago

Thanks for making this! Sorry it took a bit to get around to looking at it. It looks good to me though. I've just deleted the duplicate you found from Zotero, so that should hopefully no longer be an issue.