acl-org / acl-style-files

Official style files for papers submitted to venues of the Association for Computational Linguistics

improve arxiv references #32

Closed LSinev closed 6 months ago

LSinev commented 9 months ago

Improves handling of bib files obtained directly from arXiv.

Works with the @misc entry type only. Supports either archivePrefix or eprinttype. Sets "arXiv preprint" if publisher = {arXiv} and no eprint number is provided. It therefore works with bib entries produced by arXiv itself (more info at https://tex.stackexchange.com/a/679743/79756) and with some entries that doi2bib.org creates from an arXiv DOI.

No support for primaryClass (no example from ACL), no detection of versions from the url (seems too hard for me), and no support for @article with eprint from DBLP (too much to test for now; results may become worse).
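As a rough illustration of the rules described above, here is a minimal Python sketch of the decision logic. It is not the .bst code from this PR (which is written in BibTeX's own stack language); the function name `classify_arxiv_entry` and the returned labels are hypothetical, and the check that the prefix value equals "arXiv" is an assumption about how the fields are matched.

```python
from typing import Optional


def classify_arxiv_entry(entry_type: str, fields: dict[str, str]) -> Optional[str]:
    """Classify a bib entry under the rules sketched in this PR description.

    Returns (hypothetical labels):
      "eprint"   - @misc with archivePrefix or eprinttype naming arXiv, plus an eprint number
      "preprint" - @misc with publisher = {arXiv} and no eprint number
      None       - not handled (e.g. @article with eprint, as exported by DBLP)
    """
    # Only the @misc entry type is supported.
    if entry_type.lower() != "misc":
        return None

    # BibTeX field names are case-insensitive, so normalize the keys.
    f = {key.lower(): value.strip() for key, value in fields.items()}

    # Either archivePrefix or eprinttype may identify the archive.
    prefix = f.get("archiveprefix") or f.get("eprinttype") or ""
    eprint = f.get("eprint", "")

    if prefix.lower() == "arxiv" and eprint:
        return "eprint"
    # Fallback: publisher = {arXiv} with no eprint number provided.
    if f.get("publisher", "").lower() == "arxiv" and not eprint:
        return "preprint"
    return None
```

Under these assumptions, an arXiv-exported @misc entry with eprint and archivePrefix fields falls into the first case, an entry carrying only publisher = {arXiv} falls into the second, and an @article entry is left untouched.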

Files for checking and viewing the resulting improvements:

bib_experiment_after.pdf bib_experiment_before.pdf acl_natbib_new202401_test.zip

May fix #9

Invite to check/review/approve: @nschneid @davidweichiang @danielgildea @postylem

P.S. At the start, this addition was inspired by output.eprint of ACM-Reference-Format.bst from acmart but diverged later.

davidweichiang commented 9 months ago

Thanks! Is there any hope of seeing a diff of the changes that isn't the entire bst file?

LSinev commented 9 months ago

> entire bst file

Wow! I'll check; maybe some LF <-> CRLF line-ending conversion happened.

EDIT: While I am checking whether the line endings changed, you can use the GitHub feature for viewing changes with whitespace changes ignored: [screenshot]
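The suspicion that a line-ending conversion made every line show up as changed can be checked with a small sketch like the one below. It simply counts CRLF and bare LF endings in a file's bytes; the function names are hypothetical, and any path passed to it (for example, the .bst file before and after the change) is up to the person checking.

```python
def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF and bare-LF line endings in raw file contents."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


def count_file_line_endings(path: str) -> dict[str, int]:
    """Read a file in binary mode (no newline translation) and count its endings."""
    with open(path, "rb") as handle:
        return count_line_endings(handle.read())
```

If one revision of the file reports only "lf" endings and the other only "crlf", the full-file diff comes from the conversion rather than from content changes.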

LSinev commented 7 months ago

For ease of review, I rebased (and force-pushed) the branch with the changes.

davidweichiang commented 7 months ago

Thanks! Is there a consensus that "arXiv preprint, arXiv:nnnn.nnnnn" is the right citation format? (To me it seems redundant to have the word "arXiv" twice.)

danielgildea commented 7 months ago

1) I agree with David that it is redundant to have "arxiv" twice, and, poking around, I don't see other people doing this.

2) The .bst file has some comments saying that it is produced by a script. We were maintaining the input to the script, but it seems to have disappeared from git. Let's maintain this .bst file from now on, and remove the comment saying not to maintain it.

LSinev commented 7 months ago
  1. I will remove the unnecessary duplication.
  2. I'm not sure which comment needs to be removed. I will try reapplying the urlbst script, provided it does not make things worse.

EDIT: Many bst files from previous ACL conferences seem to be stored at https://github.com/Pinafore/publications/tree/master/style

LSinev commented 7 months ago

One of the commits creates a sort of "nourl" state. I applied the latest urlbst (0.9.1) after that. Compared with the starting state of this PR, some lines differ only in the removal of trailing spaces.

Screenshot of the result (the zip-archived files from the PR's opening message can still be used for checking): [image]

LSinev commented 7 months ago

I think I can check whether it would be better to recreate the whole file from the latest versions of makebst and merlin.mbs (2011, instead of the 2002 versions described in this file's comments), selecting options so that the result complies with the later changes described in the comments inside.

LSinev commented 6 months ago

Rebuilt from scratch with custom-bib (options described in the commit message) and applied all the improvements that still make sense.

acl_natbib_basic.zip: docstrip batch job (to be run with latex acl_natbib_basic.dbj), for archival purposes (probably in a repo with the sources of the style files).

Files for checking and testing the applied changes:

acl_natbib_new202403_test.zip

bib_experiment_after_recreation_with_improvement.pdf

bib_experiment_before.pdf

danielgildea commented 6 months ago

Thanks for your work. I've tested with some other papers, and this looks good to me.