acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Add EndNote citation format #235

Closed: mjpost closed this issue 5 years ago

mjpost commented 5 years ago

I urge you to bring back the EndNote importer file. Thanks.

Originally posted by @GabrielLin in https://github.com/acl-org/acl-anthology/issues/170#issuecomment-479816995

mjpost commented 5 years ago

I feared this day would come.

I imagine it would be simple to generate these and add a button. I wonder if it will add a lot to the build time.

What we could also consider is to have this part done dynamically. We could run a web server with CGI on aclanthology.info and have it build the EndNote file for any URL that is accessed.

What do you think @mbollmann and @villalbamartin?

stevenbedrick commented 5 years ago

Rather than needing to “go the full EndNote” (and thus commence the game of non-BibTeX format whack-a-mole), one option would be to just add RIS output. EndNote can read that, I believe, as can basically everything else that isn’t BibTeX. I seem to recall that the format is a bit simpler than EndNote’s, though I could be misremembering.
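For readers unfamiliar with RIS, here is a minimal sketch of what one record looks like, written in Python. The tags used (TY, AU, TI, PY, ER) are standard RIS tags; the function `to_ris` and the sample entry are hypothetical illustrations and not part of the Anthology code.

```python
def to_ris(entry: dict) -> str:
    """Serialize a small bibliographic dict as one RIS record (illustrative only)."""
    lines = ["TY  - CONF"]  # record type: conference proceedings
    for author in entry.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    lines.append(f"TI  - {entry['title']}")
    lines.append(f"PY  - {entry['year']}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"


# Made-up entry, for illustration only.
sample = {
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "A Made-Up Paper",
    "year": 2019,
}
print(to_ris(sample))
```

Each line is a two-character tag, two spaces, a hyphen, a space, and the value, which is why the format is easy to generate by hand.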

mjpost commented 5 years ago

Oh, that's great to know. Bibutils supports RIS, too (xml2ris).

mjpost commented 5 years ago

@GabrielLin can you confirm that EndNote can read RIS format, and that that would therefore be sufficient?

villalbamartin commented 5 years ago

If RIS is enough, then I'm all for it. Failing that, the same toolkit that generates RIS (xml2ris) can also generate EndNote (xml2end), so we could in theory support it as a secondary format. But the fewer formats we need to support, the better.
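A rough sketch of how the two bibutils converters could be called from Python, assuming bibutils is installed and on the PATH. The tool names `xml2ris` and `xml2end` come from this thread; the helper names and the file name `P19-1001.xml` are hypothetical, and this is not the Anthology's actual build code.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# bibutils converters named in this thread; both read MODS XML.
CONVERTERS = {"ris": "xml2ris", "endnote": "xml2end"}


def build_command(fmt: str, mods_path: Path) -> List[str]:
    """Return the argv for converting one MODS file to the given format."""
    return [CONVERTERS[fmt], str(mods_path)]


def convert(fmt: str, mods_path: Path, out_path: Path) -> None:
    """Run the converter and save its stdout; assumes bibutils is on PATH."""
    cmd = build_command(fmt, mods_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found; is bibutils installed?")
    result = subprocess.run(cmd, capture_output=True, check=True)
    out_path.write_bytes(result.stdout)


# Hypothetical file name, shown only to illustrate the command that would run.
print(build_command("endnote", Path("P19-1001.xml")))
```

Since both formats start from the same MODS file, supporting the second one only costs an extra entry in the mapping plus the extra output files.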

mjpost commented 5 years ago

If we're going to generate these dynamically, I think we should make both available. If statically as part of Hugo, then we should just do RIS.

GabrielLin commented 5 years ago

@mjpost, EndNote can import RIS format, but it needs some configuration. In some cases the import might fail, just as it can when importing .bib files. Providing the EndNote format would ensure that the import succeeds. I therefore strongly suggest adding the EndNote format, as in the old version of the ACL Anthology. Thanks.

mjpost commented 5 years ago

@mbollmann, do you have any thoughts on this? Should we add 100k files to our export (EndNote + RIS)? Or take the dynamic approach?

I do worry about the technical debt of generating these dynamically offsite. On the other hand, this might be more important than I give it credit for as a non-user of either of these formats, and the implementation shouldn't be that hard.

@villalbamartin are you looking into this or are we tabling it?

knmnyn commented 5 years ago

I agree with @GabrielLin. We want to encourage other fields to cite our work so that we can position all of our authors as prominent scholars even outside of CL and NLP.

Having EndNote and/or RIS is useful for these folks, even if these files are not rebuilt on every run. Separately, you may want to start adding version numbers to bib files so that changes can be tracked when rebuilt files differ from the current ones. I'll post a new issue for that if you'd like.

mbollmann commented 5 years ago

A dynamic approach would essentially mean writing a wrapper around bibutils. It sounds pretty simple, but it does add another layer of complexity and maintenance cost. I'm not sure if there's any significant cost to generating 100k+ more files, but it's relatively trivial to implement, so there's little cost to trying (and potentially reverting) it.
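To make the dynamic option concrete, here is a minimal sketch of such a wrapper in Python. Everything in it is an assumption for illustration: the URL suffixes `.ris` and `.endf`, the function names, and the in-memory cache. The real conversion call is replaced by a stand-in so the sketch runs without bibutils.

```python
from typing import Callable, Dict, Tuple

# Hypothetical URL suffixes mapped to the bibutils tools named in this thread.
SUFFIX_TO_TOOL = {".ris": "xml2ris", ".endf": "xml2end"}


def parse_request(path: str) -> Tuple[str, str]:
    """Split a path like '/P19-1001.ris' into (paper_id, tool name)."""
    name = path.rsplit("/", 1)[-1]
    for suffix, tool in SUFFIX_TO_TOOL.items():
        if name.endswith(suffix):
            return name[: -len(suffix)], tool
    raise ValueError(f"unsupported citation format: {path}")


def serve(path: str, run_tool: Callable[[str, str], str], cache: Dict[str, str]) -> str:
    """Return the converted record, invoking the tool only on a cache miss."""
    if path not in cache:
        paper_id, tool = parse_request(path)
        cache[path] = run_tool(tool, paper_id)
    return cache[path]


def fake_run(tool: str, paper_id: str) -> str:
    """Stand-in for the real subprocess call to bibutils."""
    return f"<{tool} output for {paper_id}>"


cache: Dict[str, str] = {}
print(serve("/P19-1001.endf", fake_run, cache))
```

Even in this small form, the request parsing, error handling, and caching are all code that someone has to host and maintain, which is the cost weighed above against simply generating the files statically.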

mjpost commented 5 years ago

Let's try that [edit: meaning static generation] first. What if we just did EndNote to start with? Who/what uses RIS?

stevenbedrick commented 5 years ago

RefWorks, Mendeley, etc., though as far as I know both of those can also read and write BibTeX. I guess you could say that RIS is more "cross-platform" than EndNote, but I don't know if the marginal utility of including RIS is worth the extra files, given that the technical difference between creating RIS vs. EndNote is nonexistent. I hadn't realized before that it was the same command-line utility producing both, and my earlier advocacy for RIS was based on the thought that it's a simpler format to generate, but given that we're using existing tools, it's a moot point.

villalbamartin commented 5 years ago

It looks to me as if bibutils should work out of the box, at least in theory. If it's okay with everyone, I will take care of this.

mjpost commented 5 years ago

That'd be great! As static files, right?

villalbamartin commented 5 years ago

Yes, I'm trying to do it with the bare minimum, using the files generated by the bib2xml_wrapper script.

I am currently having problems compiling the full version on my computer, though, so I'm opening a ticket about that.

villalbamartin commented 5 years ago

Small update on this question: "I wonder if it will add a lot to the build time."

On my underpowered laptop, the step "Converting BibTeX files to MODS XML" takes 2:19 min. Generating the EndNote files in exactly the same fashion takes 6:03 min. Converting BibTeX to MODS runs at 357 files/sec, while MODS to EndNote runs at 136 files/sec.
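As a quick sanity check on these figures, the arithmetic can be worked through in Python. The numbers are the ones reported above; the variable names are illustrative. Because the reported rates are rounded, the two implied file counts agree only approximately.

```python
def seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


bib_to_mods = seconds("02:19")      # 139 s
mods_to_endnote = seconds("06:03")  # 363 s

# files = rate * time; both steps should cover roughly the same file count.
files_mods = 357 * bib_to_mods         # 49623
files_endnote = 136 * mods_to_endnote  # 49368

# How much longer the EndNote step takes than the BibTeX-to-MODS step.
slowdown = mods_to_endnote / bib_to_mods

print(files_mods, files_endnote, round(slowdown, 2))
# prints: 49623 49368 2.61
```

Both steps process roughly 49,000 to 50,000 files, and the EndNote step takes about 2.6 times as long as the BibTeX-to-MODS step, adding about six minutes on this machine.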

I am testing a version of the code right now that implements the GUI aspects. I expect to make a pull request by tomorrow.

GabrielLin commented 5 years ago

It seems that the EndNote button has disappeared?

mjpost commented 5 years ago

This was a build error and has been fixed.

GabrielLin commented 5 years ago

Thank you @mjpost