acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Reducing unnecessary calls in Makefile #815

Closed mbollmann closed 3 years ago

mbollmann commented 4 years ago

Something that has been bothering me for a while now: When I develop locally, running make keeps regenerating the YAML/BibTeX/etc. files every time, even when I did not change the data files.

Example: I just ran make with a clean build directory, which crashed during the Hugo call. I made a fix in a Hugo template and ran make hugo, but this regenerated all YAML/BibTeX/MODS XML files that had already been generated during the previous call.

I believe this is because most targets (including the YAML and BibTeX target) depend on build/.static, which in turn depends on the Hugo directory.

Can this be optimized? If I see this right, only the line @cp -r hugo/* build needs to depend on the Hugo directory, so maybe this can be refactored into a separate target? I don't know the Makefile well enough, so I'd rather ask here. :)
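
The dependency chain described above can be modeled with a small Python sketch of make's rebuild rule: a target is stale if it is missing, older than a prerequisite, or depends on something that is itself stale. This is only a toy model. The dependency tables are simplified from what is mentioned in this thread, "sourcefiles" stands in for $(sourcefiles), and all timestamps are made up.

```python
# Toy model of make's rebuild rule (not the real Makefile logic).
# A target is stale if it is missing, if any prerequisite is newer,
# or if any prerequisite is itself stale.

def needs_rebuild(target, deps, mtimes):
    """Return True if `target` would be rebuilt under a make-style rule."""
    if target not in mtimes:
        return True  # target does not exist yet
    for dep in deps.get(target, []):
        # A stale intermediate target gets rebuilt and thus becomes newer.
        if dep in deps and needs_rebuild(dep, deps, mtimes):
            return True
        # A missing prerequisite is treated as "newer" in this sketch.
        if mtimes.get(dep, float("inf")) > mtimes[target]:
            return True
    return False

TEMPLATE = "hugo/layouts/papers/single.html"

# Simplified: build/.yaml depends on build/.static, which depends on Hugo files.
deps_with_static = {
    "build/.static": [TEMPLATE],
    "build/.yaml": ["build/.static", "sourcefiles"],
}

# Hypothetical alternative: build/.yaml no longer depends on build/.static.
deps_without_static = {
    "build/.static": [TEMPLATE],
    "build/.yaml": ["sourcefiles"],
}

# Made-up modification times after a complete build.
mtimes = {TEMPLATE: 1, "sourcefiles": 1, "build/.static": 2, "build/.yaml": 3}

# Simulate `touch hugo/layouts/papers/single.html`.
touched = {**mtimes, TEMPLATE: 10}

print(needs_rebuild("build/.yaml", deps_with_static, mtimes))
print(needs_rebuild("build/.yaml", deps_with_static, touched))
print(needs_rebuild("build/.yaml", deps_without_static, touched))
```

In this model, touching the template only makes build/.yaml stale when build/.static sits in its prerequisite chain, which matches the behavior reported here.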

akoehn commented 4 years ago

Yes, the dependencies were an easy approximation. I will have a look at it after I have finally defended my thesis (Monday). Do remind me if I forget :-)

akoehn commented 4 years ago

In the meantime: you do use make -j4 to speed things up, right?

akoehn commented 3 years ago

I'm pretty sure this is fixed now. @mbollmann do you still see calls you don't like?

mbollmann commented 3 years ago

Oh, I must have missed when you made changes to the Makefile. The YAML target has not been changed though. If I check out master, run make, then touch hugo/layouts/papers/single.html and make again, it will regenerate all of the YAML files.

akoehn commented 3 years ago

True. If create_hugo_yaml.py does not use any generated or copied-over files, we can simply change

    build/.yaml: build/.static $(sourcefiles) venv/bin/activate

to

    build/.yaml: build/.basedirs $(sourcefiles) venv/bin/activate

and that case should be covered. It seems like it should be safe, but you probably know that better than I do.

mbollmann commented 3 years ago

To be honest, I don't quite follow what build/.basedirs actually achieves. It runs @mkdir -p build/data-export/papers, which is not actually used anymore (though build/data-export/volumes is), and either way create_bibtex.py will create these folders already.

What I can say is that neither the build/.yaml nor the build/.pages target uses any Hugo files, so they should not need to depend on build/.static as far as I can see. But they also do not use the build/data-export directory, so they should not depend on build/.basedirs either, unless I misunderstand your intention behind this. :)

akoehn commented 3 years ago

The basedirs target creates three directories (due to the -p switch): build, build/data-export and build/data-export/papers. All other tasks rely on build existing and therefore rely on the basedirs target.

I just took the directories to be generated from the previous code; if papers is not needed anymore, we can change that.

There is, however, a reliance on directory creation. See the error that occurs when page generation does not depend on static (and the build/content directory therefore does not exist):

. "venv/bin/activate" && python3 bin/create_hugo_pages.py --clean
INFO     Creating stubs for papers...
Traceback (most recent call last):
  File "bin/create_hugo_pages.py", line 231, in <module>
    create_papers(dir_, clean=args["--clean"])
  File "bin/create_hugo_pages.py", line 74, in create_papers
    if not check_directory("{}/content/papers".format(srcdir), clean=clean):
  File "bin/create_hugo_pages.py", line 51, in check_directory
    os.mkdir(cdir)
FileNotFoundError: [Errno 2] No such file or directory: '/home/arne/software/vimeo-linker/acl-anthology/build/content/papers'
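
The traceback above comes down to how os.mkdir behaves: it creates only the last path component and raises FileNotFoundError when a parent directory is missing, whereas os.makedirs creates the intermediate directories as well, much like mkdir -p. The following is a minimal sketch in a temporary directory. It is an illustration of the failure mode, not the fix proposed in this thread, and the build/content/papers layout under the temporary root is only mimicked for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Mimic the situation from the traceback: build/content does not exist yet.
    target = os.path.join(root, "build", "content", "papers")

    try:
        os.mkdir(target)  # only creates the final component
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True  # the parent build/content is missing

    # makedirs creates build, build/content and build/content/papers.
    os.makedirs(target, exist_ok=True)
    makedirs_created = os.path.isdir(target)

print(mkdir_failed, makedirs_created)
```

Whether the directories should be created by the script or guaranteed by a Makefile prerequisite is a design choice; the thread leans towards keeping that responsibility in the Makefile.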

I'll post a PR with fixes.