Closed rcalvo12 closed 1 year ago
@MosesStewart welcome to Template! Once you accept the invitation to join this repository, please self-assign and review the issue description.
A first step is probably to make sure you can build the entire Template on FASRC. After that you can start testing parallel builds.
Please keep in mind that this repository is in the public domain.
Thanks!
We'd like to explore whether it's possible to use this option successfully on FASRC.
To give an update, I'm able to use the -j
option successfully without any warnings when running on my own system. I'm still trying to sort out the executable path for stata
on the FASRC to test it there.
To give an update, I'm able to use the
-j
option successfully without any warnings when running on my own system.
@MosesStewart that's great! Do you get any speedup in runtime, or is the repository too fast to be able to tell?
In the latter case, we could create an issue branch and add some slow operations to the repository (that are not dependent on one another) so we can see if we're getting any improvement from parallelizing.
I'm still trying to sort out the executable path for
stata
on the FASRC to test it there.
Got it. Per here:
The builders are going to look for Stata under the name StataMP-64
. If when you type StataMP-64
from the bash prompt Stata doesn't run, then the builders likely won't be able to run it, either.
In that case, you could look for the correct command to run Stata from the bash prompt. Suppose for the sake of argument that it's StataCommand
.
Then I think we just need to tell Linux that when you type StataMP-64
you really mean StataCommand
. I think (?) aliases will do this but I haven't tested it directly (at least not for a while).
Let me know if any of that is helpful, thanks.
@jmshapir changing the stata executable in Template/source/lib/JMSLab/builders/executables.yml
to stata-mp
fixed the problem!
Thank you so much for the detailed explanation! I really appreciate it!
I didn't have permission to create a simlink in the FASRC applications folder, so I think I'll run it like so!
Do you get any speedup in runtime, or is the repository too fast to be able to tell?
The repository is too fast for me to tell :sob:
In the latter case, we could create an issue branch and add some slow operations
Would it be possible for me to gain write-access to edit the code?
@MosesStewart
@jmshapir changing the stata executable in
Template/source/lib/JMSLab/builders/executables.yml
tostata-mp
fixed the problem!Thank you so much for the detailed explanation! I really appreciate it!
I didn't have permission to create a simlink in the FASRC applications folder, so I think I'll run it like so!
Sounds good! Let's just be sure not to commit the revised executables.yml
to the repository.
(And when @rcalvo12 @ew487 let's ask them what solution they use for this detail about the executables.)
Do you get any speedup in runtime, or is the repository too fast to be able to tell?
The repository is too fast for me to tell ðŸ˜
In the latter case, we could create an issue branch and add some slow operations
Would it be possible for me to gain write-access to edit the code?
Done! Use it wisely.
Does anyone know how to force scons
to build all target files, even if they're up-to-date?
I've tried scons .
and scons /
, which both should build all target files according to the link in https://github.com/JMSLab/Template/issues/72#issue-1476787870. I've also tried scons --config=force
, but I haven't been able to get scons to re-build the entire repository when it's up to date already.
I'm trying to re-build the entire repository to compare times with and without the -j
option.
@MosesStewart thanks!
Does anyone know how to force
scons
to build all target files, even if they're up-to-date?
I'm not sure but here are some ideas:
sconsign.dblite
. (This is how scons
keeps track of the state of the build, so deleting it might force scons
to rebuild everything.)/output
folder. (I think the --clean
option mentioned in the github post above might automate this? If not, deleting manually is a brute-force approach, and it should work as a last resort.)I've tried
scons .
andscons /
, which both should build all target files according to the link in #72 (comment).
Those commands will only build targets that are not currently up to date, since scons
is designed to make sure targets are up to date using the minimum amount of compute.
I'm trying to re-build the entire repository to compare times with and without the
-j
option.
Great!
I'm not sure but here are some ideas:
Thank you so much! The StackExchange solution worked! I'm getting very interesting results with the -j
option!
# Cores | Time: 0:00 | Completion
1
core | Time: 3:27
| Completed
2
core | Time: 2:19
| Failed
4
core | Time: 0:50
| Failed
The 2
and 4
task parallelizations that failed both did so in paper/SConscript
, which is almost the last step the scons build and is almost instant. Therefore, I'm confident that the -j
option does significantly increase speed at approximately a 1:1 ratio with the number of cores.
I'll try to dig into why the lyx
environment is raising an ExecCallError
when we use multiple cores.
@jmshapir I don't think the -j
option can be used successfully in most cases.
From my testing, I hypothesize that when we ask scons
to build 3
targets in parallel, it will always try to build 3
targets at the same time. In other words, even if the target scons
needs to build next depends on a target currently being built, scons
will try to start building it anyways and the program will fail.
In this repository, both the build #output/paper/TemplateTex.pdf
and build #output/paper/TemplateLyx.pdf
were failing when building multiple targets only when they included #source/figures/gdp_educ.eps
and #output/tables/top_gdp.tex
, which is built by scons
the step before. The only way I could get the build to complete was by removing the dependencies on earlier steps. Therefore, I don't think that the -j
option is optimized so that scons
knows to wait for earlier dependencies to finish building.
I do think if scons
gets an update, this could easily be fixed and the -j
option could become incredibly useful
Thanks @MosesStewart!
In this repository, both the build
#output/paper/TemplateTex.pdf
and build#output/paper/TemplateLyx.pdf
were failing when building multiple targets only when they included#output/tables/top_gdp.tex
, which is built byscons
the step before.
Can you point me to which SConscript
controls the building of #output/tables/top_gdp.tex
? I want to make sure I understand how the DAG is structured here in case there's something we're missing.
Therefore, I don't think that the
-j
option is optimized so thatscons
knows to wait for earlier dependencies to finish building.
Since parallelization is a key part of the scons design, I'd be surprised if it has this limitation. If it did, I'd expect it would have been reported. If you have bandwidth, it might be worth your spending a little time (say, not more than 1-2 hours) on the scons github repository to see if there are any open issues that relate to the problem we're having here.
Can you point me to which
SConscript
controls the building of#output/tables/top_gdp.tex
? I want to make sure I understand how the DAG is structured here in case there's something we're missing.
Sorry I attached the wrong file name! It is #output/tables/top_gdp.lyx
not .tex
that is being built in this step:
https://github.com/JMSLab/Template/blob/1bedc394b8898f3b4ea44c922c335a51ebc53974/source/tables/SConscript#L3-L5
I believe that #output/tables/top_gdp.tex
isn't being built by anything.
Since parallelization is a key part of the scons design, I'd be surprised if it has this limitation. If it did, I'd expect it would have been reported. If you have bandwidth, it might be worth your spending a little time (say, not more than 1-2 hours) on the scons github repository to see if there are any open issues that relate to the problem we're having here.
Yes! I will start this right away!
I believe that
#output/tables/top_gdp.tex
isn't being built by anything.
Thanks @MosesStewart! In that case, do we know why TemplateTex.pdf
isn't building successfully when we paralellize? It seems like the dependency on top_gdp.tex
can't be the culprit, since as you say that file isn't being built by anything.
Thanks @MosesStewart! In that case, do we know why
TemplateTex.pdf
isn't building successfully when we paralellize? It seems like the dependency ontop_gdp.tex
can't be the culprit, since as you say that file isn't being built by anything.
No, I hadn't noticed before that top_gdp.tex
wasn't dependent on anything, but I'm sure that editing source/papaer/template.tex
to not include output/tables/top_gdp
allows scons
to not fail on this step of the process. I'm at a loss for the cause ~ I'm trying to change different things and see what causes it to fail here during parallelization
Thanks @MosesStewart!
I wonder if it might be a good idea to build a minimal working example of what we're trying to do, maybe in a separate repository? That way we could strip out some of the complexity of the Template
and focus on the bare essentials.
If that sounds good to you here is a process I propose:
SConsTest
in your own namespace (i.e., under @MosesStewart rather than JMSLab)scons
structure using out of the box builders rather than JMSLab builders. Something like:
/source/paper/
: contains test.tex
which is compiled to /output/paper/test.pdf
using the PDF()
builder/source/zip
: contains SConscript
which zips test.pdf
to /output/zip/test.zip
using the ZIP()
builderIf we find that parallelization breaks here then we have a simple example we can use to diagnose potential explanations. We could even post an issue to the scons
github repository pointing to the example.
If we find that parallelization works here then that might help us get to the bottom of why it's breaking in Template
.
Let me know what you think. If you think it would be helpful to talk this through feel free to shoot me an e-mail and we can schedule a call or meeting.
Thanks!
The test repository is here. I'm currently changing things in test.tex
and important_figure.tex
to see if I can break scons
with the -j
function.
Thanks @MosesStewart! That test repository looks good to me. A couple thoughts:
important_figure.tex
? I'd have thought that for a minimal example the pipeline test.tex-->test.pdf-->test.zip
might be sufficient, since that would support -j
with 2 simultaneous jobs?/output/paper
directory you could add a .gitignore
local to that directory that specifies which files to ignore.If we're finding that -j
works in the test repo but not in Template
, that could mean that something is off with the way we've specified the DAG in Template
. In order to parallelize the build, it's crucial that scons
has all the dependencies exactly right, since otherwise it might try to build targets before their inputs are ready.
To visualize the DAG, I think you can use scons --tree=type
where type
determines exactly what sort of tree is drawn. Maybe it's worth having a look at the DAG in case we're missing something?
To visualize the DAG, I think you can use
scons --tree=type
wheretype
determines exactly what sort of tree is drawn. Maybe it's worth having a look at the DAG in case we're missing something?
That's very helpful! I'll look into it!
- Do we need
important_figure.tex
? I'd have thought that for a minimal example the pipelinetest.tex-->test.pdf-->test.zip
might be sufficient, since that would support-j
with 2 simultaneous jobs?
I wanted to test if anything my be breaking when trying to import a file. Currently the -j
is working in the test repo, and me changing file locations and such isn't breaking it.
@jmshapir nothing jumped out at me after looking at the DAG, but I narrowed down exactly what's causing the parallelization to fail.
Changing the repository by:
source/figures/gdp_educ.lyx
figure from the source/paper/template.lyx
document
https://github.com/JMSLab/Template/blob/227dcce1700e3c559e3f9a371df9857bfaddbf91/source/paper/Template.lyx#L269-L278source/figures/gdp_educ
figure from source/paper/template.tex
document
https://github.com/JMSLab/Template/blob/227dcce1700e3c559e3f9a371df9857bfaddbf91/source/paper/Template.tex#L48allows the parallelization to work.
I didn't have to change any of the dependencies in source/paper/SConscript
for it to compile. So, I think something's happening when including the source figures in either document that's causing scons
to fail only when using the -j
option, but succeeding otherwise. I included what the repository looks like when it succeeds in a2c53bc
Thanks @MosesStewart!
Does the DAG reflect the fact that gdp_educ.lyx
and gdp_educ.tex
depend on gdp_educ.eps
?
@jmshapir they depend on gpd_educ.lyx
and gpd_educ.tex
respectively:
For source/papers/template.lyx
:
https://github.com/JMSLab/Template/blob/227dcce1700e3c559e3f9a371df9857bfaddbf91/sconstruct.log#L85-L96
https://github.com/JMSLab/Template/blob/227dcce1700e3c559e3f9a371df9857bfaddbf91/sconstruct.log#L95
For source/papers/template.tex
https://github.com/JMSLab/Template/blob/227dcce1700e3c559e3f9a371df9857bfaddbf91/sconstruct.log#L97-L101
https://github.com/JMSLab/Template/blob/227dcce1700e3c559e3f9a371df9857bfaddbf91/sconstruct.log#L100
@MosesStewart that makes sense. But if we look at gdp_educ.lyx
, for example, it includes a call to gdp_educ.eps
:
Does any SConscript
currently declare that gdp_educ.lyx
depends on gdp_educ.eps
?
If not, I wonder if there's a hole in the DAG, such that we have represented to scons
that we have
stuff-->gdp_educ.eps
gdp_educ.lyx-->Template.lyx
but really we have
stuff-->gdp_educ.eps-->gdp_educ.lyx-->Template.lyx
If that were the case, then the DAG would imply (incorrectly) that it is fine to build, say, gdp_educ.eps
and Template.lyx
at the same time.
Could that be the culprit here?
@jmshapir I didn't notice that ~ that seems very likely ~ I will test right away
It works!!!!!!!!!!!!!!!!!!!!!!!!!!!!! MWAHAHAHAHAHAHAHAHAHAHAHA
@MosesStewart nice! One thing I think we've learned that if we want to use this option we need to be quite meticulous about declaring dependencies, even those that aren't needed for a serial build.
The other thing I'd ideally like to check before we leave the issue is that we are actually seeing some improvements in run time. Is it possible to look into that? As before, it might be necessary to add a "speedbump" to some of the steps in order to make the run time calculations meaningful.
@rcalvo12 @ew487
@jmshapir currently I'm getting 3:18
without parallelization and 1:56
with parallelization (~42%
speed boost). This is only with "speedbumps" in 2 of the files though ~ the more speedbumps I add, the faster the parallelization becomes!
@MosesStewart that's great! It sounds to me like this is working as intended.
The next steps I envision are the following:
scons -j
is functional. (No need for speedbumps.)Of course if you think I've missed something just say!
Summary:
In this issue we tested the functionality of the -j
option for scons
in the repository.
After testing, we found that it is pivotal to carefully indicate all dependencies so the DAG is set up to prevent scons
from building dependent files in parallel. We updated source/figures/scons
to prevent scons
from building gdp_educ.eps
and template.pdf
in parallel when using the -j
option, opening up parallelization in the repository.
Final state of the issue branch is in f6297fe
scons
represents the build steps as a directed acyclic graph.This means it can calculate which paths can be built in parallel and which must be built in sequence.
The
-j
option is designed to allow a user to run multiple simultaneous tasks.We'd like to explore whether it's possible to use this option successfully on FASRC.