zeroby0 opened this issue 3 years ago
I don't think anyone actually profiled it, so we don't know if the slow bit is loading/parsing the assets, or if it's the tracing of the font-related CSS properties for every HTML element. If it's the latter, we should look into optimization ideas, and memoization/caching could certainly play a part. If the HTML and all the CSS hasn't changed, we know that the tracing will yield the same result as last time.
I see!
Profiling seems worth doing. I'm new to the code base, but I'll try to do it. I have an idea :D
Great news! @papandreou
So I have done some profiling, and one font file takes almost the same time to process as one HTML file. Given that most projects use 3-4 fonts at most but have dozens of HTML files, memoization should help greatly.
| Number of files | HTML | Woff2 |
|---|---|---|
| 1 | 3.3 | 3.3 |
| 2 | 3.6 | 3.4 |
| 4 | 3.8 | 3.8 |
| 8 | 4.0 | 3.8 |
| 16 | 4.5 | 4.4 |
To measure HTML file processing time, I generated 11 folders with 2^n (n = 0..10) HTML files and one font each; all the HTML files use the same font. For font processing time, I generated 11 more folders with 2^n (n = 0..10) font files and one HTML file each; the HTML file uses all 2^n fonts.
Then, in each of these folders, I ran `npx subfont *.html -ris --dry` and timed it. I repeated this timing 5 times per folder; the variance in the times is what you see as error bars in the plot.
All the processing was done on a ramdisk. The font used is Inter 400 Regular (woff2). Of course, CSS files may have slightly different processing times, but the big picture is the same.
Here is a zip of the workspace I profiled with. `rfile.txt` and `rfont.txt` contain the times for the HTML-file and font runs, and `file.py` and `font.py` generate the folders used for profiling.
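For anyone who wants to reproduce this without the zip, here's a minimal sketch of what a generator like `file.py` could look like. The folder naming, the HTML boilerplate, and the function name are my own, not necessarily what the actual script does:

```python
# Hypothetical sketch of a workspace generator: create a folder with
# 2**n identical HTML files that all reference the same woff2 font.
import os

HTML = """<!DOCTYPE html>
<html><head><style>
@font-face { font-family: Inter; src: url(Inter.woff2) format("woff2"); }
body { font-family: Inter, sans-serif; }
</style></head><body>hello</body></html>
"""

def make_workspace(root, n):
    """Create root/html-<2**n>/ containing 2**n identical HTML pages."""
    folder = os.path.join(root, f"html-{2 ** n}")
    os.makedirs(folder, exist_ok=True)
    for i in range(2 ** n):
        with open(os.path.join(folder, f"page-{i}.html"), "w") as f:
            f.write(HTML)
    return folder
```

You'd then drop the font file into each folder and run `time npx subfont *.html -ris --dry` inside it.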
Actually we probably shouldn't / can't use git because we might be processing build artifacts of another stage in the pipeline, and they aren't tracked in git.
So we should maintain a hash table keyed by file hash, storing a shallow asset graph that includes inlined assets but not other files. Then we can construct the whole tree from the memoized bits plus the freshly calculated bits.
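The core idea could be sketched like this (language-agnostic; the function names and cache layout here are hypothetical, not subfont's actual API):

```python
# Sketch: memoize the per-file trace result, keyed by a content hash.
# If the bytes of the HTML (plus its inlined CSS) haven't changed,
# the trace will yield the same shallow asset graph as last time.
import hashlib

_trace_cache = {}  # content hash -> shallow asset graph

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def traced(source: bytes, trace):
    """Return trace(source), reusing a cached result when the hash matches."""
    key = content_hash(source)
    if key not in _trace_cache:
        _trace_cache[key] = trace(source)
    return _trace_cache[key]
```

Keying on content rather than mtimes or git status sidesteps the build-artifact problem: untracked files hash the same either way.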
Hmm, it would also be interesting to `node --prof` it to see what the "per HTML" execution time is spent on: the HTML/CSS parsing, working out the CSS cascade, or the tracing of the font-related CSS properties per HTML element. I suspect it's the latter, and if that's the case, then you're right -- we could load the assets and compute that hash, then memoize the result of the trace with that as the key.
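For the V8 sampling profiler, the usual two-step workflow would be something like this (the path to subfont's bin script may differ depending on your setup):

```shell
# 1. Run subfont under the V8 sampling profiler; writes isolate-*-v8.log
node --prof node_modules/.bin/subfont *.html -ris --dry

# 2. Turn the tick log into a human-readable summary
node --prof-process isolate-*-v8.log > profile.txt
```

The `[Bottom up (heavy) profile]` section of `profile.txt` should show whether the ticks land in parsing, cascade computation, or tracing.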
I've read in the issues that spidering through the files is what takes the longest time.
Can we save the results of spidering in the Netlify cache, and use git to figure out what changed, so the next build only spiders the changed files?