fsprojects / FSharp.Formatting

F# tools for generating documentation (Markdown processor and F# code formatter)
https://fsprojects.github.io/FSharp.Formatting/

Async function takes longer than usual #795

Closed yigitl closed 1 year ago

yigitl commented 1 year ago

I have an async function that takes around an hour when run in a notebook or an .fsx file. The function processes a lot of data, aggregates it, and then outputs a chart. I noticed that dotnet fsdocs build --eval takes five hours to complete when this function is involved. Just calling the function causes it to take that long.

I have tested the document with other functions, and they seem to work fine. Only this function, which uses Array.Parallel.mapi, takes about five times longer during the fsdocs build.

I checked and searched the documentation to see if there was a pitfall about the performance, but nothing really came up.

I'm curious about why this could be. I'm guessing there might be something about the fsdocs build process that somehow limits the asynchronicity of the underlying function that I'm trying to call? What would be the best practice when dealing with a function that processes huge amounts of data?

I'm quite new to fsdocs, so I might be missing something. I'm not even sure where to begin debugging this in the context of fsdocs, so I'm looking forward to any ideas.

$ dotnet --version
7.0.100

$ dotnet fsdocs version
fsdocs 17.2.2

nhirschey commented 1 year ago

If you're using Array.Parallel, as a first check I'd make sure that you have server GC configured. The easiest way is by setting an environment variable, see here. Server GC can substantially speed up parallel calculations, and this is not specific to fsdocs.
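For readers without the linked page at hand, here is a minimal sketch of the environment-variable approach. It assumes a POSIX shell and .NET 6 or later (the thread uses 7.0.100), where the runtime reads DOTNET_gcServer; older runtimes use the COMPlus_gcServer name instead. The build command is left commented out so the snippet only sets the variable.

```shell
# Ask the .NET runtime to use server GC for every process started from this shell.
# 1 = server GC, 0 = workstation GC (the default for console apps).
export DOTNET_gcServer=1

# Then run the build from the same shell session so it inherits the setting:
# dotnet fsdocs build --eval
```

The variable only affects processes launched from the shell where it was exported, so it has to be set again in a new terminal or in a CI job definition.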

What outputs are you generating? If you are generating .html, .fsx, and .ipynb outputs then fsdocs will run the code 3 times. That could be part of your issue.

More generally, can you cache the chart, rather than building it each time? You could include a cached chart using regular markdown:

(***do-not-eval***)
// This code will show up but it will not be evaluated
Array.Parallel.mapi veryLongFunction

(**
Some markdown text
![myChart](path-to-images/cachedChart.png)
*)

As an aside, Array.Parallel is parallel but not asynchronous. Async typically refers to async expressions.

yigitl commented 1 year ago

> If you're using Array.Parallel, as a first check I'd make sure that you have server GC configured. The easiest way is by setting an environment variable, see here. Server GC can substantially speed up parallel calculations, and this is not specific to fsdocs.

I didn't know much about this, that's interesting.

> What outputs are you generating? If you are generating .html, .fsx, and .ipynb outputs then fsdocs will run the code 3 times. That could be part of your issue.

The output is generated from an .fsx file, yes. This partly explains the slowdown, as running it 3 times would cause around 3.5 hours of build time. The remaining hour or so might be caused by garbage collection or some other factor.

Why does fsdocs run .fsx files three times, though? I might have missed it, but I don't think I have seen this mentioned anywhere.

> More generally, can you cache the chart, rather than building it each time? You could include a cached chart using regular markdown

This is what I have in mind, in case I cannot find a proper fix.

Thank you for the quick reply.

nhirschey commented 1 year ago

> Why does fsdocs run .fsx files three times though?

If you're only generating .html (the default, what it sounds like you're doing), it will run once. But fsdocs allows creating multiple outputs from a .fsx script, and it'll run once per output type. See e.g. the _template.ipynb references here.

That's how the files backing the "download in notebook" link are generated on this site's docs page, such as at the top of the creating content page I just linked to.

yigitl commented 1 year ago

> If you're only generating .html (the default, what it sounds like you're doing), it will run once. But fsdocs allows creating multiple outputs from a .fsx script, and it'll run once per output type. See e.g. the _template.ipynb references here.

Right, I thought it was running it three times for other reasons. As you guessed, I'm only generating HTML output. So it's still very strange to me that this function takes 4-5 times as long only when I call it from fsdocs.

yigitl commented 1 year ago

Enabling server GC actually brought the build time down to the expected duration. Thank you for the help @nhirschey.