Closed · bloussou closed this 5 days ago
It sounds like the request from editoast to core is too large and some lib on either side can't handle it.
If it's actually the origin of the bug, we could easily batch the requests. We just need to not separate the resource uses of a single zone into different requests.
> It sounds like the request from editoast to core is too large and some lib on either side can't handle it.
> If it's actually the origin of the bug, we could easily batch the requests. We just need to not separate the resource uses of a single zone into different requests.
It looks like `takes` has payload limits that should be increased.
> It looks like `takes` has payload limits that should be increased.
I didn't really find anything for that, but I may be wrong; googling `takes` is painful. I found ways to wrap requests to limit their sizes, but it's not used by default. If you have found a way to configure this I'd love a link.
Batching the requests should be straightforward in any case.
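A minimal sketch of that batching, under the constraint stated above (all names here, like `ZoneBatcher` and `maxUsesPerBatch`, are made up for illustration, not OSRD code): zones are packed into requests up to a size budget, and the resource uses of a single zone are never split across two batches.

```java
import java.util.ArrayList;
import java.util.List;

public class ZoneBatcher {
    /**
     * Groups per-zone resource-use lists into batches whose total number of
     * uses stays under maxUsesPerBatch, without ever splitting one zone
     * across two batches. A zone larger than the budget gets its own batch.
     */
    static <T> List<List<List<T>>> batch(List<List<T>> zones, int maxUsesPerBatch) {
        List<List<List<T>>> batches = new ArrayList<>();
        List<List<T>> current = new ArrayList<>();
        int size = 0;
        for (List<T> zone : zones) {
            // Flush the current batch if adding this whole zone would overflow it.
            if (size > 0 && size + zone.size() > maxUsesPerBatch) {
                batches.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(zone);
            size += zone.size();
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }
}
```

Each resulting batch would then become one editoast-to-core request, keeping every request under the payload limit.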
I tried to reproduce the bug but couldn't quite do it naively, the front-end became unresponsive before it triggered. I'll fill a large timetable with a script instead of using the GUI, but that will wait until tomorrow.
> > It looks like `takes` has payload limits that should be increased.
>
> I didn't really find anything for that, but I may be wrong; googling `takes` is painful. I found ways to wrap requests to limit their sizes, but it's not used by default. If you have found a way to configure this I'd love a link. Batching the requests should be straightforward in any case.
>
> I tried to reproduce the bug but couldn't quite do it naively, the front-end became unresponsive before it triggered. I'll fill a large timetable with a script instead of using the GUI, but that will wait until tomorrow.
I didn't find documentation either. It was a guess that the limit was coming from the HTTP server framework.
I still can't reproduce locally; it's working fine with 3k overlapping 1000 km-long trains (110k conflicts). It's getting way too long, though, and doesn't seem to scale nicely: with 3k trains it takes 571 s (465 s spent in core).
At some point editoast hits a stack overflow (around 3.6k trains).
I'm guessing that it's related to the deployment resources. Maybe core gets OOM-killed while building or sending the response. I'll try to play around that to reproduce, but if that's the cause of the issue the fix won't be easy.
After looking at a profiler and using different `-Xmx` values:
When the memory limit is far away, memory use does peak while sending the response. The memory allocated during conflict detection doesn't seem to be freed before building the response, so the peak is quite large.
But when the memory limit is close, the actual peak memory use is during the conflict detection. And it's supposed to throw a "clean" error.
My hypothesis is that the `-Xmx` value in deployment doesn't precisely reflect the memory that can be used without being killed. If that's the case, we should first fix it, but we can also help a little by hinting to the GC that it should do some cleanup once the conflicts are computed.
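One quick way to probe that hypothesis (a sketch, not OSRD code): print the heap ceiling the JVM actually resolved and compare it with the deployment's memory limit, since the gap between the two is all the headroom available for off-heap memory (threads, buffers used while serializing the response) before the process can get killed.

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Runtime.maxMemory() reports the heap ceiling the JVM resolved
        // (from -Xmx or its container-aware defaults). If this is close to
        // the container limit, off-heap allocations can trigger an OOM kill
        // even though the heap itself stays within bounds.
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM max heap: " + maxHeapMiB + " MiB");
    }
}
```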
@bloussou I don't think I can look much further than that into this issue. I can open a PR for the GC hint, but I won't know if it fixes it for sure as it seems related to the deployment env.
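The GC hint could look roughly like this (a sketch with hypothetical names; `System.gc()` is only a request to the JVM, not a guarantee): let the intermediate detection state become unreachable, then suggest a collection before the large response is built.

```java
import java.util.ArrayList;
import java.util.List;

public class ConflictEndpoint {
    // Placeholder for the real conflict detection; the heavy intermediate
    // state it allocates is local, so it becomes unreachable on return.
    static List<String> detectConflicts() {
        List<String> conflicts = new ArrayList<>();
        conflicts.add("conflict-1");
        return conflicts;
    }

    static String buildResponse() {
        List<String> conflicts = detectConflicts();
        // Hint: the detection scratch state is now garbage, so collecting
        // here lowers the heap before the (large) response is serialized.
        System.gc();
        return String.join("\n", conflicts);
    }
}
```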
After testing some hypotheses: core behaves perfectly fine when we use too much RAM. It does throw a clean "out of memory" error, and the `-Xmx` values are fine.
After checking the logs, core did get killed while sending the response. But the RAM use was perfectly fine. It's not clear why or how it was killed.
Can't reproduce, closing.
We can open again if we get new cases of core mysteriously dying randomly.
### What happened?
See this scenario: https://rec-osrd.reseau.sncf.fr/operational-studies/projects/4/studies/41/scenarios/53
### What did you expect to happen?
I expect to see all the conflicts in the frontend without an error.
### How can we reproduce it (as minimally and precisely as possible)?
### What operating system, browser and environment are you using?
OSRD version (top right corner, **Account** button > **Informations**): d9655c0