dyne / reflow-os

Base scripts to run Reflow OS

Feedback; (local dev environment) io issues under ongoing load #6

Open ocataco opened 3 years ago

ocataco commented 3 years ago

This could be caused by not having enough resources in my local Docker environment, but I'm reporting it just in case. Resources allotted to Docker: CPUs: 4, Memory: 8GB, Swap: 2GB.

I'm experiencing various IO errors and unresponsiveness in the local dev environment Docker containers when running a consistent load with no delay.

I've tried adding a delay between queries in the simulation: with 0.1 seconds the errors still occur, with 1 second there are no more errors.
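To make the pacing concrete, here is a minimal Ruby sketch of a fixed delay between consecutive queries. The helper name `run_paced`, its `sleeper` parameter, and the stand-in block are hypothetical and only for illustration; this is not the actual simulation code.

```ruby
# Hypothetical helper (not part of the simulation): passes each query to the
# given block and pauses `delay` seconds between consecutive calls.
# No pause happens before the first call or after the last one.
def run_paced(queries, delay:, sleeper: Kernel.method(:sleep))
  results = []
  queries.each_with_index do |query, index|
    sleeper.call(delay) if index.positive? && delay.positive?
    results << yield(query)
  end
  results
end

# Usage with a stand-in for the real API call (assumption: the real block
# would send the query to the Reflow OS endpoint instead).
results = run_paced(%w[q1 q2 q3], delay: 0.1) { |query| "sent #{query}" }
puts results.inspect
```

The `sleeper` argument exists only so the pause can be replaced in a dry run; by default it calls `Kernel.sleep`.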

For example, I've seen this error relating to Meilisearch:

14:45:12.947 request_id=FqfJSsAyxM-qv4gAAOQh [warn] Search - Could not put object: %{"errorCode" => "internal", "errorLink" => "https://docs.meilisearch.com/errors#internal", "errorType" => "internal_error", "message" => "update store was shut down due to a fatal error, please check your logs for more info."}

I've seen the web application give up:

/Users/taco/.rvm/rubies/ruby-3.0.1/lib/ruby/3.0.0/net/protocol.rb:227:in `rbuf_fill': end of file reached (EOFError)

And an IO error relating to Postgrex:

REFLOW OS ERROR!!!: 500 Internal Server Error variables: {:event=>{:note=>"unpacked by a_tsc - 2021-10-03T00:00:00+02:00", :action=>"produce", :provider=>"01FF84MT8804S3CVCXFQP7YSNC", :receiver=>"01FF84MT8804S3CVCXFQP7YSNC", :hasPointInTime=>"2021-10-03T00:00:00+02:00", :resourceQuantity=>{:hasUnit=>"01FF8477V63ZK8JD5Q0YX731X4", :hasNumericalValue=>1}}, :newInventoriedResource=>{:trackingIdentifier=>"http://cleanlease.nl/zs/fbb68248-0015-4df5-bc7d-71daf4f71884", :name=>"Gown", :tags=>[], :note=>"Clean Lease Schort: fbb68248-0015-4df5-bc7d-71daf4f71884", :currentLocation=>"01FF84ZSP2R6YSDMK475Z9W0MZ"}}
14:35:07.236 request_id=FqfIvbn5S_-TV3gAAIyB [error] The API encountered an exceptional error
%Postgrex.Error{connection_id: 185, message: nil, postgres: %{code: :io_error, file: "md.c", line: "460", message: "could not open file \"base/16384/19767_fsm\": I/O error", pg_code: "58030", routine: "mdopen", severity: "ERROR", unknown: "ERROR"}, query: nil}
14:35:07.236 request_id=FqfIvbn5S_-TV3gAAIyB [error] ** (Postgrex.Error) ERROR 58030 (io_error) could not open file "base/16384/19767_fsm": I/O error

adam-burns commented 3 years ago

Hi @ocataco

Thanks for the feedback.

Can you provide repeatable, or even deterministic, steps that cause any of these symptoms at this stage?

Are you able to observe similar results on another instance, removing suspicions of resource constraints in local VM impacting behaviour?
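One way such repeatable steps could be captured is a small harness that replays the same calls at several delays and records where the first failure occurs. The Ruby sketch below is only an illustration under assumptions: `sweep_delays`, its `sleeper` parameter, and the simulated failure are hypothetical and are not part of the simulation or of Reflow OS.

```ruby
# Hypothetical harness: for each delay, replay the calls in order and record
# how many succeeded and at which index the first failure happened.
def sweep_delays(calls, delays, sleeper: Kernel.method(:sleep))
  delays.map do |delay|
    first_failure = nil
    succeeded = 0
    calls.each_with_index do |call, index|
      sleeper.call(delay) if index.positive? && delay.positive?
      begin
        yield(call)
        succeeded += 1
      rescue StandardError => e
        # Stop this run at the first error so the failing step is unambiguous.
        first_failure = { index: index, error: "#{e.class}: #{e.message}" }
        break
      end
    end
    { delay: delay, succeeded: succeeded, first_failure: first_failure }
  end
end

# Usage with a simulated failure on the third call (assumption: the real
# block would issue the API request and raise on a 500 response).
report = sweep_delays(%w[a b c d], [0, 0.1]) do |call|
  raise EOFError, "end of file reached" if call == "c"
end
report.each { |row| puts row.inspect }
```

Running the same call list on both the local environment and another instance would give one report per instance to compare.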

ocataco commented 3 years ago

Hi Adam,

Stefano and I tried to run the simulation on the "shared instance", but ran into Internal Server Errors there too. As we don't have access to the logs there, we don't know much more.

First we ran a small scenario, which completed successfully. Then we tried a month's worth of simulation, with a delay of 0.2 seconds between each call, and that failed quickly in the seeding stage of the simulation (lots of sequential produce events). We then increased the delay to 0.5 seconds, but the scenario immediately failed with an internal server error.

Maybe you can investigate the server logs to see what happened!

Kind regards,

Taco