dingmaotu / mql-zmq

ZMQ binding for the MQL language (both 32bit MT4 and 64bit MT5)
Apache License 2.0
553 stars 301 forks source link

Port from c++ of the common "Lazy Pirate" pattern client causes random memory access violation #33

Closed VideoMem closed 5 years ago

VideoMem commented 5 years ago

Solution-0

I have wrapped the Lazy pirate Pattern into a .dll And I made a MQL binding for it. It can be found here

What is provided is included in the readme. If I found a way to do the correct libzmq.dll imported calls from MetaTrader without crashing the terminal under wine 3.0, I would update this.

UPDATE-2 (unconfirmed hypothesis)

I will cover this in more detail in next update. I'm having success with this pattern, only when wrapped inside another dll that contains its abstractions at C/C++ level. I'm working with it, and I will publish them when done. This problem seems to be related with "the art" of passing strings, and objects structures through dll boundaries in imported function parameters, and the C null terminated string issues when its null end is truncated caused by an incorrect size parameter. Also the size of the dll string returned must be allocated in the caller memory before to the call and return from dll boundaries. And also, my advice is to avoid the MetaTrader string type at this level ignoring completely the documentation that shows an example with it. The fact is that works "sometimes", and maps to wchar_t* type, that also, will lead to a "string marshalling" overkill at dll level. CharArray seems to do the job when properly terminated.

As you remember in strcpy() documentation:

To avoid overflows, the size of the array pointed by destination shall be long enough to contain the same C string as source (including the terminating null character), and should not overlap in memory with source.

As this where not enough, there is also a problem with the string encoding through platforms that's not very clear how it must be resolved.

I'll update with more details.

UPDATE-1 Definitively the c++ version works as expected, it can be downloaded here:

Now here, different language but same program, throws (sometimes) an unhandled exception on socket.poll() and in init() when new Socket() is called after delete or when socket.send() I'm trying to figure out why that happens.

Updated testing version can be found here

Also I found a related issue here. socket.send(ZmqMsg var) will not preserve the contents of var, this is unexpected for me. zmq c++ binding doesn't show that issue.

UPDATE-0 I found a workaround for this on a c++ version here

It can be build from command line in Linux:

g++ -ansi multiworker_lpclient.cpp -lzmq -o multiworker_lpclient

Also, I've updated the example below here Although, that is not a very robust solution, it performs better than the previous version.

Issue description

I'm getting a Random Access violation on libzmq.dll trying to implement the "Lazy Pirate" pattern client in mql EA code.

If both or one of the workers are offline I got sometimes:

On Windows 10:

Access violation read to 0x00000390 in 'C:\Users...\MQL4\Libraries\libzmq.dll'

Wine 3.0:

Access violation read to 0xE8CA00A0 in 'C:...\MQL4\Libraries\libzmq.dll'

This is a sample startup log of that condition:

2019.05.05 05:07:15.933 2018.10.01 00:00:00  ZMQAccessViolation test started
2019.05.05 05:07:15.961 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Creating new socket
2019.05.05 05:07:15.979 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Opening new client socket connection to tcp://localhost:5555
2019.05.05 05:07:15.979 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Creating new socket
2019.05.05 05:07:15.981 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Opening new client socket connection to tcp://localhost:5566
2019.05.05 05:07:15.982 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: On tick loop number: 0
2019.05.05 05:07:15.982 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: 3 WorkerA tcp://localhost:5555 #this is a R script
2019.05.05 05:07:15.982 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Sending data to worker 19 bytes ...
2019.05.05 05:07:15.982 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Receiving data from worker. Sent payload : #this is a R script
2019.05.05 05:07:16.984 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Received Response, 19 bytes
2019.05.05 05:07:16.984 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: reply body: #this is a R script
2019.05.05 05:07:16.984 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: 3 WorkerB tcp://localhost:5566 { "some_json_log": { "SYMBOL": "EURUSD", "MAGIC": "42", "etc ...":"etc" }}
2019.05.05 05:07:16.984 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Sending data to worker 74 bytes ...
2019.05.05 05:07:16.984 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Receiving data from worker. Sent payload : { "some_json_log": { "SYMBOL": "EURUSD", "MAGIC": "42", "etc ...":"etc" }}
2019.05.05 05:07:17.986 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Received Response, 74 bytes
2019.05.05 05:07:17.986 2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: reply body: { "some_json_log": { "SYMBOL": "EURUSD", "MAGIC": "42", "etc ...":"etc" }}
2019.05.05 05:07:18.050 2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: On tick loop number: 1
2019.05.05 05:07:18.050 2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: 3 WorkerA tcp://localhost:5555 #this is a R script
2019.05.05 05:07:18.050 2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Sending data to worker 19 bytes ...
2019.05.05 05:07:18.050 2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Receiving data from worker. Sent payload : #this is a R script
2019.05.05 05:07:20.553 2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: No response from worker, retrying ... 
2019.05.05 05:07:20.553 2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Retries left: 2
2019.05.05 05:07:20.553 2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Creating new socket
2019.05.05 05:07:20.553 2018.10.01 00:00:02  Access violation read to 0xE8CA00A0 in 'C:\Program Files\MetaTrader Standard G\MQL4\Libraries\libzmq.dll'
2019.05.05 05:07:20.553 2018.10.01 00:00:02  Testing pass stopped due to a critical error in the EA
2019.05.05 05:07:20.553 GBPUSD+,M5: 2 tick events (1 bars, 5851652 bar states) processed in 0:00:04.619 (total time 0:00:12.962)

In this case, one of the workers simulated a failure exiting itself. If I remove the "socket reassignment code" and just simply don't modify the socket object once created. It can be done commenting out the lines delete(socket) and initZMQ() on lines 114, 115 of the script respectively. I got the communication with both workers to work for a while, but sometimes it completely freezes the terminal, when some worker stop responding, or when it becomes online again after their random self exit. If I force both workers not to exit, to verify if it is not a single point of error, and leave it run. This can be done commenting out the break sentence on the python sample worker script at line 24 I got it to work, for a while longer than before, but after all it crashes again. If both addresses point to the same worker, the result is the same. I think that I'm missing something here.

This is a log recovery from non exiting workers condition:

0   06:23:29.515    Expert ZMQAccessViolation GBPUSD+,M5: loaded successfully
3   06:23:34.725    TestGenerator: no connect to trade server, default environment will be applied
0   06:23:34.725    TestGenerator: current spread 37 used
1   06:23:37.125    TestGenerator: unmatched data error (high value 1.28343 at 2019.02.14 12:20 is not reached from the least timeframe, high price 1.28337 mismatches)
1   06:23:37.125    TestGenerator: unmatched data error (low value 1.28295 at 2019.02.14 12:20 is not reached from the least timeframe, low price 1.28305 mismatches)
2   06:23:37.914    2018.10.01 00:00:00  ZMQAccessViolation test started
0   06:23:37.949    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Creating new socket
0   06:23:37.969    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Opening new client socket connection to tcp://localhost:5555
0   06:23:37.969    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Creating new socket
0   06:23:37.973    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Opening new client socket connection to tcp://localhost:5566
0   06:23:37.975    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: On tick loop number: 0
0   06:23:37.975    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: 3 WorkerA tcp://localhost:5555 #this is a R script
0   06:23:37.975    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Sending data to worker 19 bytes ...
0   06:23:37.975    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Receiving data from worker. Sent payload : #this is a R script
0   06:23:38.977    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Received Response, 19 bytes
0   06:23:38.977    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: reply body: #this is a R script
0   06:23:38.977    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: 3 WorkerB tcp://localhost:5566 { "some_json_log": { "SYMBOL": "EURUSD", "MAGIC": "42", "etc ...":"etc" }}
0   06:23:38.977    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Sending data to worker 74 bytes ...
0   06:23:38.977    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Receiving data from worker. Sent payload : { "some_json_log": { "SYMBOL": "EURUSD", "MAGIC": "42", "etc ...":"etc" }}
0   06:23:39.979    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: Received Response, 74 bytes
0   06:23:39.979    2018.10.01 00:00:00  ZMQAccessViolation GBPUSD+,M5: reply body: { "some_json_log": { "SYMBOL": "EURUSD", "MAGIC": "42", "etc ...":"etc" }}
0   06:23:40.043    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: On tick loop number: 1
0   06:23:40.043    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: 3 WorkerA tcp://localhost:5555 #this is a R script
0   06:23:40.043    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Sending data to worker 19 bytes ...
0   06:23:40.044    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Receiving data from worker. Sent payload : #this is a R script
0   06:23:41.045    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Received Response, 19 bytes
0   06:23:41.045    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: reply body: #this is a R script
0   06:23:41.045    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: 3 WorkerB tcp://localhost:5566 { "some_json_log": { "SYMBOL": "EURUSD", "MAGIC": "42", "etc ...":"etc" }}
0   06:23:41.045    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Sending data to worker 74 bytes ...
0   06:23:41.045    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Receiving data from worker. Sent payload : { "some_json_log": { "SYMBOL": "EURUSD", "MAGIC": "42", "etc ...":"etc" }}
0   06:23:43.546    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: No response from worker, retrying ... 
0   06:23:43.546    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Retries left: 2
0   06:23:43.546    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Creating new socket
0   06:23:43.548    2018.10.01 00:00:02  ZMQAccessViolation GBPUSD+,M5: Opening new client socket connection to tcp://localhost:5566
1   06:23:43.549    2018.10.01 00:00:02  Access violation write to 0xCA00A849 in 'C:\Program Files\MetaTrader Standard G\MQL4\Libraries\libzmq.dll'
3   06:23:43.549    2018.10.01 00:00:02  Testing pass stopped due to a critical error in the EA
0   06:23:43.549    GBPUSD+,M5: 2 tick events (1 bars, 5851652 bar states) processed in 0:00:05.635 (total time 0:00:13.871)

Sources: libzmq.dll version 4.3.2.0 compiled in VS2019 (dlls here) Worker code: Lazy Pirate server in Python (code here) Agent sample: Lazy pirate pattern MQL client to worker REQ REP tests examples This report repository: here

Tested on: MetaTrader Standard Version 4.00 build 1170 x86 (20 Dec 2018) Windows 10 x64 And latest Wine 3.0 on Linux

//+------------------------------------------------------------------+
//|                                           ZMQAccessViolation.mq4 |
//|                                  Copyleft 2019, swilwerth@github |
//|                                             github.com/swilwerth |
//+------------------------------------------------------------------+
// This is a bugreport code. It'll crash Metatrader Standard Terminal
// under certain conditions.
// Do not use it in any production code.

#property copyright "Copyleft 2019, swilwerth@github"
#property link      "github.com/swilwerth"
#property version   "1.00"
#property strict

#include <Zmq/Zmq.mqh>

#define REQUEST_TIMEOUT     2500    //  msecs, (> 1000!)
#define REQUEST_RETRIES     3       //  Before we abandon

class WorkerClientBase {
    protected:
        string zmq_address;
        Socket *socket;
        Context *context;
        void connect();
        void initZMQ();
    public:
        virtual string getName() { return "WorkerClientBase"; }
        void setAddr(string addr) { zmq_address = addr; }
        ZmqMsg sendTX(string payload);
        WorkerClientBase(Context &ctx);
        ~WorkerClientBase();
};

class WorkerA: public WorkerClientBase {
    public:
        WorkerA(Context &ctx, string addr);
        string getName() { return "WorkerA"; }
        void txSomething();
};

class WorkerB: public WorkerClientBase {
    public:
        WorkerB(Context &ctx, string addr);
        string getName() { return "WorkerB"; }
        void txSomething();
};

WorkerClientBase::WorkerClientBase(Context &c) {
    context = &c;
    initZMQ();
}

WorkerClientBase::~WorkerClientBase() {
    delete(socket);
}

void WorkerClientBase::initZMQ() {
    printf("Creating new socket");
    this.socket = new Socket(context, ZMQ_REQ);
}

//Start of lazy pirate pattern client TX adapted to MQL5 from
//sample code in c++ http://zguide.zeromq.org/cpp:lpclient
void WorkerClientBase::connect() {
    printf("Opening new client socket connection to %s", zmq_address);
    while(!socket.connect(zmq_address)) {
        printf("Error connecting to worker");
    }
    //  Configure socket to not wait at close time
    int linger = 0;
    socket.setLinger(linger);
}

ZmqMsg WorkerClientBase::sendTX(string payload) {
    ZmqMsg request(payload);
    ZmqMsg reply;
    PollItem item;
    PollItem items[1];

    int global_retries = REQUEST_RETRIES;
    int retries_left = global_retries;
    int request_timeout = REQUEST_TIMEOUT;
    bool expect_reply = true;
    while(retries_left) {
        printf("%d %s %s %s", retries_left, getName(), zmq_address, payload);
        printf("Sending data to worker %d bytes ...", StringLen(request.getData()));
        socket.send(request);

        printf("Receiving data from worker. Sent payload : %s", payload);

        expect_reply = true;
        socket.fillPollItem(item, ZMQ_POLLIN);
        items[0] = item;
        while(expect_reply) {
            socket.poll(items, request_timeout);
            //got reply
            if ((items[0].revents & ZMQ_POLLIN) == 1) {
                // Get the reply.
                socket.recv(reply);
                printf("Received Response, %d bytes", reply.size());
                if(reply.size() > 0) {
                        printf("reply body: %s",reply.getData());
                        return reply;
                } else
                    printf("Malformed reply from worker");
            } else if(--retries_left == 0) {
                printf("Worker seems to be offline, abandoning ...");
                expect_reply = false;
                break;
            } else {
                printf("No response from worker, retrying ... ");
                printf("Retries left: %d", retries_left);
                delete(socket);
                initZMQ();
                connect();
                socket.send(request);
            }
        }
    }
    return reply;
}
//end of lazy pirate client code

WorkerA::WorkerA(Context &c, string addr):WorkerClientBase(c) {
    setAddr(addr);
    connect();
}

WorkerB::WorkerB(Context &c, string addr):WorkerClientBase(c) {
    setAddr(addr);
    connect();
}

void WorkerA::txSomething() {
    string payload = "#this is a R script";
    sendTX(payload);
}

void WorkerB::txSomething() {
    string payload = "{ \"some_json_log\": { \"SYMBOL\": \"EURUSD\", \"MAGIC\": \"42\", \"etc ...\":\"etc\" }}";
    sendTX(payload);
}

//+------------------------------------------------------------------+
//| Expert initialization function                                   |
//+------------------------------------------------------------------+
Context *sharedContext;
WorkerA *workerA;
WorkerB *workerB;

int OnInit()  {
   sharedContext = new Context("ECHOTEST");
   workerA = new WorkerA(sharedContext, "tcp://localhost:5555");
   workerB = new WorkerB(sharedContext, "tcp://localhost:5566");

   return(INIT_SUCCEEDED);
}
//+------------------------------------------------------------------+
//| Expert deinitialization function                                 |
//+------------------------------------------------------------------+
void OnDeinit(const int reason) {
    delete(workerA);
    delete(workerB);
    delete(sharedContext);
}

//+------------------------------------------------------------------+
//| Expert tick function                                             |
//+------------------------------------------------------------------+
int tickC = 0;
void OnTick() {
    printf("On tick loop number: %d", tickC);
    workerA.txSomething();
    workerB.txSomething();
    tickC++;
}
//+------------------------------------------------------------------+
VideoMem commented 5 years ago

I confirmed this also on c++ I elevated the question to zmq team. https://github.com/zeromq/libzmq/issues/3495

dingmaotu commented 5 years ago

Thanks for reporting and thorough description for the problem. It might be helpful to others.