MISP / misp-docker

A production ready Dockered MISP
GNU General Public License v3.0

504 Gateway Time-out while querying MISP with pymisp #59

Closed HugeekBo closed 1 month ago

HugeekBo commented 5 months ago

Hi,

I have been using MISP a lot, and I used to build my own MISP Docker image from source, but now I'm very happy to use the new production-ready misp-docker.

While switching to the new Docker image, I noticed that the misp-docker project uses nginx instead of Apache. I'm not experienced with nginx, but I think it's a powerful tool once one masters it.

I'm also using pymisp 2.4.190 to pull events and attributes from MISP 2.4.192 to build custom lists of IoCs to feed my FW. That was working well with my previous Docker image, built from source and using the Apache server.

Now, while pulling the same IoCs, I see a new behaviour that I didn't have in the past: 504 Gateway Time-out. This occurs when I'm pulling a long list of IoCs using pymisp.

PyMISP displayed this error:

CRITICAL:pymisp:Unknown error: the response is not in JSON. Something is broken server-side, please send us everything that follows (careful with the auth key):
Request headers: {'User-Agent': 'PyMISP 2.4.190 - Python 3.12', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/json', 'Connection': 'keep-alive', 'Cookie': 'CAKEPHP=', 'Content-Length': '434', 'content-type': 'application/json'}
Request body: {"returnFormat": "json", "type": ["ip-dst", "ip-src", "url"], "tags": {"AND": ["canssoc:event-classification=\"generic\"", "canssoc:feed"]}, "withAttachments": 0, "metadata": 0, "published": true, "enforceWarninglist": 0, "to_ids": 1, "includeEventUuid": 0, "includeEventTags": 0, "sgReferenceOnly": 0, "includeContext": 1, "headerless": 0, "includeSightings": 0, "includeDecayScore": 0, "includeCorrelations": 0, "excludeDecayed": 0}
Response (if any):

504 Gateway Time-out

504 Gateway Time-out


nginx/1.18.0

I tested a bunch of timeout configuration tweaks in the following configuration file, but none of them solves the 504 error.

I tried "disabling" all nginx timeouts, but this has no effect. Any nginx pros that c

keepalive_timeout 0; # Set to 0 for no keepalive timeout

    #fastcgi_read_timeout 0s; # Set to 0s for no FastCGI read timeout

    #proxy_read_timeout 900s;
    #proxy_connect_timeout 900s;
    #proxy_send_timeout 900s;
    #uwsgi_read_timeout 900s;

    #fastcgi_connect_timeout 900s;
    #fastcgi_read_timeout 900s;
    #fastcgi_send_timeout 900s;
    keepalive_timeout 1d;
    send_timeout 1d;
    client_body_timeout 1d;
    client_header_timeout 1d;
    proxy_connect_timeout 1d;
    proxy_read_timeout 1d;
    proxy_send_timeout 1d;
    fastcgi_connect_timeout 1d;
    fastcgi_read_timeout 1d;
    fastcgi_send_timeout 1d;
    memcached_connect_timeout 1d;
    memcached_read_timeout 1d;
    memcached_send_timeout 1d;
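
When a configuration mixes commented-out and repeated timeout lines like the one above, it can be hard to see which values are actually in force. The following is a minimal sketch, not part of misp-docker, that scans a config snippet and reports which `*_timeout` directives are active, which are commented out, and which are set more than once. The function names `scan_timeouts` and `repeated_active` and the `SAMPLE` text are assumptions made for illustration, and the regular expression only handles simple one-directive-per-line configs.

```python
import re
from collections import defaultdict

# Matches lines such as "fastcgi_read_timeout 900s;" or "#proxy_read_timeout 900s;".
DIRECTIVE_RE = re.compile(
    r"^\s*(?P<hash>#*)\s*(?P<name>\w+_timeout)\s+(?P<value>[^;#\s]+)\s*;"
)


def scan_timeouts(config_text):
    """Return {directive: {"active": [...], "commented": [...]}} for *_timeout lines."""
    found = defaultdict(lambda: {"active": [], "commented": []})
    for line in config_text.splitlines():
        match = DIRECTIVE_RE.match(line)
        if not match:
            continue
        state = "commented" if match.group("hash") else "active"
        found[match.group("name")][state].append(match.group("value"))
    return dict(found)


def repeated_active(scan):
    """Names of directives that have more than one active (uncommented) value."""
    return sorted(name for name, entry in scan.items() if len(entry["active"]) > 1)


# Hypothetical sample, shortened from the configuration posted above.
SAMPLE = """\
keepalive_timeout 0; # Set to 0 for no keepalive timeout
    #fastcgi_read_timeout 0s; # Set to 0s for no FastCGI read timeout
    #proxy_read_timeout 900s;
    keepalive_timeout 1d;
    fastcgi_read_timeout 1d;
"""

if __name__ == "__main__":
    result = scan_timeouts(SAMPLE)
    for name, entry in sorted(result.items()):
        print(name, entry)
    print("repeated:", repeated_active(result))
```

Repeated directives are flagged only because they make it harder to tell which value is intended; the sketch does not try to model how nginx itself resolves or rejects them.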
ostefano commented 5 months ago

The issue might be in the maximum execution time of the php scripts (try checking those configuration files).

How do you pull events? Might want to paginate instead of reducing timeouts.

update: I would also check in the gitter/matrix channels whether other folks have been having the same issue.
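
The pagination idea mentioned above could look roughly like the sketch below. It is only an illustration: `iter_pages` and `fetch_page` are hypothetical names, and the PyMISP call shown in the comment assumes that `PyMISP.search` accepts `limit` and `page` (which it does in recent releases) and that pages are 1-based. The point is that many short requests are each far less likely to hit a gateway timeout than one very large request.

```python
from typing import Any, Callable, Iterator, List


def iter_pages(
    fetch_page: Callable[[int, int], List[Any]], limit: int = 1000
) -> Iterator[Any]:
    """Yield items page by page until a short or empty page is returned."""
    page = 1  # assumption: the server counts pages from 1
    while True:
        batch = fetch_page(page, limit)
        yield from batch
        if len(batch) < limit:
            break  # last page reached
        page += 1


# With PyMISP, fetch_page might be written like this (untested assumption,
# `misp` being an already configured PyMISP instance):
#
#   def fetch_page(page, limit):
#       return misp.search(controller="attributes",
#                          type_attribute=["ip-dst", "ip-src", "url"],
#                          to_ids=True, published=True,
#                          limit=limit, page=page, pythonify=True)

if __name__ == "__main__":
    # Stand-in data source so the sketch runs without a MISP server.
    data = [{"value": f"192.0.2.{i % 256}"} for i in range(2500)]

    def fake_fetch(page, limit):
        start = (page - 1) * limit
        return data[start:start + limit]

    print(sum(1 for _ in iter_pages(fake_fetch, limit=1000)))
```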

HugeekMcGill commented 5 months ago

Hi,

Thanks for the reply. Here are some details.

I changed some php.ini parameters for better performance, as recommended by MISP. (I'll make a PR once things are tested and working.)

/entrypoint_fpm.sh

MEMORY_LIMIT="${MEMORY_LIMIT:-16384M}"
MAX_EXECUTION_TIME="${MAX_EXECUTION_TIME:-300}"
UPLOAD_MAX_FILESIZE="${UPLOAD_MAX_FILESIZE:-512M}"
POST_MAX_SIZE="${POST_MAX_SIZE:-512M}"

# Default value for REDIS_FQDN if not set externally
REDIS_FQDN="${REDIS_FQDN:-redis}"

term_proc() {
    echo "Entrypoint FPM caught SIGTERM signal!"
    echo "Killing process $master_pid"
    kill -TERM "$master_pid" 2>/dev/null
}

trap term_proc SIGTERM

change_php_vars() {
    for FILE in /etc/php/*/fpm/php.ini
    do
        [[ -e "$FILE" ]] || break
        sed -i "s/memory_limit = .*/memory_limit = $MEMORY_LIMIT/" "$FILE"
        sed -i "s/max_execution_time = .*/max_execution_time = $MAX_EXECUTION_TIME/" "$FILE"
        sed -i "s/upload_max_filesize = .*/upload_max_filesize = $UPLOAD_MAX_FILESIZE/" "$FILE"
        sed -i "s/post_max_size = .*/post_max_size = $POST_MAX_SIZE/" "$FILE"
        sed -i "s/session.save_handler = .*/session.save_handler = redis/" "$FILE"
        sed -i "s|.*session.save_path = .*|session.save_path = '$(echo "$REDIS_FQDN" | grep -E '^\w+://' || echo tcp://"$REDIS>
    done
}

echo "Configure PHP | Change PHP values ..." && change_php_vars

echo "Configure PHP | Starting PHP FPM"
/usr/sbin/php-fpm7.4 -R -F & master_pid=$!

I played with max_execution_time, with no effect.

/etc/php/7.4/fpm/pool.d/www.conf

pm = ondemand
pm.max_children = 75
pm.process_idle_timeout = 900s;
php_flag[display_errors] = off
php_admin_value[error_log] = /var/log/nginx/fpm-php.www.log
php_admin_flag[log_errors] = on

Added timeout in /etc/nginx/nginx.conf

keepalive_timeout 0; # Set to 0 for no keepalive timeout
fastcgi_read_timeout 0s; # Set to 0s for no FastCGI read timeout

I added timeouts in /etc/nginx/includes/misp

# define the root dir
root /var/www/MISP/app/webroot;
index index.php;

# increase the maximum body size
client_max_body_size 512M;

# added headers for hardening browser security
add_header Referrer-Policy "no-referrer" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Download-Options "noopen" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Permitted-Cross-Domain-Policies "none" always;
add_header X-Robots-Tag "none" always;
add_header X-XSS-Protection "1; mode=block" always;

# remove X-Powered-By, which is an information leak
fastcgi_hide_header X-Powered-By;

location / {
    try_files $uri $uri/ /index.php$is_args$query_string;
}

location ~ ^/[^/]+\.php(/|$) {
    include snippets/fastcgi-php.conf;
    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;
    # fastcgi_read_timeout 300;
    fastcgi_read_timeout 900;
    fastcgi_send_timeout 900;  # Add this line for send timeout
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    set $path_info $fastcgi_path_info;
    fastcgi_param PATH_INFO $path_info;
}

HugeekMcGill commented 5 months ago

I'm pulling MISP events based on tags. This can be resource-intensive, I agree, but it was working before the switch to the nginx web server.

Yes, pagination can be an alternative, and it is planned for the next phase of optimization.

But I'm convinced an nginx expert can help fix it at the source.

ostefano commented 5 months ago

I would start by checking the NGINX logs, to be honest. We need to understand what is timing out at this point.

There are a few more options that can be set for PHP; try request_terminate_timeout = 300 inside www.conf.

Also, how are you testing after each change? Rebuilding the image? If not, you need to reload nginx (nginx -s reload).
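
To reason about which layer gives up first, it can help to put the limits side by side. The sketch below is a simplification that treats each setting as a plain wall-clock ceiling (in reality, for example, nginx's `fastcgi_read_timeout` applies between successive reads rather than to the whole response). The helper names `first_limit` and `explain` are hypothetical, and the example values are taken from the settings quoted earlier in this thread, with `None` standing for "no limit configured".

```python
from typing import Dict, Optional, Tuple


def first_limit(limits: Dict[str, Optional[float]]) -> Optional[Tuple[str, float]]:
    """Return (name, seconds) of the smallest finite limit, or None if all are unlimited."""
    finite = {name: secs for name, secs in limits.items() if secs is not None}
    if not finite:
        return None
    name = min(finite, key=finite.get)
    return name, finite[name]


def explain(request_seconds: float, limits: Dict[str, Optional[float]]) -> str:
    """Describe which configured limit a request of the given duration would hit first."""
    hit = first_limit(limits)
    if hit is None or request_seconds <= hit[1]:
        return "request fits within every configured limit"
    return (
        f"{hit[0]} ({hit[1]:g}s) is the first limit "
        f"a {request_seconds:g}s request would exceed"
    )


if __name__ == "__main__":
    # Values as posted in this thread; None means the setting is left unset.
    limits = {
        "nginx fastcgi_read_timeout": 900,
        "php max_execution_time": 300,
        "php-fpm request_terminate_timeout": None,
    }
    print(explain(600, limits))
```

If the smallest limit is on the PHP side, raising only the nginx timeouts would not be expected to change the outcome, which is one reason to check the logs of each layer.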

HugeekMcGill commented 5 months ago

Hi,

Prior to posting, I did that, with no effect. I'll post the configuration soon.

I'm comparing the php.ini from the original MISP GitHub project, which uses Apache, with the one from this project to see if there is a difference.

ostefano commented 4 months ago

@HugeekMcGill can we close this issue or did you find some additional guidance / settings that would warrant an updated README?

HugeekBo commented 4 months ago

Hi,

Sorry, I was caught up in intense testing.

Yes, the problem was resolved, and I'll make a PR later this week to add some tweaks.

I compared some files from the original project and I'll share them in the PR.

I'll gladly share back; I just need a couple of days.

grumo35 commented 4 months ago

@HugeekMcGill can we close this issue or did you find some additional guidance / settings that would warrant an updated README?

I might be encountering the same issue: despite everything being updated and healthy, I keep getting 504 time-outs just after logging in to MISP (pymisp or web UI). I'm digging into timeouts and PHP config as I speak, but there are no obvious errors in the logs.

HugeekMcGill commented 4 months ago

OK, then I'll bootstrap by posting the changes that I made here and do the PR later this week, because it took me quite some time to find the right combination of timeouts to set.

RUN sed -i 's|^\(command\s*=\s*\)/entrypoint_fpm.sh|\1/entrypoint_fpm.new.sh|' /etc/supervisor/conf.d/10-supervisor.conf


# Default values for environment variables
MEMORY_LIMIT=16384M
MAX_EXECUTION_TIME=3000
UPLOAD_MAX_FILESIZE=1024M
POST_MAX_SIZE=1024M

# Default value for REDIS_FQDN if not set externally
REDIS_FQDN="${REDIS_FQDN:-redis}"

term_proc() {
    echo "Entrypoint FPM caught SIGTERM signal!"
    echo "Killing process $master_pid"
    kill -TERM "$master_pid" 2>/dev/null
}

trap term_proc SIGTERM

change_php_vars() {
    for FILE in /etc/php/*/fpm/php.ini
    do
        [[ -e "$FILE" ]] || break
        sed -i "s/memory_limit = .*/memory_limit = $MEMORY_LIMIT/" "$FILE"
        sed -i "s/max_execution_time = .*/max_execution_time = $MAX_EXECUTION_TIME/" "$FILE"
        sed -i "s/upload_max_filesize = .*/upload_max_filesize = $UPLOAD_MAX_FILESIZE/" "$FILE"
        sed -i "s/post_max_size = .*/post_max_size = $POST_MAX_SIZE/" "$FILE"
        sed -i "s/session.save_handler = .*/session.save_handler = redis/" "$FILE"
        sed -i "s|.*session.save_path = .*|session.save_path = '$(echo "$REDIS_FQDN" | grep -E '^\w+://' || echo tcp://"$REDIS_FQDN"):6379'|" "$FILE"
    done
}

echo "Configure PHP | Change PHP values ..." && change_php_vars

echo "Configure PHP | Starting PHP FPM"
/usr/sbin/php-fpm7.4 -R -F & master_pid=$!

# Wait for it
wait "$master_pid"

root /var/www/MISP/app/webroot;
index index.php;

# increase the maximum body size
client_max_body_size 512M;

# added headers for hardening browser security
add_header Referrer-Policy "no-referrer" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Download-Options "noopen" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Permitted-Cross-Domain-Policies "none" always;
add_header X-Robots-Tag "none" always;
add_header X-XSS-Protection "1; mode=block" always;

# remove X-Powered-By, which is an information leak
fastcgi_hide_header X-Powered-By;

location / {
    try_files $uri $uri/ /index.php$is_args$query_string;
    fastcgi_read_timeout 3000;  # Add this for feedngen timeout on big request
    fastcgi_send_timeout 3000;  # Add this for feedngen timeout on big request
}

location ~ ^/[^/]+\.php(/|$) {
    include snippets/fastcgi-php.conf;
    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;
    fastcgi_read_timeout 3000;
    fastcgi_send_timeout 3000;
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    set $path_info $fastcgi_path_info;
    fastcgi_param PATH_INFO $path_info;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    # access logging left enabled for troubleshooting (note: nginx reads "on" here as a log file name, not a switch)
    access_log on;
    log_not_found on;
    #error_log  /dev/stderr error;
    error_log /var/log/nginx/misp443.logs error;

    # ssl options
    ssl_certificate /etc/nginx/certs/cert.pem;
    ssl_certificate_key /etc/nginx/certs/key.pem;
    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m;  # about 40000 sessions
    ssl_session_tickets off;

    # ssl intermediate configuration
    ssl_dhparam /etc/nginx/certs/dhparams.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;

    # ssl enable HSTS
    add_header Strict-Transport-Security "max-age=15768000; includeSubdomains";
    add_header X-Frame-Options SAMEORIGIN;

    # include misp
    include includes/misp;

    fastcgi_read_timeout 3000;
    fastcgi_send_timeout 3000;
    fastcgi_connect_timeout 3000;
}

RUN sed -i 's|;*max_input_time\s*=.*|max_input_time = 3000|' /etc/php/7.4/fpm/php.ini
RUN sed -i 's|;*max_execution_time\s*=.*|max_execution_time = 3000|' /etc/php/7.4/fpm/php.ini
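
For readers less used to sed, the two substitutions above can be mirrored in Python to see what they do to a php.ini fragment. This is only an illustrative sketch: `set_ini_value` and `SAMPLE_INI` are hypothetical names, and the sample text is made up rather than copied from the image. The pattern is the same as in the sed expressions: optional leading `;` characters, the key, optional whitespace, `=`, and the rest of the line.

```python
import re


def set_ini_value(ini_text: str, key: str, value: str) -> str:
    """Set `key = value` wherever `key` is assigned, dropping leading ';' markers."""
    pattern = re.compile(r";*" + re.escape(key) + r"\s*=.*")
    # A function replacement avoids any backslash interpretation in `value`.
    return pattern.sub(lambda _match: f"{key} = {value}", ini_text)


# Hypothetical php.ini fragment: one active setting, one commented-out setting.
SAMPLE_INI = """\
; Maximum execution time of each script, in seconds
max_execution_time = 30
;max_input_time = 60
"""

if __name__ == "__main__":
    updated = set_ini_value(SAMPLE_INI, "max_execution_time", "3000")
    updated = set_ini_value(updated, "max_input_time", "3000")
    print(updated)
```

Like the sed expressions, the pattern is not anchored to the start of the line, so it would also rewrite a matching assignment that appears inside a longer comment line.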
grumo35 commented 4 months ago

Hey, thanks for your fast reply. I managed to log in after applying some of your tweaks, but I cannot figure out the several-minute timeout on a powerful machine. Do you have any clue about what is causing this performance issue?

HugeekBo commented 4 months ago

At that point we will need logs.

The configuration I posted earlier enables some logs.

Dig out the errors and post them here, and we can see.

I remember that in the MISP configuration UI you can change another timeout, for curl.

What are you pulling from MISP with pymisp?

grumo35 commented 4 months ago

It's not even when pulling. The platform responds well while I'm logged in, but when I connect I have to wait 3 to 5 minutes or more before seeing the UI; pymisp times out as well.

I'll try to modify the container with nginx debug logging and more verbosity on PHP.

Thanks for your help.

HugeekMcGill commented 4 months ago

Ahh, that would be this:


RUN sed -i "s/^pm\.max_children = .*/pm.max_children = 75/" /etc/php/7.4/fpm/pool.d/www.conf

Restart the nginx and PHP-FPM services.

That will help.

grumo35 commented 4 months ago

I figured out I wasn't using CI/CD ;)

I found out I have intense disk activity when logging in. Are MariaDB and my virtual setup at fault?

I have NFS as storage for my cluster of hypervisors, and high-performance SSDs on a 10G network; my MISP setup is an Ubuntu VM and Docker, with direct-attached disks.

I never felt any issue while running several multi-terabyte Elasticsearch clusters with intensive usage.

Do you have a similar diagnostic on disk usage?

HugeekMcGill commented 4 months ago

I do see intense disk activity, but that is because I run multi-threaded pymisp queries to extract IoCs from multiple events in order to build custom feeds.

AmitKulkarni9 commented 4 months ago

Facing the same issue. If it's fixed, can we please merge this? :) @HugeekBo

HugeekMcGill commented 4 months ago

Sorry, I got delayed. Yes, I'll make the PR soon, this week, for real this time.

What is your issue exactly? Are you making big API requests with PyMISP as well?

AmitKulkarni9 commented 4 months ago

I am getting a 504 gateway timeout consistently through the web UI when clicking on List Events.

AmitKulkarni9 commented 3 months ago

Any update? I am getting 504 through pymisp too.

pymisp.exceptions.MISPServerError: Error code 500:

504 Gateway Time-out

504 Gateway Time-out


nginx/1.18.0

HugeekMcGill commented 3 months ago

Yep, working on the PR. I was on vacation.

ostefano commented 2 months ago

Left a few comments; once addressed we can merge 👍

ostefano commented 2 months ago

@HugeekMcGill if you have some spare cycles, please review my comments. I would like to merge this.

AmitKulkarni9 commented 2 months ago

@HugeekBo @HugeekMcGill can this be merged please :)

HugeekMcGill commented 2 months ago

Yep, it will be done today.

HugeekMcGill commented 1 month ago

Capacity testing in progress

One of the parameters suggested in the review is still causing a bad gateway error, so I'm troubleshooting it to be sure it's the root cause.

ostefano commented 1 month ago

I have re-worked many of the changes here: https://github.com/MISP/misp-docker/actions/runs/10538630557

Please test and, if needed, open a new issue and PR.