Toparvion / analog

🔎 Flexible web-based real-time log viewer
MIT License

[Enhancement] Secure URL feature #20

Open gamefundas opened 5 years ago

gamefundas commented 5 years ago

I noticed the warning about the security exposure of allowing access to any log file through the URL feature: https://github.com/Toparvion/analog#heavy_exclamation_mark-security-caution

I think this can be very concerning for many who would use this tool. A simple configuration option for the URL feature could be: if the application has a property like "url_log_base=/apps/logs", then every path in the URL is resolved relative to that base directory. This would restrict people from seeing anything outside the defined scope. URL hacks like "../../" would need to be thought through, but this is still better than what we have currently.

If this needs to be more flexible, it could support a list of such base directories. Otherwise, in my case the infosec guidelines won't allow me to deploy this tool. Java permissions could also be leveraged, but all of those would require more configuration to deal with.
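
For illustration, a minimal Java sketch of the kind of check such a url_log_base option could perform (the property value and class name are hypothetical), neutralizing "../" tricks by normalizing the path before comparison:

import java.nio.file.Path;
import java.nio.file.Paths;

public class BaseDirGuard {

    // Hypothetical base directory taken from a property like url_log_base=/apps/logs
    private final Path baseDir = Paths.get("/apps/logs").toAbsolutePath().normalize();

    /** Returns true only if the requested path stays inside the base directory. */
    public boolean isAllowed(String requestedPath) {
        // Resolve relative to the base dir and collapse any "../" parts first
        Path resolved = baseDir.resolve(requestedPath).normalize();
        return resolved.startsWith(baseDir);
    }
}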

Toparvion commented 5 years ago

@gamefundas Sure, this is a very serious concern. I see two opposite kinds of solution.

  1. Just like you've proposed, there could be a single global configuration key that restricts access to the specified directory only. It is simple to implement, but in some installations (where logs live in many different locations) it can be inflexible or even impossible to use, as the same AnaLog instance may need access to completely different directories.

  2. In contrast, we could have a dedicated option for each log choice declared in the configuration file. The option would play the same role but for a certain log choice only. This would be extremely flexible, but (a) it is significantly harder to implement and (b) it can make the choices configuration complicated or even confusing.

I tend toward something in between these two solutions: make the restriction an attribute of a choice group. Usually the plain logs of a group belong to the same root directory (as far as I can see in current installations, at least). That is why AnaLog already has a special parameter plainLogsBaseDir which helps to avoid specifying a base dir in every log path. Perhaps we can reuse this parameter as an access restriction. Though it doesn't mean there is no longer a need for a global restriction parameter.

What do you think about these solutions?

gamefundas commented 5 years ago

I guess the flexibility of having both global and local base directories makes sense. Currently all our logs are centralized into a cloud file system, so having the global one will secure things in our use case.

Toparvion commented 5 years ago

Ok, I will start by implementing the basic restriction which denies access to any directory outside the specified one.

Toparvion commented 5 years ago

I've researched the subject and came to a few considerations.

All of them can be satisfied by adding the following single parameter to the application.yaml file:

allowedPaths:
  - /pub/home/**
  - /**/log?/**
  - /**/*.log

In other words, the restriction can be expressed as a YAML array of Glob patterns, where all elements are combined with OR. The array is not required; by default it contains a single element pointing to logs in the current user's home directory, i.e. ${user.home}/**.log.
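
For illustration only, a minimal Java sketch (not necessarily how AnaLog would implement it) of checking a requested path against such an OR-combined list of glob patterns with java.nio's PathMatcher:

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class AllowedPathsCheck {

    // Patterns as they might appear in application.yaml
    private static final List<String> ALLOWED_PATHS = List.of("/pub/home/**", "/**/log?/**", "/**/*.log");

    public static boolean isAllowed(String rawPath) {
        Path path = Paths.get(rawPath).toAbsolutePath().normalize();
        // The path is allowed if it matches at least one pattern (OR combination)
        return ALLOWED_PATHS.stream()
                .map(pattern -> FileSystems.getDefault().getPathMatcher("glob:" + pattern))
                .anyMatch(matcher -> matcher.matches(path));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("/pub/home/upc/node-app.log")); // true: matches /pub/home/**
        System.out.println(isAllowed("/etc/passwd"));                // false: matches nothing
    }
}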

I'm going to implement the feature the way I've described above. @gamefundas, if you have any comments, I'd be glad to discuss them.

gamefundas commented 5 years ago

This is well thought through and should work. The user home default is a nice idea; that way nobody is exposed to begin with.

So if we have /pub/home/ as an allowed location, will URL access only allow URLs that match http://host:port/pub/home/some.log?

Toparvion commented 5 years ago

@gamefundas Yes. Note that the same path can be specified for a remote node (i.e. node://pub/home...), but the AnaLog agent will restrict it as well.

gamefundas commented 5 years ago

Actually I wasn't aware of the remote feature; this sounds new. Is it already released or a work in progress? I couldn't find any documentation on the agent.

gamefundas commented 5 years ago

I see it now: https://github.com/Toparvion/analog/wiki/application.yaml, and the wiki pages have a few examples as well. Missed seeing it for some reason.

I'd like to understand this architecture better: are we saying that AnaLog can be configured to run as a server, and applications/nodes can run an AnaLog agent to expose their logs?

The only trouble I see with this approach is maintaining application.yaml. As applications run on ephemeral infrastructure, defining or managing a static configuration of nodes is going to be a daunting task. Formulating a URL scheme that translates the path to the actual host/target could help move away from the static definition.

An example of a remote log fetch would be something like http://analog-server/<target>/<protocol>/logpath/

target - localhost, hostname:port, or an IPv4 address (localhost will serve logs from the current host/server); default agent port 7801 if not specified
protocol - node, composite, docker, k8, etc.

Support for something like this, maybe in addition to the cluster configuration (which makes sense for complex and detailed setups), could help source remote logs without any declarative configuration. Basically this would allow sourcing logs from any AnaLog agent without any configuration. Just a high-level thought.

Toparvion commented 5 years ago

Is it already released or a work in progress?

This is a released feature. It was initially implemented in v0.7 as part of composite log support. Then, in v0.11, it was exposed as a standalone feature, so from that version on you can refer to logs residing on remote servers by specifying node://<node-name>/<full-path>.

Couldn't find any documentation on the agent. ... I see it now

You can also look at the choices.yaml sample file for examples of remote log addresses, e.g.:

- path: node://backupNode/home/upc/node-app.log    # 'backupNode' is the name of node as declared in 'nodes' config section

I'd like to understand this architecture better: are we saying that AnaLog can be configured to run as a server, and applications/nodes can run an AnaLog agent to expose their logs?

TL;DR: yes, that's right (with some notes). In detail, every AnaLog instance can play two 'roles' at the same time. The server role is an entry point for web clients (i.e. browsers). Servers do not actually work with logs; they just delegate the work to agents. The agent role is an entry point for servers. Agents actually work with all types of logs, including containerized ones. The main idea behind such an architecture is that a server can query multiple agents, thus composing multiple logs into a single so-called composite log, while agents know nothing about composite logs; they just handle individual watching processes for various types of logs. In fact, AnaLog doesn't need to be configured as either a server or an agent, because every AnaLog instance can play both roles simultaneously out of the box. Furthermore, even if you use a single AnaLog instance, you actually make the AnaLog server side work with the agent side deployed in the same instance. The only configuration you might need to specify is a list of nodes, i.e. names and addresses of all the AnaLog instances that can act as agents for the current server.

The only trouble I see with this approach is maintaining application.yaml. ... An example of a remote log fetch would be something like...

Yes, that's a problem. The solution you propose looks good, but I have doubts about its portability: including actual node addresses in AnaLog pseudo-URLs (log paths) would make those URLs non-reusable, i.e. every time a node changes its address, all the URLs become invalid. That would not only disturb end users with periodic 404 errors (or something like that) but would also make log paths unsuitable for storing as browser bookmarks (which is quite a handy feature now).

That is why I'd like to propose another solution (which is actually already drafted in current versions of AnaLog). Instead of declaring explicit addresses anywhere (be it a URL or application.yaml), we can rely on the classic service discovery integration pattern, i.e. make the nodes register themselves in a central service registry (like Eureka, Consul, etc.) and then refer to the nodes not by their address but by their logical name. In fact, what I've implemented now in AnaLog is a first step towards this approach: we already have the notion of a node name and use it. The second step should be a change of binding for that name; instead of a static address declaration, it should be bound to a record in the service registry. Of course, it makes AnaLog deployment more complicated in general, but:

@gamefundas, what do you think about such an approach?

gamefundas commented 5 years ago

I am thrilled to hear about service discovery; originally my thinking was along those lines, but I restrained myself because it could complicate AnaLog by integrating it with other products. We use Consul extensively for service discovery, so this could be a blessing in disguise.

However, please note that this may not work in cases where processes are independent and not discoverable in nature. Still, your idea would address a high percentage of use cases and seems to be the right direction.

The one other option I can think of, which would avoid deeper integration with other tools, is for the AnaLog agent to register with the server, providing a list of remote log locations available to browse. Users have full control over their agents and their configuration, so essentially they will always configure them as they see fit. The server simply uses the registered information to create the URLs/paths to source the remote files. This would also help populate the drop-down with remote locations. Not sure if this is exactly what you have now and I misunderstood the example.

Toparvion commented 5 years ago

this may not work in cases where processes are independent and not discoverable in nature

Yes, I totally agree with you, and that is why I mentioned earlier the ability to specify the nodes configuration statically. This would serve as a fallback for any environment with no service discovery installed, or for those who use AnaLog in a simple deployment scenario. It also may not be easy to support multiple service discovery implementations at the same time, but I hope Spring Cloud will help with that.

Speaking of your proposal, I can say it is quite similar to what we have now. The main difference is that currently agents do not tell the servers about their available locations. Instead, the server populates the drop-down with data from its own static configuration (choices.yaml) and then queries logs from the agents. This means an agent may refuse to watch the requested log, e.g. because there is no such log on the agent. It seems that your proposal can avoid such a situation. But on the other hand, we should remember that AnaLog servers and agents may have different life cycles, which may lead to a mismatch in what they know about each other at any given time. Because of that, we cannot guarantee that a list of locations returned by an agent will remain correct indefinitely. So I think it's better to stay with the current approach and rely on logical node names as much as possible.

gamefundas commented 5 years ago

So I think it's better to stay with the current approach and rely on logical node names as much as possible.

I feel strongly that having the agents define the log locations would guide the design better, as agents know their logs best and are the ones that change state often. Whenever applications deploy, they restart their agents and the new configurations come into effect. The server never has to be touched or reconfigured in any way; it can be configured only for its own logs.

I have 90 applications operating in my VPC which typically register with central infrastructure such as metrics, discovery, syslog monitoring, etc. These apps are constantly deploying/redeploying, and their number shrinks and grows with autoscaling load. In such a dynamic environment, with responsibilities divided between various teams, I would never have to worry about what's on the central log viewer (AnaLog), since log locations are picked up by the AnaLog server whenever an agent connects. Otherwise I will constantly be adding, removing, and editing configurations in production and then restarting AnaLog to support new or moved log locations.

Worth considering this feature for sure.

Toparvion commented 5 years ago

Ah, it seems I misunderstood you, sorry. Let's try to clarify it.

What I understood earlier (from that comment)

AnaLog should use an external service registry (like Consul) to store information about current agent locations. Additionally, agents can register themselves with some AnaLog instance (a server) in order to populate the server's drop-down list of log choices.

What I see now (from this comment)

AnaLog should not use an external service registry. Instead, one of the AnaLog instances would be elected as a central one ("a server") and then all other instances (agents) would register themselves with this server, providing it with lists of available logs to watch. In other words, we aren't giving up on having a service registry; rather, we're making it an internal part of the AnaLog cluster.

If this is the correct notion, I'll try to describe it a little bit further.

Since we don't have to support any external service registry anymore, we're free to choose whatever registry best suits our needs as the internal one. Netflix Eureka seems a good choice because it ships as a library and can thus be easily integrated into an AnaLog instance, making it a full-fledged service registry for the agents (I have experience building such applications). At the same time, the Eureka protocol supports metadata, which can be useful for propagating the agents' log locations to the server. Furthermore, as Eureka clients (in every AnaLog agent) perform periodic queries (heartbeats) to the Eureka server (the AnaLog server), we can use them to dynamically refresh the content of the drop-down list on the server.
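
For illustration, a rough sketch (not a committed design) of how an AnaLog server could read agents' advertised log locations from Eureka instance metadata, assuming a hypothetical log-locations metadata key published by the agents:

import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.EurekaClient;
import com.netflix.discovery.shared.Application;

public class AgentLocationsFetcher {

    private final EurekaClient eurekaClient;

    public AgentLocationsFetcher(EurekaClient eurekaClient) {
        this.eurekaClient = eurekaClient;
    }

    /** Prints the log locations advertised by every registered agent instance. */
    public void printAdvertisedLocations() {
        for (Application app : eurekaClient.getApplications().getRegisteredApplications()) {
            for (InstanceInfo instance : app.getInstances()) {
                // 'log-locations' is a hypothetical metadata key an agent could publish
                String locations = instance.getMetadata().get("log-locations");
                if (locations != null) {
                    System.out.println(app.getName() + " @ " + instance.getHostName() + ": " + locations);
                }
            }
        }
    }
}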

Though it should be stressed that such a choice configuration model should be an additional option rather than a requirement. The reason is that there is an AnaLog deployment (at least one I know of) where all 7-9 instances are controlled by a single administrator. He doesn't store any log choice configuration on the agents because that would make him manage 7-9 files on 7-9 servers. Instead, he keeps all the configuration on the (single) server and manages only that. Looks reasonable to me. In other words, there are situations where each of the approaches (storing choices on agents or on the server) is preferable.

P.S. Just in case, I'd like to remind you that every AnaLog instance can play both roles: server and agent. So any person or team responsible for some AnaLog agent can work with its logs without any additional server, just by opening the agent's web UI.

gamefundas commented 5 years ago

Yeah, I got a little excited about service discovery but then soon realized it would mean adding a dependency on other frameworks, which is not a good thing. But yes, this can be left to the user's choice.

Also, this varies from team to team. In our case, the DevOps or infrastructure owners are a separate group who provide centralized services such as logging, security, monitoring, provisioning, database change management, etc. The application owners build business services and merely interact with or configure their apps to use the central services. Almost all configuration aspects are left to the app teams to figure out and customize to their needs.

Currently, based on the cloud configuration, all application logs are streamed to a central log server (Logstash running alongside a primitive log UI) which makes them available for viewing. What form, location, hierarchy, etc. the logs are presented in within the UI is fully controlled by the log agents (Filebeat) using simple metadata. This gives full control to the application teams, as they know best which logs (components) they want to publish.

That said, I'll leave this for you to decide. We are perhaps in agreement for the most part. The second part of my proposal was mainly to get rid of the log shipping aspect completely (no Filebeat and Logstash in my case). It also makes AnaLog an end-to-end solution for viewing logs with zero configuration (no knowledge of client log locations) on the server side.

Toparvion commented 5 years ago

Yes, now I see your use case and seem to understand how to fully address it in AnaLog. Thank you for the description! Generally speaking, the approach to agents' log configuration can be of two kinds, depending on how team responsibilities are divided. One kind comes from teams with dedicated DevOps or infrastructure engineers, while another comes from (typically smaller) "self-contained" teams. AnaLog should be suitable for both.

The only thing I'd like to clarify a bit more is the storage aspect. Do you currently use any persistence behind Logstash (e.g. Elasticsearch, as in ELK)? As you know, AnaLog doesn't provide any storage for logs and is not meant to, because AnaLog is a log viewer. When we compare it with a persistent solution like ELK, we should keep this significant difference in mind. Just a reminder.

gamefundas commented 5 years ago

Our current pipeline is:

App -> CloudWatch -> Lambda -> Redis -> Logstash (fanout to multiple outputs) =>

  -> Filesystem (EFS) -> Custom Log Viewer
  -> Graylog -> Elasticsearch -> Alert Engine -> Email

Yes, we have persistence in Amazon EFS (a central file system) and Elasticsearch. The custom log viewer is very basic, with the ability to view/download files. We are looking to replace it with AnaLog to be able to tail logs in real time, as it's something the developers have been asking for.

Toparvion commented 5 years ago

Ok, thank you for clarifying. I'm now working on AnaLog v0.12, which will include a solution for the current issue as well as an English UI (basic implementation) and a couple of minor fixes (mostly in the UI). I'll notify you when it's ready (or contact you earlier in case of any questions).

gamefundas commented 5 years ago

Awesome, thanks for the update. Looking forward to the release.

Toparvion commented 5 years ago

I've drafted a document describing the feature. @gamefundas, can you please review it? Feel free to leave any kind of feedback, from comments on grammar mistakes to criticism about philosophical ambiguities.

Toparvion commented 5 years ago

Log Access Control

Overview

AnaLog can restrict access to certain logs to prevent sensitive data from leaking (e.g. via queries like GET http://analog-host/etc/passwd).

:information_source: As of v0.12 the restriction is provided for file logs only (both remote and local ones).

To manage access restrictions, the administrator can use the allowed-log-locations top-level configuration section of the application.yaml file. Its file subsection can be used to specify allowed file log locations (and exceptions to them) in the form of Glob path patterns.

AnaLog ships with the following default access configuration:

allowed-log-locations:
  file:
    include:
      - ${user.home}/**/*.log
      - ${user.dir}/**.log
    exclude:
      - ${user.home}/**/.**
    symlink-resolution-limit: 1

It allows reading non-hidden logs from the current user's home directory tree, as well as reading any logs from AnaLog's own working directory tree.
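
For example (hypothetical paths, assuming the current user's home directory is /home/alice): /home/alice/apps/server.log matches ${user.home}/**/*.log and can be read; /home/alice/apps/.secret.log matches the exclude pattern ${user.home}/**/.** and is therefore rejected; /etc/passwd matches no include pattern and is rejected as well.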

If you want to understand the access restriction mechanism in depth, please read the following section.

Details

AnaLog performs access check in two cases:

  1. On every start of a new file log tailing process.

    It means that if a client just subscribes to an existing tailing process, no access check is performed.

  2. On every download of a file log.

The checks are performed on the current instance’s agent side, i.e. every AnaLog instance controls its own logs only and doesn’t restrict others in any way. As a result, there is currently no way for the administrator to declare all the access configuration on a single instance (“master”) and have it take effect everywhere.

The access control procedure consists of the following steps.

1. Checking whether file log access is denied altogether

AnaLog uses a restrictive access model, i.e. everything that is not explicitly allowed is denied. Therefore, there must be at least one entry in the allowed-log-locations.file.include list to read any file log. Otherwise AnaLog denies all file log access and immediately returns the following message to the client:

No allowed file log locations specified. See 'allowed-log-locations' property.

This can be useful for AnaLog instances that are not intended to work with file logs (e.g. if they are meant to work with container logs only).

2. Normalizing the path

In this step AnaLog:

  1. converts all slashes to the current OS format;

  2. turns the path into an absolute one without any relative parts (i.e. gets rid of segments like ../some-dir/..);

  3. resolves symbolic links as many times as the allowed-log-locations.file.symlink-resolution-limit property specifies. In case of violation, AnaLog prevents the log from being read and returns the following message to the client:

    Symbolic links resolution limit (<limit-value>) has been reached.

    If the property equals 0, symlink resolution is denied entirely (logs cannot be read through links) and the client receives the following message:

    Symbolic links to logs are not allowed.

These three steps ensure that the subsequent checks deal with actual log paths rather than various pointers to them such as relative paths, symlinks, etc.
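
As an illustration only (not AnaLog's actual code), here is a minimal Java sketch of such a normalization, assuming the symlink resolution limit is passed in as a parameter:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LogPathNormalizer {

    public static Path normalize(String rawPath, int symlinkResolutionLimit) throws IOException {
        // Steps 1 & 2: absolute path with OS-specific separators and no relative parts
        Path path = Paths.get(rawPath).toAbsolutePath().normalize();

        // Step 3: resolve symlinks, but no more times than the configured limit allows
        int resolutions = 0;
        while (Files.isSymbolicLink(path)) {
            if (resolutions >= symlinkResolutionLimit) {
                throw new IOException(symlinkResolutionLimit == 0
                        ? "Symbolic links to logs are not allowed."
                        : "Symbolic links resolution limit (" + symlinkResolutionLimit + ") has been reached.");
            }
            // Link targets may be relative, so resolve them against the link's parent directory
            path = path.getParent().resolve(Files.readSymbolicLink(path)).normalize();
            resolutions++;
        }
        return path;
    }
}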

3. Checking the path against including patterns

In this step AnaLog checks the normalized path against the allowed-log-locations.file.include config section, which contains a list of Glob patterns.

:information_source: Simply put, Glob patterns let you specify a path either literally, e.g. /home/me/app.log, or in a more generic way, e.g. /home/me/*.log. The latter means “all files that reside in the /home/me directory and whose names end with .log”.

The requested log path must match at least one of the patterns to pass the check. AnaLog processes the list from top to bottom; the first matching pattern wins and no other patterns are examined. If no patterns are specified at all, an error is returned immediately (see the first check above).

If no matching pattern is found, reading the log is denied and the following message is returned to the client:

Access denied: log path '<path>' is not included into 'allowed-log-locations' property.

Note that matching any of the including patterns is not sufficient for a log to be allowed for reading, because AnaLog also checks every log path against the excluding patterns (see the next step).

4. Checking the path against excluding patterns

In this step AnaLog checks the given path against the allowed-log-locations.file.exclude config section, which also contains a list of Glob patterns.

The requested log path must not match any of the patterns to be allowed for reading. AnaLog processes the list from top to bottom; the first matching pattern wins and no other patterns are examined. If no excluding patterns are specified, the path is considered allowed for reading.

If a matching pattern is found, reading the log is denied and the following message is returned to the client:

Access denied: log path '<path>' is excluded from 'allowed-log-locations' property.

Excluding patterns are meant to narrow the including patterns. They can also serve as exceptions to the implicit list of logs matched by the including patterns. To understand this better, the access control logic can be formulated as the following sentence:

I want my logs to be allowed for reading from (<inclusion1> or <inclusion2> or …) but not from (<exclusion1> and <exclusion2> and …)

Note that including patterns (<inclusion>) are combined with or while excluding patterns (<exclusion>) are combined with and.
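
To make the combination concrete, here is a minimal Java sketch (illustrative only, not AnaLog's actual implementation) of this include/exclude decision using glob PathMatchers:

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

public class LogAccessDecision {

    private static PathMatcher glob(String pattern) {
        return FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    }

    /** The (already normalized) path is allowed if it matches ANY include and NO exclude pattern. */
    public static boolean isAllowed(Path normalizedPath, List<String> includes, List<String> excludes) {
        boolean included = includes.stream()
                .anyMatch(p -> glob(p).matches(normalizedPath)); // includes combined with OR
        boolean excluded = excludes.stream()
                .anyMatch(p -> glob(p).matches(normalizedPath)); // any exclude match denies access
        return included && !excluded;
    }
}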


Variable substitution

You can refer to various external values from within the access settings with the help of the ${...} notation. The values can originate from JVM system properties, OS environment variables, and other AnaLog configuration parameters.

For example:

allowed-log-locations:
  file:
    include:
      - ${user.home}/**.log     # JVM built-in property (current OS user home dir)
      - ${JAVA_HOME}/lib/classlist  # OS environmental variable
      - **/${nodes.this.name}       # locally defined parameter (AnaLog's current node name)
    exclude:
      - **/${user.name}.log     # JVM built-in property (current OS user name)

Toparvion commented 5 years ago

During testing it turned out that some tail implementations (at least GNU coreutils' tail, which is common enough) follow symbolic links unconditionally. This can be used to bypass the check AnaLog performs at the start of a new tailing process. To address the issue, I've added additional handling for certain tail events, so AnaLog is now capable of detecting path manipulations at watching runtime as well. It took some time to implement.

Toparvion commented 5 years ago

By now, the feature seems complete and I'm going to close this issue soon.

@gamefundas, I would still like you to review the feature description before I make it generally available. I've updated the document and placed it in the Wiki for your convenience.

gamefundas commented 5 years ago

Apologies, I was away for a bit on a new engagement and couldn't attend to our DevOps track. I read through your notes on the overall design around folders, and it's very comprehensive. Love the flexibility overall. I will try this in our new log pipeline stack to see how it operates in a real-world scenario. Planning a release soon?

Toparvion commented 5 years ago

@gamefundas Yes. Currently I'm working on the translation (#22), which doesn't seem like a big deal. After that the release will be published. I will notify you when it's ready. Thank you very much for your time and feedback!